Add news section

swe-bench · Aug 13, 2024 · 85dc79f
1 parent f131001
commit 85dc79f

Showing 2 changed files with 40 additions and 10 deletions.
diff --git a/index.html b/index.html
@@ -93,10 +93,25 @@ <h3 style="font-size: 20px; padding-top: 1.2em">ICLR 2024</h3>
       </section>
       <section class="main-container">
         <div class="content-wrapper" style="display: flex; justify-content: center; align-items: center;">
-          <div style="background-color: black; padding: 1.5em 1em; color: white; border-radius: 1em; text-align: center; width: 80%;">
-            🔥 Evaluating on SWE-bench just became a lot more reliable!
-            SWE-bench evaluation now uses <b>Docker</b> for easier, containerized, reproducible evaluation.
-            [<a style="color:#0ca7ff" href="https://github.com/princeton-nlp/SWE-bench/tree/main/docs/20240627_docker">Report</a>]
+          <div class="content-box">
+            <h2 class="text-title">News</h2>
+            <p style="margin-bottom: 0.5em">
+              📣 [08/2024] SWE-bench x OpenAI = <b>SWE-bench Verified</b>, a human-validated subset of 
+              500 problems reviewed by software engineers!
+              [<a style="color:#0ca7ff" href="https://openai.com/index/introducing-swe-bench-verified/">Report</a>]
+            </p>
+            <p style="margin-bottom: 0.5em">
+              📣 [06/2024] We've <b>Docker</b>-ized SWE-bench for easier, containerized, reproducible evaluation.
+              [<a style="color:#0ca7ff" href="https://github.com/princeton-nlp/SWE-bench/tree/main/docs/20240627_docker">Report</a>]
+            </p>
+            <p style="margin-bottom: 0.5em">
+              📣 [03/2024] Check out our latest work, <b>SWE-agent</b>, which achieves a 12.47% resolve rate on SWE-bench!
+              [<a href="https://swe-agent.com/" class="light-blue-link" target="_blank" rel="noopener noreferrer">Link</a>]
+            </p>
+            <p style="margin-bottom: 0.5em">
+              📣 [03/2024] We've released <b>SWE-bench Lite</b>! Running all of SWE-bench can take time. This subset makes it easier!
+              [<a style="color:#0ca7ff" href="lite.html">Report</a>]
+            </p>
           </div>
         </div>
         <div class="content-wrapper">
@@ -1984,7 +1999,7 @@ <h2 class="text-title">Leaderboard</h2>
               [<a href="lite.html">Post</a>].
               <br>
               SWE-bench <b>Verified</b> is a human annotator filtered subset that has been deemed to have a ceiling of 100% resolution rate
-              [<a href="">Post</a>].
+              [<a href="https://openai.com/index/introducing-swe-bench-verified/">Post</a>].
               <br><br>
               - The <span style="color:#0ea7ff;"><b>% Resolved</b></span> metric refers to the percentage of SWE-bench instances
               (<b>2294</b> for test, <b>500</b> for verified, <b>300</b> for lite)

diff --git a/template/template_index.html b/template/template_index.html
@@ -93,10 +93,25 @@ <h3 style="font-size: 20px; padding-top: 1.2em">ICLR 2024</h3>
       </section>
       <section class="main-container">
         <div class="content-wrapper" style="display: flex; justify-content: center; align-items: center;">
-          <div style="background-color: black; padding: 1.5em 1em; color: white; border-radius: 1em; text-align: center; width: 80%;">
-            🔥 Evaluating on SWE-bench just became a lot more reliable!
-            SWE-bench evaluation now uses <b>Docker</b> for easier, containerized, reproducible evaluation.
-            [<a style="color:#0ca7ff" href="https://github.com/princeton-nlp/SWE-bench/tree/main/docs/20240627_docker">Report</a>]
+          <div class="content-box">
+            <h2 class="text-title">News</h2>
+            <p style="margin-bottom: 0.5em">
+              📣 [08/2024] SWE-bench x OpenAI = <b>SWE-bench Verified</b>, a human-validated subset of 
+              500 problems reviewed by software engineers!
+              [<a style="color:#0ca7ff" href="https://openai.com/index/introducing-swe-bench-verified/">Report</a>]
+            </p>
+            <p style="margin-bottom: 0.5em">
+              📣 [06/2024] We've <b>Docker</b>-ized SWE-bench for easier, containerized, reproducible evaluation.
+              [<a style="color:#0ca7ff" href="https://github.com/princeton-nlp/SWE-bench/tree/main/docs/20240627_docker">Report</a>]
+            </p>
+            <p style="margin-bottom: 0.5em">
+              📣 [03/2024] Check out our latest work, <b>SWE-agent</b>, which achieves a 12.47% resolve rate on SWE-bench!
+              [<a href="https://swe-agent.com/" class="light-blue-link" target="_blank" rel="noopener noreferrer">Link</a>]
+            </p>
+            <p style="margin-bottom: 0.5em">
+              📣 [03/2024] We've released <b>SWE-bench Lite</b>! Running all of SWE-bench can take time. This subset makes it easier!
+              [<a style="color:#0ca7ff" href="lite.html">Report</a>]
+            </p>
           </div>
         </div>
         <div class="content-wrapper">
@@ -168,7 +183,7 @@ <h2 class="text-title">Leaderboard</h2>
               [<a href="lite.html">Post</a>].
               <br>
               SWE-bench <b>Verified</b> is a human annotator filtered subset that has been deemed to have a ceiling of 100% resolution rate
-              [<a href="">Post</a>].
+              [<a href="https://openai.com/index/introducing-swe-bench-verified/">Post</a>].
               <br><br>
               - The <span style="color:#0ea7ff;"><b>% Resolved</b></span> metric refers to the percentage of SWE-bench instances
               (<b>2294</b> for test, <b>500</b> for verified, <b>300</b> for lite)