Add news section
john-b-yang committed Aug 13, 2024
1 parent f131001 commit 85dc79f
Showing 2 changed files with 40 additions and 10 deletions.
25 changes: 20 additions & 5 deletions index.html
@@ -93,10 +93,25 @@ <h3 style="font-size: 20px; padding-top: 1.2em">ICLR 2024</h3>
</section>
<section class="main-container">
<div class="content-wrapper" style="display: flex; justify-content: center; align-items: center;">
- <div style="background-color: black; padding: 1.5em 1em; color: white; border-radius: 1em; text-align: center; width: 80%;">
- 🔥 Evaluating on SWE-bench just became a lot more reliable!
- SWE-bench evaluation now uses <b>Docker</b> for easier, containerized, reproducible evaluation.
- [<a style="color:#0ca7ff" href="https://github.com/princeton-nlp/SWE-bench/tree/main/docs/20240627_docker">Report</a>]
+ <div class="content-box">
+ <h2 class="text-title">News</h2>
+ <p style="margin-bottom: 0.5em">
+ 📣 [08/2024] SWE-bench x OpenAI = <b>SWE-bench Verified</b>, a human-validated subset of
+ 500 problems reviewed by software engineers!
+ [<a style="color:#0ca7ff" href="https://openai.com/index/introducing-swe-bench-verified/">Report</a>]
+ </p>
+ <p style="margin-bottom: 0.5em">
+ 📣 [06/2024] We've <b>Docker</b>-ized SWE-bench for easier, containerized, reproducible evaluation.
+ [<a style="color:#0ca7ff" href="https://github.com/princeton-nlp/SWE-bench/tree/main/docs/20240627_docker">Report</a>]
+ </p>
+ <p style="margin-bottom: 0.5em">
+ 📣 [03/2024] Check out our latest work, <b>SWE-agent</b>, which achieves a 12.47% resolve rate on SWE-bench!
+ [<a href="https://swe-agent.com/" class="light-blue-link" target="_blank" rel="noopener noreferrer">Link</a>]
+ </p>
+ <p style="margin-bottom: 0.5em">
+ 📣 [03/2024] We've released <b>SWE-bench Lite</b>! Running all of SWE-bench can take time. This subset makes it easier!
+ [<a style="color:#0ca7ff" href="lite.html">Report</a>]
+ </p>
</div>
</div>
<div class="content-wrapper">
@@ -1984,7 +1999,7 @@ <h2 class="text-title">Leaderboard</h2>
[<a href="lite.html">Post</a>].
<br>
SWE-bench <b>Verified</b> is a human annotator filtered subset that has been deemed to have a ceiling of 100% resolution rate
- [<a href="">Post</a>].
+ [<a href="https://openai.com/index/introducing-swe-bench-verified/">Post</a>].
<br><br>
- The <span style="color:#0ea7ff;"><b>% Resolved</b></span> metric refers to the percentage of SWE-bench instances
(<b>2294</b> for test, <b>500</b> for verified, <b>300</b> for lite)
25 changes: 20 additions & 5 deletions template/template_index.html
@@ -93,10 +93,25 @@ <h3 style="font-size: 20px; padding-top: 1.2em">ICLR 2024</h3>
</section>
<section class="main-container">
<div class="content-wrapper" style="display: flex; justify-content: center; align-items: center;">
- <div style="background-color: black; padding: 1.5em 1em; color: white; border-radius: 1em; text-align: center; width: 80%;">
- 🔥 Evaluating on SWE-bench just became a lot more reliable!
- SWE-bench evaluation now uses <b>Docker</b> for easier, containerized, reproducible evaluation.
- [<a style="color:#0ca7ff" href="https://github.com/princeton-nlp/SWE-bench/tree/main/docs/20240627_docker">Report</a>]
+ <div class="content-box">
+ <h2 class="text-title">News</h2>
+ <p style="margin-bottom: 0.5em">
+ 📣 [08/2024] SWE-bench x OpenAI = <b>SWE-bench Verified</b>, a human-validated subset of
+ 500 problems reviewed by software engineers!
+ [<a style="color:#0ca7ff" href="https://openai.com/index/introducing-swe-bench-verified/">Report</a>]
+ </p>
+ <p style="margin-bottom: 0.5em">
+ 📣 [06/2024] We've <b>Docker</b>-ized SWE-bench for easier, containerized, reproducible evaluation.
+ [<a style="color:#0ca7ff" href="https://github.com/princeton-nlp/SWE-bench/tree/main/docs/20240627_docker">Report</a>]
+ </p>
+ <p style="margin-bottom: 0.5em">
+ 📣 [03/2024] Check out our latest work, <b>SWE-agent</b>, which achieves a 12.47% resolve rate on SWE-bench!
+ [<a href="https://swe-agent.com/" class="light-blue-link" target="_blank" rel="noopener noreferrer">Link</a>]
+ </p>
+ <p style="margin-bottom: 0.5em">
+ 📣 [03/2024] We've released <b>SWE-bench Lite</b>! Running all of SWE-bench can take time. This subset makes it easier!
+ [<a style="color:#0ca7ff" href="lite.html">Report</a>]
+ </p>
</div>
</div>
<div class="content-wrapper">
@@ -168,7 +183,7 @@ <h2 class="text-title">Leaderboard</h2>
[<a href="lite.html">Post</a>].
<br>
SWE-bench <b>Verified</b> is a human annotator filtered subset that has been deemed to have a ceiling of 100% resolution rate
- [<a href="">Post</a>].
+ [<a href="https://openai.com/index/introducing-swe-bench-verified/">Post</a>].
<br><br>
- The <span style="color:#0ea7ff;"><b>% Resolved</b></span> metric refers to the percentage of SWE-bench instances
(<b>2294</b> for test, <b>500</b> for verified, <b>300</b> for lite)