Commit

Update lite page
carlosejimenez committed Mar 15, 2024
1 parent ea8f4d8 commit fbe0695
Showing 2 changed files with 15 additions and 2 deletions.
Binary file added img/swebench-lite-pie.png
17 changes: 15 additions & 2 deletions lite.html
@@ -47,7 +47,7 @@
style="flex-direction: column"
>
<h1 style="font-size: 60px; padding-top: 0.4em">SWE-bench Lite</h1>
-<h3>A Curated Subset for Efficient Evaluation of Language Models as Software Engineers</h3>
+<h3>A Canonical Subset for Efficient Evaluation of Language Models as Software Engineers</h3>
<p style="text-align: center;margin-top:1em;">
Carlos Jimenez, John Yang, Jiayi Geng<br />
March 15, 2024
@@ -56,9 +56,22 @@ <h3>A Curated Subset for Efficient Evaluation of Language Models as Software Eng
</section>
<section class="main-container">
<div class="content-wrapper">
<div class="content-box">
<p class="text-content">

SWE-bench was designed to provide a diverse set of codebase problems that were verifiable using in-repo unit tests. The full SWE-bench test split comprises 2,294 issue-commit pairs across 12 Python repositories.
<br/>
<br/>
Since its release, we've found that for most systems evaluating on SWE-bench, running each instance can take a lot of time and compute. We've also found that SWE-bench can be a particularly difficult benchmark, which is useful for evaluating LMs in the long term, but discouraging for systems trying to make progress in the short term.
<br/>
<br/>
To remedy these issues, we've released a canonical subset of SWE-bench called SWE-bench lite. SWE-bench lite comprises 300 instances from SWE-bench that have been sampled to be more self-contained, with a focus on evaluating functional bug fixes. SWE-bench lite covers 11 of the original 12 repositories in SWE-bench, with a diversity and distribution of repositories similar to the original. We perform similar filtering on the SWE-bench dev set to provide 23 development instances that can be useful for active development on the SWE-bench task. We recommend that future systems evaluating on SWE-bench report numbers on SWE-bench lite in lieu of the full SWE-bench set if necessary. You can find the source code for how SWE-bench lite was created in <a href="https://github.com/princeton-nlp/SWE-bench/tree/main/collect/make_lite">SWE-bench/collect/make_lite</a>.
</p>
<br/>
<img src="img/swebench-lite-pie.png" style="width: 50%; max-width: 400px; margin: auto; display: block;"/>
<p class="text-content" style="width: 50%; margin: auto; text-align: center;">
SWE-bench lite distribution across repositories. Compare to the full SWE-bench in Figure 3 of the <a href="https://arxiv.org/abs/2310.06770">SWE-bench paper</a>.
</p>
</div>
</div>
</section>
</div>