Update lite page

swe-bench · Mar 15, 2024 · fbe0695
1 parent ea8f4d8
commit fbe0695
Showing 2 changed files with 15 additions and 2 deletions.
diff --git a/img/swebench-lite-pie.png b/img/swebench-lite-pie.png
diff --git a/lite.html b/lite.html
@@ -47,7 +47,7 @@
           style="flex-direction: column"
         >
           <h1 style="font-size: 60px; padding-top: 0.4em">SWE-bench Lite</h1>
-          <h3>A Curated Subset for Efficient Evaluation of Language Models as Software Engineers</h3>
+          <h3>A Canonical Subset for Efficient Evaluation of Language Models as Software Engineers</h3>
           <p style="text-align: center;margin-top:1em;">
             Carlos Jimenez, John Yang, Jiayi Geng<br />
             March 15, 2024
@@ -56,9 +56,22 @@ <h3>A Curated Subset for Efficient Evaluation of Language Models as Software Eng
       </section>
       <section class="main-container">
         <div class="content-wrapper">
+          <div class="content-box">
           <p class="text-content">
-
+              SWE-bench was designed to provide a diverse set of codebase problems that are verifiable using in-repo unit tests. The full SWE-bench test split comprises 2,294 issue-commit pairs across 12 Python repositories.
+              <br/>
+              <br/>
+              Since its release, we've found that for most systems evaluating on SWE-bench, running each instance can take a lot of time and compute. We've also found that SWE-bench can be a particularly difficult benchmark, which is useful for evaluating LMs in the long term, but discouraging for systems trying to make progress in the short term.
+              <br/>
+              <br/>
+              To remedy these issues, we've released a canonical subset of SWE-bench called SWE-bench Lite. SWE-bench Lite comprises 300 instances from SWE-bench that have been sampled to be more self-contained, with a focus on evaluating functional bug fixes. SWE-bench Lite covers 11 of the original 12 repositories in SWE-bench, with a diversity and distribution of repositories similar to the original. We apply similar filtering to the SWE-bench dev set to provide 23 development instances that can be useful for active development on the SWE-bench task. We recommend that future systems evaluating on SWE-bench report numbers on SWE-bench Lite in lieu of the full SWE-bench set if necessary. You can find the source code for how SWE-bench Lite was created in <a href="https://github.com/princeton-nlp/SWE-bench/tree/main/collect/make_lite">SWE-bench/collect/make_lite</a>.
           </p>
+          <br/>
+          <img src="img/swebench-lite-pie.png" style="width: 50%; max-width: 400px; margin: auto; display: block;"/>
+          <p class="text-content" style="width: 50%; margin: auto; text-align: center;">
+            Distribution of SWE-bench Lite instances across repositories. Compare to the full SWE-bench in Figure 3 of the <a href="https://arxiv.org/abs/2310.06770">SWE-bench paper</a>.
+          </p>
+        </div>
         </div>
       </section>
     </div>
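
As a rough illustration of the per-repository distribution that the added pie chart summarizes, the sketch below tallies instances by repository. It assumes instance IDs follow an `<owner>__<repo>-<number>` pattern; the helper names `repo_of` and `repo_distribution` and the sample IDs are hypothetical and are not taken from this commit or from the `make_lite` scripts.

```python
from collections import Counter


def repo_of(instance_id: str) -> str:
    """Map an instance ID to "owner/repo".

    Assumes the ID looks like "<owner>__<repo>-<number>" (an assumption,
    not something this commit specifies).
    """
    # Split off the trailing number at the LAST hyphen, so hyphens inside
    # the owner or repo name (e.g. "scikit-learn") are preserved.
    name, _, _number = instance_id.rpartition("-")
    owner, _, repo = name.partition("__")
    return f"{owner}/{repo}"


def repo_distribution(instance_ids):
    """Return {repo: (count, fraction)} ordered from most to least common."""
    counts = Counter(repo_of(i) for i in instance_ids)
    total = sum(counts.values())
    return {repo: (n, n / total) for repo, n in counts.most_common()}


# Hypothetical IDs for illustration only; not real benchmark instances.
sample_ids = [
    "django__django-1",
    "django__django-2",
    "scikit-learn__scikit-learn-3",
    "pytest-dev__pytest-4",
]

for repo, (count, share) in repo_distribution(sample_ids).items():
    print(f"{repo}: {count} ({share:.0%})")
```

Running the same tally over the full list of instance IDs in a split would give the counts behind a chart like `img/swebench-lite-pie.png`, provided the ID pattern assumed above holds.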