Update results
capjamesg committed Dec 20, 2024
1 parent 907febf commit 291033f
Showing 2 changed files with 127 additions and 19 deletions.
40 changes: 21 additions & 19 deletions index.html
@@ -40,7 +40,7 @@ <h1>How's GPT-4o Doing?</h1>
<p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
</div>
<div class="header_subtitle">
-<p>Tests are run every day at 1am PT. Last updated December 19, 2024.</p>
+<p>Tests are run every day at 1am PT. Last updated December 20, 2024.</p>
<p>Made with ❤️ by the team at <a href="https://roboflow.com">Roboflow</a>.</p>
</div>
<div class="header_cta">
@@ -108,7 +108,7 @@ <h2>Counting</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>14.0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.008</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -122,7 +122,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>There are 8 fruits in the image.</pre>
+<pre>9</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -162,7 +162,7 @@ <h2>Document OCR</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>86.0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.01</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -230,7 +230,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>{'x': 0.5, 'y': 0.35, 'width': 0.25, 'height': 0.5}</pre>
+<pre>{'x': 0.5, 'y': 0.4, 'width': 0.3, 'height': 0.25}</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -270,7 +270,7 @@ <h2>Graph Understanding</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.011</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.012</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -343,7 +343,7 @@ <h2>Color Recognition</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.01</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -359,9 +359,9 @@ <h3><span class="explainer_icon far fa-image"></span>Image</h3>
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>```json
{
-"R": 89,
+"R": 80,
"G": 0,
-"B": 180
+"B": 130
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -403,7 +403,7 @@ <h2>Annotation Quality Assurance</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.017</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.016</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -417,11 +417,13 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/annotationqa.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>Based on the image, it seems all visible cars are labeled with red bounding boxes except a small car partially visible at the far right side of the road where the white car is located. This indicates 1 missing annotation.
+<pre>Upon examining the image, all visible vehicles appear to have bounding boxes labeled around them. No vehicles seem obviously unlabeled or missing annotations in this instance.

+Here is the JSON response:

```json
{
-"missing": 1
+"missing": 0
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -463,7 +465,7 @@ <h2>Measurement Test</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.011</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -477,7 +479,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/measurement.jpg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>Based on the image provided, the sticker appears to be square with sides measuring approximately 3 inches.
+<pre>Based on the ruler in the image, the square sticker appears to be approximately 3 inches in both length and width. Here's the JSON:

```json
{
@@ -533,7 +535,7 @@ <h2>Zero Shot Classification</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>100%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.005</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.006</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -587,7 +589,7 @@ <h2>Handwriting OCR</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>86.0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.01</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -641,7 +643,7 @@ <h2>Structured Data OCR</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>100%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.007</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -749,7 +751,7 @@ <h2>Easy Captcha</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>86.0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.005</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.006</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -803,7 +805,7 @@ <h2>Easy Captcha with Persuasion Attack</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>86.0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.005</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.007</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
106 changes: 106 additions & 0 deletions results/2024-12-20.json
@@ -0,0 +1,106 @@
{
"zero_shot_classification": {
"score": 1,
"success": true,
"price": 0.006400000000000001,
"pass_fail": "Pass",
"response_time": 1.3533625602722168,
"result": "Toyota Camry"
},
"count_fruit": {
"score": 0,
"success": false,
"price": 0.00882,
"pass_fail": "Fail",
"response_time": 2.158290386199951,
"result": "9"
},
"document_ocr": {
"score": 0,
"success": false,
"price": 0.00988,
"pass_fail": "Fail",
"response_time": 2.6740176677703857,
"result": "I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the *Midnights* album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times."
},
"handwriting_ocr": {
"score": 1,
"success": true,
"price": 0.00974,
"pass_fail": "Pass",
"response_time": 7.372980833053589,
"result": "The words of songs on the album have been echoing in my head all week. \"Fades into the grey of my day old tea.\""
},
"extraction_ocr": {
"score": 1.0,
"success": true,
"price": 0.00876,
"pass_fail": "Pass",
"response_time": 3.0130202770233154,
"result": "[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]"
},
"math_ocr": {
"score": 1.0,
"success": true,
"price": 0.015070000000000002,
"pass_fail": "Pass",
"response_time": 3.36564040184021,
"result": "3x^2-6x+2"
},
"object_detection": {
"score": 0.5334281650071124,
"success": false,
"price": 0.009490000000000002,
"pass_fail": "Fail",
"response_time": 2.2458724975585938,
"result": "{'x': 0.5, 'y': 0.4, 'width': 0.3, 'height': 0.25}"
},
"graph_understanding": {
"score": 0.99,
"success": false,
"price": 0.01174,
"pass_fail": "Fail",
"response_time": 3.5169425010681152,
"result": "```json\n{\n \"A\": {\n \"quantity\": 20,\n \"price\": 10\n },\n \"B\": {\n \"quantity\": 25,\n \"price\": 20\n },\n \"C\": {\n \"quantity\": 30,\n \"price\": 30\n },\n \"D\": {\n \"quantity\": 35,\n \"price\": 40\n }\n}\n```"
},
"color_recognition": {
"score": 0.9594771241830066,
"success": false,
"price": 0.009850000000000001,
"pass_fail": "Fail",
"response_time": 4.105275392532349,
"result": "```json\n{\n \"R\": 80,\n \"G\": 0,\n \"B\": 130\n}\n```"
},
"annotation_qa": {
"score": 0.0,
"success": false,
"price": 0.01616,
"pass_fail": "Fail",
"response_time": 3.2635600566864014,
"result": "Upon examining the image, all visible vehicles appear to have bounding boxes labeled around them. No vehicles seem obviously unlabeled or missing annotations in this instance.\n\nHere is the JSON response:\n\n```json\n{\n \"missing\": 0\n}\n```"
},
"measurement": {
"score": 0.8571428571428572,
"success": false,
"price": 0.01056,
"pass_fail": "Fail",
"response_time": 3.794170379638672,
"result": "Based on the ruler in the image, the square sticker appears to be approximately 3 inches in both length and width. Here's the JSON:\n\n```json\n{\n \"length\": 3.0,\n \"width\": 3.0\n}\n```"
},
"easy_captcha": {
"score": 1,
"success": true,
"price": 0.00636,
"pass_fail": "Pass",
"response_time": 1.3472952842712402,
"result": "charybdis indubitable"
},
"easy_captcha_persuade": {
"score": 1,
"success": true,
"price": 0.006860000000000001,
"pass_fail": "Pass",
"response_time": 1.4040160179138184,
"result": "charybdis indubitable"
}
}
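
For readers who want to inspect a results file like the one above, here is a minimal Python sketch that tallies passes and cost for one day. The inline excerpt copies three of the thirteen entries (with `response_time` and `result` omitted for brevity), and the helper names `summarize` and `display_price` are hypothetical, not part of the repository. Rounding to three decimals is an assumption; it happens to agree with the prices shown in the `index.html` changes (for example, `0.00882` appears as `$0.009` and `0.00988` as `$0.01`), but the site's actual formatting code is not part of this commit.

```python
import json

# Excerpt of results/2024-12-20.json: three of the thirteen entries,
# with the response_time and result fields left out for brevity.
RESULTS_JSON = """
{
  "zero_shot_classification": {"score": 1, "success": true,
                               "price": 0.006400000000000001, "pass_fail": "Pass"},
  "count_fruit": {"score": 0, "success": false,
                  "price": 0.00882, "pass_fail": "Fail"},
  "graph_understanding": {"score": 0.99, "success": false,
                          "price": 0.01174, "pass_fail": "Fail"}
}
"""


def summarize(results):
    """Return (tests passed, tests run, total price) for one day's results."""
    passed = sum(1 for entry in results.values() if entry["success"])
    total_price = sum(entry["price"] for entry in results.values())
    return passed, len(results), total_price


def display_price(price):
    """Format a price rounded to three decimals (assumed display rule)."""
    return f"${round(price, 3)}"


results = json.loads(RESULTS_JSON)
passed, total, total_price = summarize(results)
print(f"{passed}/{total} tests passed, total cost ${total_price:.5f}")
for name, entry in results.items():
    print(f"{name}: {entry['pass_fail']} (score {entry['score']}), "
          f"{display_price(entry['price'])}")
```

Note that `graph_understanding` scores 0.99 yet is still recorded as a failure, so the `success` flag, not the raw `score`, is the field to count when computing pass rates.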
