Pydantic conversion logic for structured outputs is broken for models containing dictionaries #2004

Open
1 task done
dbczumar opened this issue Jan 10, 2025 · 4 comments · May be fixed by #2003
Labels
bug Something isn't working

Comments

@dbczumar

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

There's a bug in the OpenAI Python client's logic for translating Pydantic models that contain dictionaries into structured outputs JSON schema definitions: dictionaries are always required to be empty in the resulting JSON schema. This makes dictionary outputs significantly less useful, since the LLM is never allowed to populate them.

I've filed a small PR to fix this and introduce test coverage: #2003

To Reproduce

import json
from typing import Any, Dict

import pydantic

from openai.lib._pydantic import to_strict_json_schema

class GenerateToolCallArguments(pydantic.BaseModel):
    arguments: Dict[str, Any] = pydantic.Field(description="The arguments to pass to the tool")

print(json.dumps(to_strict_json_schema(GenerateToolCallArguments), indent=4))

Observe that the output inserts additionalProperties: false into the schema of the dictionary field. Because that field declares no properties of its own, the dictionary must always be empty:

{
    "properties": {
        "arguments": {
            "description": "The arguments to pass to the tool",
            "title": "Arguments",
            "type": "object",
            # THE INSERTION OF THIS LINE IS A BUG
            "additionalProperties": false
        }
    },
    "required": [
        "arguments"
    ],
    "title": "GenerateToolCallArguments",
    "type": "object",
    "additionalProperties": false
}
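To illustrate why this forces an empty dictionary, here is a minimal sketch of how the two relevant JSON Schema keywords interact. The helper `object_accepts` is hypothetical and written only for this illustration; it is not part of the OpenAI client and it ignores every keyword other than `properties` and `additionalProperties`.

```python
from typing import Any, Dict


def object_accepts(schema: Dict[str, Any], value: Dict[str, Any]) -> bool:
    """Hypothetical helper: check only `properties` / `additionalProperties`."""
    declared = set(schema.get("properties", {}))
    extra_keys = set(value) - declared
    # `additionalProperties: false` forbids any key not listed in `properties`.
    if schema.get("additionalProperties", True) is False and extra_keys:
        return False
    return True


# The sub-schema generated for the `arguments` field in the output above.
arguments_schema = {"type": "object", "additionalProperties": False}

print(object_accepts(arguments_schema, {}))                 # True
print(object_accepts(arguments_schema, {"city": "Paris"}))  # False
```

With no declared properties and additional properties forbidden, the only value that passes is `{}`.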

Code snippets

No response

OS

macOS

Python version

Python v3.10.12

Library version

1.59.6

@dbczumar
Author

Tagging @RobertCraigie for visibility, just in case (saw that you've been active on recent issues) :)

@BrunoScaglione

BrunoScaglione commented Jan 11, 2025

I'm having the same issue and can confirm that models containing dictionaries are the root problem. However, I checked the documentation again, and it does say that only additionalProperties=false is allowed.

@dbczumar
Author

@RobertCraigie Any updates or additional thoughts here?

@dvschuyl

I have also encountered the same issue. After some tinkering, I found more types that result in errors. The only built-in collection type that doesn't seem to be affected is list.

My code (Python 3.13.1):

import json
from pydantic import BaseModel
from openai.lib._pydantic import to_strict_json_schema
from openai import OpenAI


class Schema(BaseModel):
    # Python built-in collections
    # `range` and `bytearray` are not supported types, so I didn't include them

    # tuple_field: tuple[int, int, int]
    list_field: list[int]
    # dict_field: dict[int, int]
    # set_field: set[int]
    # frozenset_field: frozenset[int]
    # bytes_field: bytes


print(json.dumps(to_strict_json_schema(Schema), indent=4))

api_key = ...
with OpenAI(api_key=api_key) as client:
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Fill the schema with random values"}],
        response_format=Schema,
    )

print(json.dumps(response.choices[0].message.model_dump()["content"], indent=4))
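When trying other field types, it may help to check the generated schema locally before calling the API. The following is a rough sketch, not part of the library: the function `find_forced_empty_objects` is a hypothetical helper that walks a schema dictionary and reports object sub-schemas that forbid additional properties while declaring none. The sample schema is copied from the output in the original report; in practice one would pass the result of `to_strict_json_schema` instead.

```python
from typing import Any, Iterator


def find_forced_empty_objects(node: Any, path: str = "$") -> Iterator[str]:
    """Hypothetical helper: yield paths of object schemas that can only be {}."""
    if isinstance(node, dict):
        if (
            node.get("type") == "object"
            and node.get("additionalProperties") is False
            and not node.get("properties")
        ):
            yield path
        # Recurse into every nested value, tracking where we are in the schema.
        for key, child in node.items():
            yield from find_forced_empty_objects(child, f"{path}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_forced_empty_objects(child, f"{path}[{index}]")


# Schema from the original report (JSON `false` written as Python `False`).
schema = {
    "properties": {
        "arguments": {
            "description": "The arguments to pass to the tool",
            "title": "Arguments",
            "type": "object",
            "additionalProperties": False,
        }
    },
    "required": ["arguments"],
    "title": "GenerateToolCallArguments",
    "type": "object",
    "additionalProperties": False,
}

affected = list(find_forced_empty_objects(schema))
print(affected)  # ['$.properties.arguments']
```

The top-level object is not reported because it declares `properties`; only the dictionary field is flagged.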

3 participants