Upgrade Opengpts #361

lgesuellip · 2024-10-24T20:44:46Z

Hi Team,

As a member of the Pampa Team, I’ve been working on this PR to upgrade OpenGPTs to the latest versions of the LangChain dependencies. This update ensures compatibility with Pydantic 2 and resolves issues with related packages.

Code changes

Migration to Pydantic 2
- Migrated the codebase to use Pydantic 2.

LangChain Dependency Upgrades
- Updated all LangChain dependencies (langchain, langchain-core, langgraph, etc.) to their latest versions.
- Removed langchain-robocorp, as it is currently incompatible with Pydantic 2.
- Updated the unstructured dependency to resolve issues related to nltk and its associated packages, such as punkt.

Code Adaptations
- Refactored the checkpoint logic to use the AsyncPostgresSaver from the langgraph implementation for improved compatibility and performance.
- Updated schemas to work with Pydantic 2's BaseModel.
- Fixed bugs that appeared when using GPT-4o.

Testing

Updated test cases to ensure compatibility with the new codebase.
Refactored and adapted tests to validate changes in schemas, checkpoint handling, and dependencies.

Related issues

#352

Looking forward to your feedback
Thank you Team!

backend/app/tools.py

lgesuellip · 2024-12-05T18:18:41Z

Hey team @eyurtsev @nfcampos, I just pushed all the changes.
I look forward to your review and feedback.
Thank you!

eyurtsev

Seeing two issues:

- (MAJOR) w/ migration
- (minor) Not 100% sure, but the UI doesn't seem to load the full screen for creating a new bot; it requires clicking through on a tab. This could be related to data returned to the UI through one of the endpoints.

backend/app/tools.py

eyurtsev · 2024-12-18T16:54:44Z

backend/app/agent.py

@@ -265,7 +264,7 @@ class ConfigurableRetrieval(RunnableBinding):
    llm_type: LLMType
    system_message: str = DEFAULT_SYSTEM_MESSAGE
    assistant_id: Optional[str] = None
-    thread_id: Optional[str] = None
+    thread_id: Optional[str] = ""


Why is this not a None default?

I spent a lot of time debugging this part. The error indicates a conflict in the configuration specifications for thread_id.

The error occurs during validation in the following code:

    @router.get("/config_schema")
    async def config_schema() -> dict:
        """Return the config schema of the runnable."""
        return agent.config_schema().model_json_schema()

The issue seems to arise because there are two conflicting ConfigurableFieldSpec definitions for thread_id:
1. Definition 1: ConfigurableFieldSpec with annotation=typing.Optional[str] and default=None.
2. Definition 2: ConfigurableFieldSpec with annotation=<class 'str'> and default=''.

So, I decided to set the default to '', and it works. However, I would prefer to keep it as None. Do you know what might be causing the problem? The assistant_id is similar, but I don’t encounter this issue with it.
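
For readers following along, here is a minimal sketch of how two specs sharing one id can collide when they are merged. It is a simplified stand-in written for this thread, not langchain-core's actual code: `FieldSpec` and `unique_specs` are hypothetical names, and the real check may compare more fields.

```python
from typing import Any, List, NamedTuple, Optional


class FieldSpec(NamedTuple):
    """Hypothetical stand-in for a configurable field spec."""

    id: str
    annotation: Any
    default: Any = None


def unique_specs(specs: List[FieldSpec]) -> List[FieldSpec]:
    """Collapse specs by id; raise if two specs with the same id differ."""
    seen = {}
    for spec in specs:
        previous = seen.setdefault(spec.id, spec)
        if previous != spec:
            raise ValueError(
                f"Conflicting specs for {spec.id!r}: {previous} vs {spec}"
            )
    return list(seen.values())


# The two definitions reported above differ in both annotation and default.
optional_spec = FieldSpec("thread_id", Optional[str], None)
plain_spec = FieldSpec("thread_id", str, "")
```

Under this assumption, merging `optional_spec` with `plain_spec` raises, while merging two identical specs does not. That would explain why aligning the default with the other definition makes the error go away.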

eyurtsev · 2024-12-18T16:55:43Z

backend/app/agent.py

@@ -135,7 +132,7 @@ class ConfigurableAgent(RunnableBinding):
    retrieval_description: str = RETRIEVAL_DESCRIPTION
    interrupt_before_action: bool = False
    assistant_id: Optional[str] = None
-    thread_id: Optional[str] = None
+    thread_id: Optional[str] = ""


Why is this not a None?

eyurtsev · 2024-12-18T21:19:46Z

backend/migrations/000002_checkpoints_update_schema.up.sql

-ALTER TABLE checkpoints
-    ADD COLUMN IF NOT EXISTS thread_ts TIMESTAMPTZ,
-    ADD COLUMN IF NOT EXISTS parent_ts TIMESTAMPTZ;
+-- Drop existing checkpoints-related tables if they exist


I believe this migration fails for anyone who has already run migrations 1 through 4. The migration state is kept in the database.

Should this run as step 5 so it'll run at the end?

So if you run OpenGPTs with the previous version, then apply this PR on top and run things, the migrations will not work.

W/ current approach it seems like any old threads are no longer usable from the app. (I'm assuming not super easy to recover b/c of the pickle serde that was used).

You’re right! I think the same thing happens in LangGraph when people decide to use the new checkpointer, right?

The new checkpointers can be versioned as far as I understand

https://github.com/langchain-ai/langgraph/blob/main/libs/checkpoint-postgres/langgraph/checkpoint/postgres/base.py#L27

So at least going forward there's a way to carry out schema migrations automatically.
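
A rough sketch of the general idea of versioned migrations, assuming an ordered list of statements and a bookkeeping table that records which ones have run. It uses sqlite3 from the standard library as a stand-in for Postgres; the table name `schema_migrations` and the statements themselves are made up for illustration and are not taken from the linked file.

```python
import sqlite3

# Hypothetical migration list; the position in the list is the schema version.
MIGRATIONS = [
    "CREATE TABLE checkpoints (thread_id TEXT, checkpoint BLOB)",
    "ALTER TABLE checkpoints ADD COLUMN parent_id TEXT",
]


def apply_migrations(conn: sqlite3.Connection) -> int:
    """Run every migration newer than the recorded version; return the latest version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (v INTEGER PRIMARY KEY)"
    )
    row = conn.execute("SELECT MAX(v) FROM schema_migrations").fetchone()
    current = -1 if row[0] is None else row[0]
    for version in range(current + 1, len(MIGRATIONS)):
        conn.execute(MIGRATIONS[version])
        conn.execute("INSERT INTO schema_migrations (v) VALUES (?)", (version,))
    conn.commit()
    return len(MIGRATIONS) - 1
```

Because the applied version is stored alongside the data, calling `apply_migrations` again only runs statements that were appended to the list since the last run.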

But yeah going from the pickle checkpointer -> new checkpointer was a breaking change. I'm OK if we don't worry about this, don't think this affects that many users.

I'd just prefer if we didn't wipe out any potential sql tables that users may want to recover data from

Hey Eugene,
I've implemented the changes we discussed previously:

Migration Changes

Up:

Updated migration 5 to rename the existing checkpoints table (preserving old data) instead of modifying it.

The checkpoint_blobs and checkpoint_writes tables will now be created at runtime.

Down:

The migration properly reverts these changes by restoring the original table name.
Note: This is a breaking change – old checkpoints won’t be accessible in the new system.
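
To illustrate the rename-instead-of-drop approach, here is a small sketch using sqlite3 as a stand-in for Postgres. The name `checkpoints_legacy` is hypothetical and chosen for this example only; the actual migration file may use a different name and additional statements.

```python
import sqlite3

# "checkpoints_legacy" is a hypothetical name used only in this sketch.
UP = "ALTER TABLE checkpoints RENAME TO checkpoints_legacy"
DOWN = "ALTER TABLE checkpoints_legacy RENAME TO checkpoints"


def demo() -> tuple:
    """Apply the up and down steps and report the row counts seen after each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE checkpoints (thread_id TEXT, checkpoint BLOB)")
    conn.execute("INSERT INTO checkpoints VALUES ('t1', x'00')")

    conn.execute(UP)  # old rows stay on disk under the new name
    preserved = conn.execute("SELECT COUNT(*) FROM checkpoints_legacy").fetchone()[0]

    conn.execute(DOWN)  # reverting restores the original table name
    restored = conn.execute("SELECT COUNT(*) FROM checkpoints").fetchone()[0]
    return preserved, restored
```

The old rows are never deleted, so users who need to recover data can still query the renamed table directly.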

Runtime Setup

Added an ensure_setup() call in the lifespan event, which will call the async_postgres_saver.setup() method.

This ensures that the new checkpoint tables (checkpoints, writes, and blobs) are properly created during application startup.
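
A minimal sketch of running setup from a lifespan handler. It uses only the standard library: `FakeSaver` is a hypothetical stand-in for the real checkpointer, and the handler calls `setup()` directly rather than going through `ensure_setup()`. The handler has the async-context-manager shape that FastAPI accepts for its `lifespan` argument.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSaver:
    """Hypothetical stand-in for a checkpointer with an async setup() method."""

    def __init__(self) -> None:
        self.ready = False

    async def setup(self) -> None:
        # A real saver would create its tables here.
        self.ready = True


saver = FakeSaver()


@asynccontextmanager
async def lifespan(app):
    # Runs once at startup, inside the running event loop.
    await saver.setup()
    yield
    # Shutdown cleanup (closing pools, etc.) would go here.


async def main() -> bool:
    # Simulate the server entering the lifespan context.
    async with lifespan(app=None):
        return saver.ready
```

Running `asyncio.run(main())` shows that the saver is ready by the time request handling would begin.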

I’ve tested the migration path from the old version to the new one, and it works as expected. The old data is preserved in the renamed table, while the new checkpoint system operates seamlessly with its updated table structure.

backend/migrations/000002_checkpoints_update_schema.up.sql

backend/app/schema.py

eyurtsev · 2024-12-18T22:18:50Z

backend/app/checkpoint.py

-        if isinstance(value, list) and all(isinstance(v, BaseMessage) for v in value):
-            loaded["channel_values"][key] = [v.__class__(**v.__dict__) for v in value]
-    return loaded
+class AsyncPostgresCheckpoint(BasePostgresSaver):


Have you considered initializing on app start up and calling .setup() to set up the migration, and then avoiding doing the wrapping of the checkpointer?

It'll help keep the checkpoints in sync and remove some extra code here

Hey Eugene,

Checkpoint:

I’ve run into a few challenges while implementing langgraph’s checkpoint:

Global Checkpointer Initialization:
I defined the checkpointer in the lifespan and declared it as a global in agent.py. However, I hit the error “Checkpointer not initialized” because the global instance was accessed before it had been initialized.
The checkpointer can only be used once application startup has completed.

Singleton Pattern Issues:
I tried using a singleton pattern for AsyncPostgresSaver to manage the global instance, initializing it during the lifespan. However, the initialization of AsyncPostgresSaver requires an async event loop, which isn’t always available—such as during testing—resulting in the error: “no running event loop.”

Current Implementation:
I implemented a solution inspired by the current approach in OpenGPTs, adapted to use the new checkpointer in LangGraph:

Singleton with Lazy Initialization: Created a BasePostgresSaver class with a singleton pattern that assigns the instance before initialization.

Async Setup Method: Moved the connection pool creation to an async setup() method, ensuring it initializes during the lifespan of the application when an asynchronous loop is available.

I’m open to trying a different approach if you have any suggestions or recommendations!
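
For context, the singleton-with-lazy-initialization pattern described above can be sketched roughly as follows. `LazySaver`, `get_pool`, and the placeholder pool object are hypothetical; the class in this PR may be structured differently.

```python
import asyncio
from typing import Optional


class LazySaver:
    """Hypothetical singleton whose expensive resources are created in setup()."""

    _instance: Optional["LazySaver"] = None

    def __new__(cls) -> "LazySaver":
        # Creating the instance needs no event loop, so imports and tests are safe.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.pool = None
        return cls._instance

    async def setup(self) -> None:
        # Called from the lifespan handler, where an event loop is running.
        if self.pool is None:
            self.pool = object()  # placeholder for a real connection pool

    def get_pool(self) -> object:
        if self.pool is None:
            raise RuntimeError("Checkpointer not initialized")
        return self.pool
```

The instance can be referenced at import time (for example from agent.py), while anything that needs a running loop is deferred until `setup()` is awaited.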

Migration:

Finally, I decided to run the migrations before using the app to stay consistent with OpenGPTs and ensure the queries are ready. I am using the same .sql.

What do you think?

backend/app/checkpoint.py

lgesuellip · 2025-01-26T22:02:17Z

Fix state handling inconsistency between different agent types:

The issue:

- tool_agent and chatbot use a simple list of messages as their state.
- The retrieval agent uses a dict with both messages and additional metadata (msg_count).
- This was causing conflicts when updating state between different agent types.

Changes:

Adapted update_thread_state to detect agent type from current state:
- For retrieval agent: preserve additional state data while updating messages
- For other agents: maintain simple message list state
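
The shape-based detection can be sketched like this. It is a simplified illustration, not the code in this PR: `merge_state` is a hypothetical helper, and messages are represented as plain dicts.

```python
from typing import Any, Dict, List, Union

Message = Dict[str, Any]
State = Union[List[Message], Dict[str, Any]]


def merge_state(current: State, new_messages: List[Message]) -> State:
    """Return the next state, keeping the shape of the current one."""
    if isinstance(current, dict):
        # Retrieval-style state: keep extra keys such as msg_count.
        return {**current, "messages": new_messages}
    # tool_agent / chatbot-style state: a bare list of messages.
    return list(new_messages)
```

Keying off the shape of the current state means the caller does not need to know which agent type owns the thread.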

lgesuellip · 2025-01-26T22:19:13Z

@eyurtsev I've added the changes we discussed.
I'd love your feedback.

eyurtsev

Great! @lgesuellip changes look good

eyurtsev · 2025-01-29T21:12:44Z

Probably need to bump the Poetry version used on CI. I'm taking a look.

backend/pyproject.toml

eyurtsev · 2025-01-29T21:27:14Z

Removed package-mode instead. There are some other breaking changes in Poetry 2.0, so I'm just using 1.5.1.

lgesuellip added 2 commits October 24, 2024 17:40

Migrate pydantic

cbbda36

Upgrade poetry

594d9cd

eyurtsev reviewed Oct 24, 2024

backend/app/tools.py (outdated)

lgesuellip added 10 commits November 13, 2024 23:23

Adapt to manage checkpoint using an AsyncSaver

9e3bbd4

Adjust Tools model

aacf3db

Add checkpoint

5f8f2e1

Update poetry

4f151db

Format

7483722

Fix tests

23b0f2a

Modify tables

27f1df2

Fix gpt4o

538f46f

Fix bots

34bc8a1

Fix retrieval

024ae14

eyurtsev reviewed Dec 18, 2024

kabylkassymov reviewed Dec 24, 2024

backend/app/checkpoint.py (outdated)

“lgesuellip” added 2 commits January 25, 2025 20:37

Adding eugenes suggestions

5ca17e2

Fix state handling inconsistency between different agent types

495ee01

Improve doc

5118662

eyurtsev approved these changes Jan 29, 2025

eyurtsev reviewed Jan 29, 2025

backend/pyproject.toml (outdated)

Update backend/pyproject.toml

a310032

eyurtsev added 2 commits January 29, 2025 16:28

lint fix

005dde7

lint

22721fa

eyurtsev merged commit 2cf3bf7 into langchain-ai:main Jan 29, 2025
6 checks passed
