Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AssistantAgent Tool Call Behavior #4602

Merged
merged 24 commits into from
Dec 10, 2024
Merged

Fix AssistantAgent Tool Call Behavior #4602

merged 24 commits into from
Dec 10, 2024

Conversation

husseinmozannar
Copy link
Contributor

@husseinmozannar husseinmozannar commented Dec 7, 2024

Resolves #4514

  • Limit 1 tool call per each on_messages invocation, by default return the tool call result as response.
  • Introduce reflect_on_tool_use to optionally reformat the tool call result using a model inference.
  • Beef up documentation.

Copy link
Collaborator

@ekzhu ekzhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it warrants unit tests for this.

We can split the disable of parallel tool calling for handoff in a separate PR if the requirements for that is still under-specified.

@husseinmozannar husseinmozannar requested a review from ekzhu December 8, 2024 01:49
@husseinmozannar
Copy link
Contributor Author

I added unit tests.

I think now this PR is just for fixing the repeated tool calls, I will let the handoffs for another PR. I think this is crucial to merge first before the other stuff.

…/_assistant_agent.py

Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
python/uv.lock Outdated Show resolved Hide resolved
@husseinmozannar
Copy link
Contributor Author

PR on hold.

Ideally AssistantAgent only does 1 LLM call with tools (if any). Return type of message should depend on tool call and allow other agents to easily convert that message to a string. Other agents and teams should be updated after AssistantAgent is updated

@ekzhu
Copy link
Collaborator

ekzhu commented Dec 8, 2024

I have updated the code, so we limit just 1 tool call iteration always. I have updated the examples too. @husseinmozannar @victordibia please verify the behavior is working with other scenarios.

@husseinmozannar
Copy link
Contributor Author

Behavior looks fine in a few sample teams I tried (multiple AssistantAgents in a round robin with varying access to tools), and tried the video surfer.

@ekzhu ekzhu merged commit 871a83f into main Dec 10, 2024
45 checks passed
@ekzhu ekzhu deleted the assistant_Agent_tools branch December 10, 2024 03:03
ekzhu pushed a commit that referenced this pull request Dec 10, 2024
* 1 tool call iteration default

* handoff first

* return_only_response

* add and remove tools

* print out tool calls

* pass checks

* fix issues

* add test

* add unit tests

* remove extra print

* Update python/packages/autogen-agentchat/src/autogen_agentchat/agents/_assistant_agent.py

Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>

* documentation and none max_tools_calls

* Always limit # tool call to 1

* Update notebooks for the changing behavior of assistant agent.

* Merge branch 'main' into assistant_Agent_tools

* add reflect_on_tool_use parameter to format the tool call result

* wip

* wip

* fix pyright

* Add unit tests

* Merge remote-tracking branch 'origin/main' into assistant_Agent_tools

* Update with custom formatting of tool call summary

* format

* Merge branch 'main' into assistant_Agent_tools
@jspv
Copy link
Contributor

jspv commented Dec 18, 2024

Hi, @husseinmozannar, I'm trying to understand the full intent with this change as all tools results are now returned in TextMessage (vs. only in ToolCallResultMessage) which makes it difficult to differentiate tool call results from LLM responses.

  • "Limit 1 tool call per each on_messages invocation, by default return the tool call result as response."

As per "by default", was this intended to be optional behavior? It was good to be able to easily differentiate tool call responses to the client from the client's response to the caller. Thanks.

@husseinmozannar
Copy link
Contributor Author

Hey!

There was a problem with the previous version of AssistantAgent.

GPT-4 when it decides to call a tool, only returns the tool call and no other response.

In previous version of AssistantAgent, it called the LLM as many as time as needed so that the final LLM response was not a tool call i.e. a string.
Now we fix that issue as follows:

Referencing the API doc

  • If the model returns no tool call, then the response is immediately returned as a :class:~autogen_agentchat.messages.TextMessage in :attr:~autogen_agentchat.base.Response.chat_message.

  • When the model returns tool calls, they will be executed right away:

  1. When reflect_on_tool_use is False (default), the tool call results are returned as a :class:~autogen_agentchat.messages.TextMessage in :attr:~autogen_agentchat.base.Response.chat_message. tool_call_summary_format can be used to customize the tool call summary.
    We still yield the toolcall and toolcallresult prior to the final textmessage

  2. When reflect_on_tool_use is True, the another model inference is made using the tool calls and results, and the text response is returned as a :class:~autogen_agentchat.messages.TextMessage in :attr:~autogen_agentchat.base.Response.chat_message.
    We still yield the toolcall and toolcallresult prior to the final textmessage

Moreover the innermessages of the final textmessage will have the toolcall +results.

Does this make more sense now?

@jspv
Copy link
Contributor

jspv commented Dec 18, 2024

Thanks, I understand the intent, the challenge for me is that both tool call results and normal responses are now returned as :class:~autogen_agentchat.messages.TextMessage with no differentiating characteristics, making it difficult to determine the difference between a textmessage from the agent and the additional tool call response (or summary) from the tool without tracking additional state (e.g. did I just receive a toolcallresult prior to this message, if so, then the next TextMessage is the tool response, not from the LLM).

I think It would be better to not mix the types. There is already toolcallresultmessage which was was previously the unambiguous way to identify tool results; now tool results are coming in two forms (toolcallresultmessage and another copy in a different format as textmessage) and it is now requires extra logic to try and determine if the textmessage is the tool results or from the LLM.

Can we create a clear message types for the new messages (e.g. toolcallresultsummarymessage) or some other disambiguating way in the message the determine the type of message?

@ekzhu
Copy link
Collaborator

ekzhu commented Dec 18, 2024

@husseinmozannar I think it's a valid point and we should create a new type of chat message for this.

It can be important for orchestration or termination condition. When inner messages are not emitted, we need to rely on typing to figure out what happened.

@jspv
Copy link
Contributor

jspv commented Dec 18, 2024

@husseinmozannar, I'm think I'm finding more side effects of this change. Using a simple RoundRobin with two agents:

  • writer, writes papers and has access to a web search tool
  • editor, reviews paper and provides suggestions and will terminate the team when paper is approved.

With the tool results now being returned as a TextMessage, I'm seeing the speaker move from writer->editor immediately after receiving the TextMessage tool response; so rather than the writer receiving the tool reply and using it to write the paper, the editor takes over prematurely.

Before:

task->writer runs tool -> writer writes paper -> editor provides feedback -> writer <-> editor ... -> editor approves

Now:
task-> writer runs tool -> editor provides feedback on tool results -> writer <-> editor ...

  • the writer never gets to act on the tool results.

@ekzhu
Copy link
Collaborator

ekzhu commented Dec 18, 2024

We observed this effect as well. One way to fix this is by setting the reflect_on_tool_use=True when you create the writer

Alternatively, set allow_repeated_speaker=True in selector group chat. What you saw is because the selector by default moves on from the same speaker

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update AssistantAgent tool call behavior
3 participants