中國科大認知全重實驗室發布Agent-R1訓練框架,支持自主思考與工具調用!
什么是 Agent-R1
Agent-R1是由中科大認知智能全國重點實驗室開發的智能體強化學習訓練框架,致力于推進強化學習與智能體技術的融合發展。框架采用端到端強化學習方法,突破了依賴人工設計工作流的傳統智能體開發瓶頸,讓AI直接從與環境的交互中學習最優策略,實現自主決策與行動。開發者只需定義特定領域的工具和獎勵函數,即可將Agent-R1擴展到各種應用場景,無需編寫復雜的工作流程。
背景
隨著大型語言模型(LLM)技術的快速發展,智能體(Agent)已成為人工智能領域的熱門話題。然而,當前市場上大多數所謂的"智能體"實際上只是工作流系統——通過預定義的代碼路徑來編排LLM和工具的使用,而非真正的自主智能。正如Anthropic最新定義的那樣,真正的LLM智能體應當是"系統中的LLM能夠動態指導自己的處理過程和工具使用,保持對任務完成方式的控制權"。
傳統工作流系統在實際應用中面臨著嚴重的局限性:
? 無法規劃,往往陷入死胡同
? 記憶能力有限,難以維持超過5-10分鐘的任務
? 長期行動效率低下,行動序列常因累積誤差而失敗
這些痛點揭示了AI發展的"苦澀教訓":通過硬編碼知識和規則來構建智能體系統只能帶來短期收益,長期來看必然會遇到瓶頸。而突破性進展最終將通過基于搜索和學習的計算擴展方法到來。
在這一背景下,OpenAI、Anthropic等前沿實驗室已開始研發基于強化學習訓練的真正智能體。OpenAI的DeepResearch和Claude Sonnet 3.7在搜索和編程任務中展現出了令人印象深刻的能力——它們能夠規劃搜索策略,交叉引用多個來源,并基于中間反饋完成復雜的多步驟任務。
這一技術突破表明,通過將強化學習與推理能力結合,LLM可以從與環境的直接交互中學習最優策略,無需依賴人工設計的提示詞和工作流。這些智能體不僅能夠思考,還能夠自主行動、規劃和調整策略,真正實現長期、復雜任務的自主完成。
Agent-R1正是在這一關鍵技術變革期應運而生的開源框架,致力于推進強化學習與智能體技術的融合發展,讓這一前沿技術能夠被更廣泛的研究者和開發者所使用,共同推動AI真正邁向自主智能的新時代。
核心特點
多輪工具調用能力
Agent-R1采用端到端強化學習方法,訓練智能體從完整交互軌跡中學習。這使模型能夠進行連續多輪工具調用,理解長期因果關系,將當前行動與未來結果關聯起來,形成真正的規劃和迭代能力。
多工具協調機制
框架支持智能體學習如何協調使用多種工具共同解決復雜任務。智能體能夠靈活選擇最適合當前情境的工具組合,形成有效的工具組合策略,而非被限制在單一工具的使用模式中。
過程獎勵系統
創新的過程獎勵機制允許對每個工具調用的有效性進行單獨評估,而非僅關注最終結果。通過獎勵歸一化技術,該方法平衡了過程獎勵與結果獎勵,確保智能體在追求最終目標的同時也注重解決問題的過程質量。
自定義工具和環境
框架與主流LLM工具調用格式完全兼容,開發者可以輕松擴展自己的工具和應用場景。只需定義特定領域的工具和獎勵函數,即可將Agent-R1應用到任何專業領域,無需復雜的工作流設計。
多種強化學習算法
Agent-R1 支持多種先進的強化學習算法,包括PPO、GRPO和REINFORCE++,為不同需求和場景提供靈活選擇。開發者可以根據具體應用選擇最適合的算法,優化智能體的學習效果。
多模態支持
最新版本實現了全面的多模態支持,無縫集成視覺-語言模型(VLMs),使智能體能夠處理和推理文本與視覺輸入的組合。這一功能極大擴展了智能體的應用場景,能夠在豐富的多模態環境中進行有效交互。
案例展示
在HotpotQA上的實驗驗證顯示,即使是1.5B的小模型,在強化學習后也能自主推理并調用搜索引擎,解決復雜問題。
User
Question: The album that contains the song "Get Together" by Madonna was mainly recorded in whose home studio?
Assistant
<think>
To answer this question, we need to find out the song "Get Together" by Madonna and then determine in which home studio she recorded the album it belongs to. I will use the search tool to find information about the song "Get Together" by Madonna.
</think>
<tool_call>
{"name": "search", "arguments": {"query": "Get Together by Madonna"}}
</tool_call>
Tool
<tool_response>
{"results": ["Get Together (Madonna song) \"Get Together\" is a song by American singer Madonna from her tenth studio album \"Confessions on a Dance Floor\" (2005). Produced by Madonna and Stuart Price, the song was released as the third single from the album by Warner Bros. Records on June 6, 2006. The decision was spurred by the fact that \"Get Together\" was the third most downloaded song from the album. It was also released to coincide with the start of Madonna's Confessions Tour. Inspired by Stardust's single \"Music Sounds Better with You\", \"Get Together\" portrays an anthem-like picture with its lyrics, about the different possibilities of finding love on the dance floor.", "Keep It Together (song) \"Keep It Together\" is a song by American singer Madonna from her fourth studio album \"Like a Prayer\" (1989). It was released on January 30, 1990 by Sire Records as the sixth and final single from the album in the United States, Canada and Japan. Written and produced by Madonna and Stephen Bray, the main inspiration behind \"Keep It Together\" was Madonna's relationship with her family\u2014whom she dearly missed after her divorce from actor Sean Penn. The song was dedicated to American funk and soul band Sly and the Family Stone. The lyrics deal with the realization of how important Madonna's family has been to her life. A pop and funk song consisting of an upbeat rhythm and groove, \"Keep It Together\" features percussion, banjo and a conga.", ...]}
</tool_response>
Assistant
<think>
Based on the search results, "Get Together" is indeed a song by Madonna from her album "Confessions on a Dance Floor" (2005). This provides the name of the album and its release details.
Next, I need to find out where this album was mainly recorded.
I will now use the search tool to find information about the album "Confessions on a Dance Floor" by Madonna, specifically looking for details on its recording process.
</think>
<tool_call>
{"name": "search", "arguments": {"query": "Confessions on a Dance Floor (album)"} }
</tool_call>
Tool
<tool_response>
{"results": ["Confessions on a Dance Floor Confessions on a Dance Floor is the tenth studio album by American singer and songwriter Madonna. It was released on November 9, 2005 by Warner Bros. Records. A complete departure from her previous studio album \"American Life\" (2003), the album includes influences of 1970s and 1980s disco, as well as modern-day club music. Initially, she began working with Mirwais Ahmadza\u00ef for the album, but later felt that their collaboration was not going in the direction she desired. Madonna took her collaboration with Stuart Price who was overviewing her documentary \"I'm Going to Tell You a Secret\". The album was mainly recorded at Price's home-studio where Madonna spent most of her time during the recordings.", "Confessions (Usher album) Confessions is the fourth studio album by American singer Usher. It was released on March 23, 2004, by Arista Records. Recording sessions for the album took place during 2003 to 2004, with its production handled by his longtime collaborator Jermaine Dupri, alongside with Jimmy Jam and Terry Lewis and Lil Jon, among others. Primarily an R&B album, \"Confessions\" showcases Usher as a crooner through a mixture of ballads and up-tempos, incorporating musical genres of dance-pop, hip hop and crunk. The album's themes generated controversy about Usher's personal relationships; however, the album's primary producer Jermaine Dupri claimed the album reflects his personal story.",...]}
</tool_response>
Assistant
<think>
From the search results, we learn that "Confessions on a Dance Floor" by Madonna was indeed released in 2005, but the primary information we seek is where this album was mainly recorded. The results indicate that it was "mainly recorded at Price's home-studio".
The album "Confessions on a Dance Floor" by Madonna was mainly recorded at Stuart Price's home studio.
Therefore, the answer to the question "The album that contains the song 'Get Together' by Madonna was mainly recorded in whose home studio?" is Stuart Price's home studio.
</think>
<answer>
Stuart Price's home studio
</answer>
訓練過程分析
以下三組圖片分別展示了使用三種主流的強化學習算法訓練時模型性能、工具調用輪數以及最終回答長度的變化過程。
PPO
REINFORCE++
GRPO
可以觀察到,模型 (Qwen2.5-1.5B-Instruct) 在面對具有挑戰性的多跳問題時,有效地學會了先思考再多輪調用工具,最終提高了精確匹配(EM)分數。雖然不同強化學習算法的效果各異,但總體趨勢保持一致。
值得注意的是,實驗揭示了一個顯著的相關性:模型性能、工具調用輪數和最終回答長度在整個訓練過程中呈現一致的變化趨勢。這展示了 Scaling Law 的全新維度——一個與智能體-環境交互頻率相關的維度。隨著智能體學會通過多次工具調用更有效地與環境交互,性能也相應提升,這表明多輪環境交互能力對智能體性能的重要性可能與傳統的擴展因素同等關鍵。
結語
我們正處于AI技術革命的前沿,Agent-R1不僅體現了一種技術框架,還彰顯了對AI未來發展的長遠愿景——從被動執行到主動思考,從固定路徑到自主探索。Agent-R1的推出,旨在讓更多的研究者和開發者能夠接觸到前沿的強化學習與智能體技術,推動技術的廣泛應用。只有通過開放合作,才能真正推動AI從模仿到創造,從受限到自由的飛躍。
項目地址:???https://github.com/0russwest0/Agent-R1??
本文轉載自???NLP PaperWeekly???,作者:NLP PaperWeekly
