構建 Autonomous AI Agent ｜函數調用（Function Calling）技術實例探索原創精華

發布于 2024-5-23 10:35

瀏覽

1收藏

編者按： 大語言模型擁有令人驚嘆的語言理解和生成能力，卻也存在自主決策、與外部系統交互等方面的不足。函數調用（Function Calling）技術的出現，正是為解決這一難題而生的創新方案，它賦予了大語言模型更強的自主能力和與外部世界連接的能力，成為實現真正智能自主 Agent 的關鍵一環。
本期我們精心為各位讀者伙伴呈現一篇詳實的搭建技術教程，全面介紹了如何利用函數調用技術構建 Autonomous AI Agents 。作者從函數調用（Function Calling）的工作原理和應用場景出發，通過構建一個旅游服務助手的實例，層層遞進地講解了整個系統的設計思路、技術細節和代碼實現。
希望通過閱讀本文，各位讀者伙伴能獲得啟發，為開發更智能、更人性化的 AI Agents 尋得更寬廣的光明大道。

作者 | Julian Yip

編譯 | 岳揚

函數調用并非一個新鮮概念。早在 2023 年 7 月， OpenAI 就為其 GPT 模型引入了這一功能，現在這一功能也被其他競爭對手采用。比如，谷歌的 Gemini API 最近也開始支持函數調用， Anthropic 也在將其整合到 Claude 中。函數調用（譯者注：Function Calling，允許模型通過調用特定的函數來執行某些復雜任務。）已經成為大語言模型（LLMs）的關鍵功能之一，能夠顯著增強大模型應用能力。因此，學習這項技術是極其有意義的。

基于此，我打算撰寫一篇詳細的教程，內容重點為基礎介紹（因為這類教程已經很多了）之外的內容。本教程將專注于實際應用上，展示如何構建一個 fully autonomous AI agent（譯者注：能夠獨立運行和做出決策的、不需要人為干預的 AI agent 。），并將其與 Streamlit 集成來實現類似 ChatGPT 的 Web 交互界面。雖然本教程使用 OpenAI 進行演示，但本文內容同樣適用于其他支持函數調用的大語言模型，例如 Gemini。

01 函數調用（Function Calling）的用途有哪些？

Function Calling 這一技術讓開發者能夠定義函數（也被稱為工具（tools），可以將其視為模型要執行的操作，如進行數學運算或下訂單），并讓模型智能地選擇并輸出一個包含調用這些函數所需參數的 JSON 對象。簡單來說，這一技術具備以下功能：

自主決策（Autonomous decision making）：模型能夠智能地選擇所需工具來回答問題。
可靠地解析過程（Reliable parsing）：響應一般以 JSON 格式呈現，而非更典型的對話式響應（dialogue-like response）。乍看之下似乎沒什么，但正是這種技術使得 LLM 能夠通過結構化輸入（ structured inputs）連接到外部系統，比如通過 API 進行交互。

這種技術為人們帶來了各種各樣的新機遇、新機會：

Autonomous AI assistants：機器人不僅可以回答用戶咨詢的問題，還能與內部系統（譯者注：企業內部用于處理內部業務流程、數據管理、客戶關系等任務的系統）交互，處理客戶下訂單和退貨等任務。
Personal research assistants：比方說，當我們需要制定旅行計劃時，可以請這些助理在互聯網搜索內容、爬取內容、比較內容，并將結果匯總到 Excel 中。
IoT voice commands：模型可以根據檢測到的用戶意圖來控制設備或給出操作建議，例如調節空調溫度。

02 函數調用功能的運行流程

參考 Gemini 的函數調用文檔[1]，函數調用功能的運行流程如下，OpenAI 中此功能的工作原理基本相同：

構建 Autonomous AI Agent ｜函數調用（Function Calling）技術實例探索-AI.x社區

圖片來源：Gemini 的函數調用文檔[1]

1. 用戶向應用程序發出提示詞（prompt）

2. 應用程序會傳遞用戶提供的提示詞和函數聲明（Function Declaration(s)），即對模型所需工具的描述信息

3. 根據函數聲明，模型會給出工具選取建議和相關的請求參數。注意，模型僅會輸出建議的工具和請求參數，并不會實際調用函數

4. & 5. 應用程序根據模型響應調用相關 API

6. & 7. 將 API 的響應內容再次輸入模型，生成人類可讀的內容

8. 應用程序將最終響應返回給用戶，然后再次回到第 1 步，如此循環往復

上述的介紹內容可能看起來有些許復雜，接下來將通過實例詳細解釋該概念。

03 該 Agents 的整體設計和總體架構

在深入講解具體代碼之前，先簡要介紹一下本文介紹的這個 Agents 的整體設計和總體架構。

3.1 Solution：旅游服務助手

在本文，我們將為外出旅游的酒店顧客構建一個旅游服務助手，該產品可以使用以下工具（這些工具使得該服務助手能夠訪問外部應用程序）。

get_items 和 purchase_item：通過 API 連接到數據庫中的產品目錄（product catalog），這兩個工具分別用于獲取商品列表和進行商品購買
rag_pipeline_func：通過檢索增強生成（RAG）連接到存儲和管理文檔數據的存儲系統，以便從非結構化文本中獲取相關信息，例如酒店的宣傳冊

構建 Autonomous AI Agent ｜函數調用（Function Calling）技術實例探索-AI.x社區

3.2 相關技術棧

嵌入模型（Embedding model）：all-MiniLM-L6-v2[2]
向量數據庫（Vector Database）：Haystack 的 InMemoryDocumentStore[3]
大語言模型（LLM）：通過 OpenRouter 訪問 GPT-4 Turbo[4]。只要支持函數調用，稍作代碼修改即可使用其他大語言模型（如 Gemini）。
LLM 框架：使用 Haystack[5]，因為它易于使用，文檔詳盡，并且在 pipeline 的構建方面比較透明。本教程實際上是對該框架使用教程[6]的擴展。

現在開始介紹吧！

04 使用上述技術棧構建一個 Agent 樣例

4.1 前期準備工作

前往 Github[7] 克隆本項目代碼。以下內容可以在 Notebook ??function_calling_demo?? 中找到。

請創建并激活一個虛擬環境，然后運行 pip install -r requirements.txt 安裝所需的包。

4.2 項目初始化

首先連接 OpenRouter。如果有 OpenAI API 密鑰，也可以使用原始的 ??OpenAIChatGenerator??? 而不重寫覆蓋 ??api_base_url?? 參數。

import os
from dotenv import load_dotenv
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.utils import Secret
from haystack.dataclasses import ChatMessage
from haystack.components.generators.utils import print_streaming_chunk

# Set your API key as environment variable before executing this
load_dotenv()
OPENROUTER_API_KEY = os.environ.get('OPENROUTER_API_KEY')

chat_generator = OpenAIChatGenerator(api_key=Secret.from_env_var("OPENROUTER_API_KEY"),
  api_base_url="https://openrouter.ai/api/v1",
  model="openai/gpt-4-turbo-preview",
        streaming_callback=print_streaming_chunk)

接下來，我們測試 chat_generator 是否能成功調用。

chat_generator.run(messages=[ChatMessage.from_user("Return this text: 'test'")])

---------- The response should look like this ----------
{'replies': [ChatMessage(content="'test'", role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'openai/gpt-4-turbo-preview', 'index': 0, 'finish_reason': 'stop', 'usage': {}})]}

4.3 步驟 1：選擇使用合適的數據存儲方案

在此，我們將在應用程序和兩個數據源（data sources）之間建立連接：用于非結構化文本的文檔存儲系統（Document store），以及通過 API 連接的應用程序數據庫（application database via API）。

使用 Pipeline 給文檔編制索引

需要給系統提供文本樣本（sample texts），以供模型進行檢索增強生成（RAG）。這些文本將被轉換為嵌入（embeddings），并使用將文檔數據存儲在內存中的數據存儲方案。

from haystack import Pipeline, Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

# Sample documents
documents = [
    Document(content="Coffee shop opens at 9am and closes at 5pm."),
    Document(content="Gym room opens at 6am and closes at 10pm.")
]

# Create the document store
document_store = InMemoryDocumentStore()

# Create a pipeline to turn the texts into embeddings and store them in the document store
indexing_pipeline = Pipeline()
indexing_pipeline.add_component(
 "doc_embedder", SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
)
indexing_pipeline.add_component("doc_writer", DocumentWriter(document_store=document_store))

indexing_pipeline.connect("doc_embedder.documents", "doc_writer.documents")

indexing_pipeline.run({"doc_embedder": {"documents": documents}})

上述程序的輸出結果應該與輸入的示例文檔數據保持一致：

{'doc_writer': {'documents_written': 2}}

啟動 API 服務進程

在 db_api.py 文件中創建一個用 Flask 框架構建的 API 服務，用于連接 SQLite 數據庫。請在終端運行 python db_api.py，啟動該服務。

構建 Autonomous AI Agent ｜函數調用（Function Calling）技術實例探索-AI.x社區

如果服務成功執行，終端將顯示圖中所示的信息

我注意到在 db_api.py 中預置一些初始的基礎數據。

構建 Autonomous AI Agent ｜函數調用（Function Calling）技術實例探索-AI.x社區

數據庫中的數據樣本

4.4 步驟 2：定義函數（Define the functions）

這一步是在準備真正的函數，以供模型在之后的函數調用（Function Calling）步驟中調用執行。（如 02 節 “函數調用功能的運行流程” 中所述的步驟 4-5）

RAG 函數（RAG function）

其中之一就是 RAG 函數 ??rag_pipeline_func??。這個函數的作用是讓模型能夠搜索之前存儲在文檔存儲中的文本內容，并基于搜索結果提供答案。它首先使用 Haystack 這個框架。將 RAG （檢索增強生成）的檢索過程定義為一個 pipeline 。

from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

template = """
Answer the questions based on the given context.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}
Question: {{ question }}
Answer:
"""
rag_pipe = Pipeline()
rag_pipe.add_component("embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
rag_pipe.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
rag_pipe.add_component("prompt_builder", PromptBuilder(template=template))
# Note to llm: We are using OpenAIGenerator, not the OpenAIChatGenerator, because the latter only accepts List[str] as input and cannot accept prompt_builder's str output
rag_pipe.add_component("llm", OpenAIGenerator(api_key=Secret.from_env_var("OPENROUTER_API_KEY"),
  api_base_url="https://openrouter.ai/api/v1",
  model="openai/gpt-4-turbo-preview"))

rag_pipe.connect("embedder.embedding", "retriever.query_embedding")
rag_pipe.connect("retriever", "prompt_builder.documents")
rag_pipe.connect("prompt_builder", "llm")

測試函數功能是否正常工作。

query = “When does the coffee shop open?”
rag_pipe.run({"embedder": {"text": query}, "prompt_builder": {"question": query}})

rag_pipeline_func 函數在被模型調用執行后，應該會產生如下輸出。請注意，模型給出的回答來自于我們之前提供的樣本文檔數據。

{'llm': {'replies': ['The coffee shop opens at 9am.'],
 'meta': [{'model': 'openai/gpt-4-turbo-preview',
 'index': 0,
 'finish_reason': 'stop',
 'usage': {'completion_tokens': 9,
 'prompt_tokens': 60,
 'total_tokens': 69,
 'total_cost': 0.00087}}]}}

然后，我們可以將 rag_pipe 轉化為一個函數，在需要時調用 rag_pipeline_func(query) 獲取基于 query 的答案，而不會返回其他的中間細節信息。

def rag_pipeline_func(query: str):
    result = rag_pipe.run({"embedder": {"text": query}, "prompt_builder": {"question": query}})

 return {"reply": result["llm"]["replies"][0]}

定義與數據庫進行交互的 API

在此處，我們定義與數據庫進行交互的 ??get_items??? 函數和 ??purchase_itemfunctions?? 函數。

# Flask's default local URL, change it if necessary
db_base_url = 'http://127.0.0.1:5000'

# Use requests to get the data from the database
import requests
import json

# get_categories is supplied as part of the prompt, it is not used as a tool
def get_categories():
    response = requests.get(f'{db_base_url}/category')
    data = response.json()
 return data

def get_items(ids=None,categories=None):
    params = {
 'id': ids,
 'category': categories,
 }
    response = requests.get(f'{db_base_url}/item', params=params)
    data = response.json()
 return data

def purchase_item(id,quantity):

    headers = {
 'Content-type':'application/json', 
 'Accept':'application/json'
 }

    data = {
 'id': id,
 'quantity': quantity,
 }
    response = requests.post(f'{db_base_url}/item/purchase', json=data, headers=headers)
 return response.json()

定義工具函數列表

現在我們已經成功完成函數的定義，接下來需要讓模型識別并了解如何使用這些函數，為此我們需要為這些函數提供一些描述說明內容。

由于我們在此處使用的是 OpenAI，所以需要按照 OpenAI 要求的格式[8]來描述這些 tools（函數）。

tools = [
 {
 "type": "function",
 "function": {
 "name": "get_items",
 "description": "Get a list of items from the database",
 "parameters": {
 "type": "object",
 "properties": {
 "ids": {
 "type": "string",
 "description": "Comma separated list of item ids to fetch",
 },
 "categories": {
 "type": "string",
 "description": "Comma separated list of item categories to fetch",
 },
 },
 "required": [],
 },
 }
 },
 {
 "type": "function",
 "function": {
 "name": "purchase_item",
 "description": "Purchase a particular item",
 "parameters": {
 "type": "object",
 "properties": {
 "id": {
 "type": "string",
 "description": "The given product ID, product name is not accepted here. Please obtain the product ID from the database first.",
 },
 "quantity": {
 "type": "integer",
 "description": "Number of items to purchase",
 },
 },
 "required": [],
 },
 }
 },
 {
 "type": "function",
 "function": {
 "name": "rag_pipeline_func",
 "description": "Get information from hotel brochure",
 "parameters": {
 "type": "object",
 "properties": {
 "query": {
 "type": "string",
 "description": "The query to use in the search. Infer this from the user's message. It should be a question or a statement",
 }
 },
 "required": ["query"],
 },
 },
 }
]

4.5 步驟 3：將所有系統組件整合在一起

我們現在已經準備好了測試函數調用（Function Calling）功能所需的所有系統組件！這一步驟我們需要做以下幾件事：

為模型提供初始提示詞（prompt），并為其提供上下文
提供樣例用戶消息，模擬真實用戶的 query 或需求
將之前定義的工具函數列表（tool list）作為 ??tools?? 參數傳遞給 chat generator （譯者注：生成對話式回復的語言模型或 AI 系統），這是最關鍵的一步。

# 1. Initial prompt
context = f"""You are an assistant to tourists visiting a hotel.
You have access to a database of items (which includes {get_categories()}) that tourists can buy, you also have access to the hotel's brochure.
If the tourist's question cannot be answered from the database, you can refer to the brochure.
If the tourist's question cannot be answered from the brochure, you can ask the tourist to ask the hotel staff.
"""
messages = [
    ChatMessage.from_system(context),
 # 2. Sample message from user
    ChatMessage.from_user("Can I buy a coffee?"),
 ]

# 3. Passing the tools list and invoke the chat generator
response = chat_generator.run(messages=messages, generation_kwargs= {"tools": tools})
response

---------- Response ----------
{'replies': [ChatMessage(content='[{"index": 0, "id": "call_AkTWoiJzx5uJSgKW0WAI1yBB", "function": {"arguments": "{\"categories\":\"Food and beverages\"}", "name": "get_items"}, "type": "function"}]', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'openai/gpt-4-turbo-preview', 'index': 0, 'finish_reason': 'tool_calls', 'usage': {}})]}

現在讓我們來檢查一下模型響應內容。

需要注意的是，函數調用（Function Calling）所返回的內容，不僅包括模型選擇調用的函數本身，還應該包括為調用該函數所傳入的參數。

function_call = json.loads(response["replies"][0].content)[0]
function_name = function_call["function"]["name"]
function_args = json.loads(function_call["function"]["arguments"])
print("Function Name:", function_name)
print("Function Arguments:", function_args)

---------- Response ----------
Function Name: get_items
Function Arguments: {‘categories’: ‘Food and beverages’}

當模型遇到另一個新問題時，會分析該問題，結合它已有的上下文信息，評估哪一個可用的工具函數最能夠幫助回答這個問題。

# Another question
messages.append(ChatMessage.from_user("Where's the coffee shop?"))

# Invoke the chat generator, and passing the tools list
response = chat_generator.run(messages=messages, generation_kwargs= {"tools": tools})
function_call = json.loads(response["replies"][0].content)[0]
function_name = function_call["function"]["name"]
function_args = json.loads(function_call["function"]["arguments"])
print("Function Name:", function_name)
print("Function Arguments:", function_args)

---------- Response ----------
Function Name: rag_pipeline_func
Function Arguments: {'query': "Where's the coffee shop?"}

請再次注意，這一步驟實際上還沒有真正調用執行任何函數，真正執行函數調用，將是我們接下來這個步驟要做的。

調用函數

這一步驟，我們需要將參數輸入所選函數：

## Find the correspoding function and call it with the given arguments
available_functions = {"get_items": get_items, "purchase_item": purchase_item,"rag_pipeline_func": rag_pipeline_func}
function_to_call = available_functions[function_name]
function_response = function_to_call(**function_args)
print("Function Response:", function_response)

---------- Response ----------
Function Response: {'reply': 'The provided context does not specify a physical location for the coffee shop, only its operating hours. Therefore, I cannot determine where the coffee shop is located based on the given information.'}

然后，我們可以將來自 ??rag_pipeline_func??? 的模型響應結果，作為上下文信息附加到 ??messages?? 變量中，從而讓模型基于這個附加的上下文，生成最終的答復。

messages.append(ChatMessage.from_function(content=json.dumps(function_response), name=function_name))
response = chat_generator.run(messages=messages)
response_msg = response["replies"][0]

print(response_msg.content)

---------- Response ----------
For the location of the coffee shop within the hotel, I recommend asking the hotel staff directly. They will be able to guide you to it accurately.

現在已經完成了一個完整的用戶與 AI 的對話循環！

4.6 步驟 4：將其轉化為實時交互式對話（interactive chat）系統

上面的代碼展示了函數調用是如何實現的，但我們想更進一步，將其轉化為實時交互式對話（interactive chat）系統。

在本節，我展示了兩種實現方式：

較為原始的 input() 方法，將對話內容打印到 notebook 中。
通過 Streamlit 進行渲染，提供類似 ChatGPT 的 UI 體驗。

input() loop

這部分代碼是從 Haystack 的教程[9]中復制過來的，我們可以通過它快速測試模型。請注意：該應用程序是為了演示函數調用（Function Calling）這一概念而創建的，并非意味著此應用程序的健壯性完美，例如：支持同時對多個項目進行排序、無幻覺等。

import json
from haystack.dataclasses import ChatMessage, ChatRole

response = None
messages = [
    ChatMessage.from_system(context)
]

while True:
 # if OpenAI response is a tool call
 if response and response["replies"][0].meta["finish_reason"] == "tool_calls":
        function_calls = json.loads(response["replies"][0].content)

 for function_call in function_calls:
 ## Parse function calling information
            function_name = function_call["function"]["name"]
            function_args = json.loads(function_call["function"]["arguments"])

 ## Find the correspoding function and call it with the given arguments
            function_to_call = available_functions[function_name]
            function_response = function_to_call(**function_args)

 ## Append function response to the messages list using `ChatMessage.from_function`
            messages.append(ChatMessage.from_function(content=json.dumps(function_response), name=function_name))

 # Regular Conversation
 else:
 # Append assistant messages to the messages list
 if not messages[-1].is_from(ChatRole.SYSTEM):
            messages.append(response["replies"][0])

        user_input = input("ENTER YOUR MESSAGE ?? INFO: Type 'exit' or 'quit' to stop\n")
 if user_input.lower() == "exit" or user_input.lower() == "quit":
 break
 else:
            messages.append(ChatMessage.from_user(user_input))

    response = chat_generator.run(messages=messages, generation_kwargs={"tools": tools})

構建 Autonomous AI Agent ｜函數調用（Function Calling）技術實例探索-AI.x社區

在集成開發環境中運行交互式聊天 App

盡管基本的交互方式也可以運行使用，但擁有一個更加美觀友好的用戶界面會讓用戶體驗更加出色。

Streamlit 界面

Streamlit 能夠將 Python 腳本和 Web 開發技術優雅地結合，轉化為可共享使用的 Web 服務應用，為這個函數調用交互式應用程序構建了一個全新的 Web 界面。上述代碼已被改編成一個 Streamlit 應用，位于代碼倉庫的 streamlit 文件夾中。

我們可以通過以下步驟運行該應用：

如果還未運行，請使用 ??python db_api.py?? 啟動 API 服務器。
將 ??OPENROUTER_API_KEY??? 設置為環境變量，例如在 Linux 上或使用 ??git bash??? 時，執行 ??export OPENROUTER_API_KEY='@替換為您的API密鑰'??。
在終端中進入 streamlit 文件夾，目錄切換命令為 ??cd streamlit??。
運行 ??streamlit run app.py?? 啟動 Streamlit。瀏覽器應該會自動創建一個新的標簽頁，運行該應用程序。

基本上我想介紹的內容就是這些了！真心希望大家能夠喜歡這篇文章。

構建 Autonomous AI Agent ｜函數調用（Function Calling）技術實例探索-AI.x社區