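AI Application Development in Practice: The Basics

Author: AI 2024-06-07 13:11:44

An AI workflow orchestration system built on Langchain, with Nodejs as its main language. It provides a visual low-code component for every Langchain model class and component class, so an entire AI delivery flow can be assembled by dragging components onto a canvas. Components include Chain (process), Prompt, Agent Tool, Chat Module, and others.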

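Introduction

If you are a front-end developer who is also interested in AI development, congratulations: your opportunity has arrived.

If you are not, that is fine too; this article should still help you understand how AI applications are built.

This article starts from the basics of developing for AI, then moves on to RAG, Agents, and workflow orchestration, to give an in-depth look at how to land AI projects inside a company.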

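Basics

I. How to Interact with AI

Normally, when we send a piece of text to an AI model, it answers from what the large model itself knows; most readers are already familiar with this. But if we want the AI to answer based on content we expect, that is, based on our private-domain information, what options do we have?

Model training
Fine-tuning
Prompt engineering
RAG (retrieval-augmented generation)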

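Model training:

Download an open-source model from huggingface and deploy it locally, for example the recently released Llama 3 8B (smaller models place lower demands on the GPU), then train it on a large volume of documents.

Although a small model lowers the GPU compute required, the cost is still beyond what an ordinary company can bear. Besides the hardware cost and the engineering effort of optimizing the model, there is also the risk of obsolescence: once a major vendor releases something big, the model we trained may be superseded. Even so, companies should adopt a defensive strategy and embrace AI first, since it is already the prevailing trend. Model interfaces at the application layer have been standardized in the open-source community, so if the design decouples the model from the features, the model can be swapped at any time.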

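Fine-tuning:

Many commercial AI model services offer this capability: it lets users adjust a pre-trained model for a specific application scenario to obtain output that better matches expectations.

For example, suppose your company has an internal project code-named "Project", and you want to use an LLM to automatically generate documents about "Project" or to answer employees' questions about it. The pre-trained model has never encountered the term "Project" in this sense, so it cannot produce accurate information. In this case you can fine-tune the model with the relevant terminology and context to adjust its understanding of this area.

Finally, fine-tuning is a paid service. If you switch to another model in the future, you need to fine-tune again to fit the new model's characteristics and improvements, which again incurs compute and time costs.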

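Prompt engineering:

This is probably the first technique that newcomers to AI development use to get the AI to carry out instructions as expected, for example asking the model to answer in Chinese whenever possible. You prepare a prompt covering role, background, skills, output style, output scope, and so on, and then carry it in the context on every request.

If you use the chat_model approach (a Langchain term), the system prompt stays at index 0 of the message array. If you use the LLM approach (also a Langchain term), prompt + question are packed into the message string on every request. We should prefer the chat_model approach for development.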
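The difference between the two approaches can be sketched as follows. This is a minimal illustration, not Langchain code: the `ChatMessage` shape, the sample system prompt, and the function names `buildChatMessages` and `buildLlmInput` are all assumptions made for this example.

```typescript
// Hypothetical message shape, modeled on common chat-completion APIs.
type Role = "system" | "user" | "assistant";
interface ChatMessage {
  role: Role;
  content: string;
}

// Sample system prompt (an assumption for illustration).
const SYSTEM_PROMPT = "You are a helpful assistant. Answer in Chinese whenever possible.";

// chat_model style: the system prompt stays at index 0,
// followed by earlier turns and then the new question.
function buildChatMessages(pastTurns: ChatMessage[], question: string): ChatMessage[] {
  return [
    { role: "system", content: SYSTEM_PROMPT },
    ...pastTurns,
    { role: "user", content: question },
  ];
}

// LLM style: prompt and question are packed into one string on every call.
function buildLlmInput(question: string): string {
  return `${SYSTEM_PROMPT}\n\nQuestion: ${question}\nAnswer:`;
}

const pastTurns: ChatMessage[] = [
  { role: "user", content: "Hi" },
  { role: "assistant", content: "你好！" },
];
const chatMessages = buildChatMessages(pastTurns, "What is RAG?");
const llmInput = buildLlmInput("What is RAG?");
```

With the chat_model form the roles stay separated, so history can be trimmed or replaced without touching the system prompt.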

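But once you want to put this into a real project, you may find that prompts are very hard to optimize and that the AI does not fully follow your requirements. In summary, prompts have the following pain points:

1. Hard to design. The model's output depends on our prompt, and we revise the prompt based on that output, so this can become a loop in which we keep adjusting the prompt to get better output.
2. Length limits. The message of each request usually contains: Prompt + n rounds of context history + the current question. The total length of this content also determines the token cost of a single conversation turn, and an overly long prompt easily makes the AI hallucinate and degrades the reply.
3. Prompts still cannot make the model answer from the private domain, that is, from our company's internal knowledge base.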
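One common way to cope with the length limit in point 2 is to drop the oldest history first. The sketch below is only an illustration under stated assumptions: `estimateTokens` uses a crude rule of one token per four characters (real tokenizers differ), and `fitToBudget` and the `Msg` type are hypothetical names.

```typescript
interface Msg {
  role: "system" | "user" | "assistant";
  content: string;
}

// Assumption: roughly one token per 4 characters. Real tokenizers differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Always keep the system prompt and the new question, then add history
// from newest to oldest for as long as the token budget allows.
function fitToBudget(system: Msg, pastTurns: Msg[], question: Msg, maxTokens: number): Msg[] {
  let used = estimateTokens(system.content) + estimateTokens(question.content);
  const kept: Msg[] = [];
  for (const msg of [...pastTurns].reverse()) {
    const cost = estimateTokens(msg.content);
    if (used + cost > maxTokens) break;
    used += cost;
    kept.unshift(msg); // restore chronological order
  }
  return [system, ...kept, question];
}

const systemMsg: Msg = { role: "system", content: "s".repeat(40) }; // ~10 tokens
const questionMsg: Msg = { role: "user", content: "q".repeat(20) }; // ~5 tokens
const turns: Msg[] = [
  { role: "user", content: "a".repeat(40) },
  { role: "assistant", content: "b".repeat(40) },
  { role: "user", content: "c".repeat(40) },
];
// With a budget of 36 estimated tokens, only the two newest turns fit.
const trimmed = fitToBudget(systemMsg, turns, questionMsg, 36);
```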

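RAG (retrieval-augmented generation):

RAG can seem abstract to newcomers, so let us borrow Langchain's diagram to introduce it:

1. First, embedding and vector storage

We extract the content of internal documents and slice it into an array of passages (chunks), then pass the chunks to the model's embed endpoint, which returns arrays of floating-point numbers (vectors). This process is called embedding. Finally we store the vectors in a vector store; common choices include es (Elasticsearch) and faiss.

[Image]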
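The chunk-then-embed step can be sketched as below. This is a simplified illustration: `splitIntoChunks` is a naive fixed-size splitter, and `toyEmbed` is a made-up stand-in that only counts characters so the example runs offline. A real system would call the model's embed endpoint and write to a vector store instead.

```typescript
// Split text into fixed-size chunks with some overlap, so that text near a
// boundary appears in both neighbouring chunks.
function splitIntoChunks(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// Toy stand-in for an embed endpoint (NOT a real embedding): buckets
// characters by code point and counts them, yielding a fixed-length vector.
function toyEmbed(text: string, dims = 8): number[] {
  const vector: number[] = new Array<number>(dims).fill(0);
  for (const ch of text) {
    const idx = (ch.codePointAt(0) ?? 0) % dims;
    vector[idx] = (vector[idx] ?? 0) + 1;
  }
  return vector;
}

// In-memory "vector store": each entry keeps the chunk next to its vector.
const sampleDoc = "abcdefghijklmnopqrstuvwxyz";
const vectorStore = splitIntoChunks(sampleDoc, 10, 3).map((chunk) => ({
  chunk,
  vector: toyEmbed(chunk),
}));
```

Chunk size and overlap are tuning knobs; the values here are arbitrary.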

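2. Next, content recall

Given a question, we first convert it into a vector through the model's embedding, then run a similarity search over our document store. The most similar entries are recalled and handed to the large model to summarize, and the result is returned to the user.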
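The similarity search at the heart of recall can be sketched with cosine similarity over an in-memory list. The vectors below are made-up three-dimensional values chosen for illustration; real embeddings have hundreds or thousands of dimensions, and a vector store would do this search for you. `StoredChunk`, `cosineSimilarity`, and `topK` are names invented for this sketch.

```typescript
interface StoredChunk {
  text: string;
  vector: number[];
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored chunk against the query and return the k most similar.
function topK(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map((item) => ({ item, score: cosineSimilarity(query, item.vector) }))
    .sort((l, r) => r.score - l.score)
    .slice(0, k)
    .map(({ item }) => item);
}

// Made-up vectors for illustration only.
const knowledgeBase: StoredChunk[] = [
  { text: "Vacation policy", vector: [0.9, 0.1, 0.0] },
  { text: "Deployment guide", vector: [0.1, 0.9, 0.2] },
  { text: "Expense reimbursement", vector: [0.7, 0.2, 0.3] },
];
const queryVector = [1, 0, 0];
const recalled = topK(queryVector, knowledgeBase, 2);
// The recalled chunks would then be placed in the prompt for the model to summarize.
```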

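That is the whole RAG process. RAG is technically demanding work, and the flow above does not convey its complexity. Since our product went live we have kept experimenting with ways to improve RAG quality: getting it to work is easy, but doing it well is very hard.

We will discuss our current approach and its actual production results later, when we cover the internal knowledge base.

Here is a passage from another article that matches our experience.

RAG actually covers a wide range of topics, including embedding, tokenization and chunking, retrieval and recall (similarity matching), the chat system, ReAct, and prompt optimization, and finally the interaction with the LLM; the technical complexity of the whole process is high. If the LLM you use is very good, the model is actually the part you need to worry about least. And if each of these stages falls short of 1 (say 0.9, 0.7...), the final result may be the product of those decimals.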

https://mp.weixin.qq.com/s/WjiOrJHt8nSW5OGe2x4BAg
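To make the "product of decimals" point concrete, here is a small calculation. The stage names and scores are hypothetical, picked only to echo the 0.9 and 0.7 in the quote; they are not measured values.

```typescript
// Hypothetical per-stage quality scores in [0, 1] (illustrative only).
const stageScores: Record<string, number> = {
  embedding: 0.9,
  chunking: 0.9,
  retrieval: 0.7,
  prompt: 0.9,
};

// If stage quality compounds multiplicatively, overall quality is the product.
const overallQuality = Object.values(stageScores).reduce((acc, s) => acc * s, 1);
console.log(overallQuality.toFixed(2));
```

Four stages that each look acceptable on their own multiply out to roughly 0.51 under this assumption, which is why every stage needs attention.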

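II. Agent

So far the AI has been delivering text content. How do we get the AI to deliver actual work?

In a work report, if you could demonstrate your AI Agent's capabilities with the diagram below, wouldn't that be compelling?

(Taken from a slide shared at QCon)

There are currently two main ways to implement an Agent:

ReAct self-reasoning

Few-shot Prompt + Thought + Action + Observation.

By constructing a prompt structure that embeds tools, reasoning, and planning, the model iterates and adjusts through its interaction with the prompt, in order to choose an appropriate tool or generate better output.

For example: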

{
    "messages": [
        {
            "role": "system",
            "content": "Assistant is a large language model trained by OpenAI.\n\nAssistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n\nAssistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n\nOverall, Assistant is a powerful system that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist. However, above all else, all responses must adhere to the format of RESPONSE FORMAT INSTRUCTIONS."
        },
        {
            "role": "user",
            "content": "TOOLS\n------\nAssistant can ask the user to use tools to look up information that may be helpful in answering the users original question. The tools the human can use are:\n\ninfo-tool: Useful for situations where you need to retrieve content through one or more URLs from https://info.bilibili.co/. Input should be a comma-separated list in the format of \"one or more valid URLs with the domain https://info.bilibili.co/pages/viewpage.action, where the URL should include the pageId parameter\", followed by \"the information you need to summarize, or to obtain a summary\".\n\nRESPONSE FORMAT INSTRUCTIONS\n----------------------------\n\nOutput a JSON markdown code snippet containing a valid JSON object in one of two formats:\n\n**Option 1:**\nUse this if you want the human to use a tool.\nMarkdown code snippet formatted in the following schema:\n\n```json\n{\n    \"action\": string, // The action to take. Must be one of [info-tool]\n    \"action_input\": string // The input to the action. May be a stringified object.\n}\n```\n\n**Option #2:**\nUse this if you want to respond directly and conversationally to the human. Markdown code snippet formatted in the following schema:\n\n```json\n{\n    \"action\": \"Final Answer\",\n    \"action_input\": string // You should put what you want to return to use here and make sure to use valid json newline characters.\n}\n```\n\nFor both options, remember to always include the surrounding markdown code snippet delimiters (begin with \"```json\" and end with \"```\")!\n\n\nUSER'S INPUT\n--------------------\nHere is the user's input (remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else):\n\nhttps://info.bilibili.co/pages/viewpage.action?pageId=849684529\nWhat is this article about?"
        }
    ]
}

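Through the prompt we tell the model that it is good at using tools to solve problems, describe each tool and the parameters it takes, and require that every reply be returned as a markdown code snippet. In the Agent process we then consume the returned JSON, which follows the schema above, to decide whether to call a tool or to return the Final Answer.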
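Consuming such a reply on the Agent side might look like the sketch below. The field names `action`, `action_input`, and the value `Final Answer` come from the prompt above; the function `parseAgentReply`, the `AgentDecision` type, and the error handling are assumptions made for this example, not the implementation used in production.

```typescript
type AgentDecision =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "final"; answer: string };

// Three backticks, built at runtime so this listing contains no literal fence.
const FENCE = "`".repeat(3);

function parseAgentReply(reply: string, toolNames: string[]): AgentDecision {
  // Extract the body of the first markdown code snippet (optionally tagged json).
  const pattern = new RegExp(`${FENCE}(?:json)?\\s*([\\s\\S]*?)${FENCE}`);
  const match = reply.match(pattern);
  if (!match || match[1] === undefined) {
    throw new Error("reply is not a markdown code snippet");
  }
  const parsed = JSON.parse(match[1]) as { action?: unknown; action_input?: unknown };
  if (typeof parsed.action !== "string") throw new Error("missing action");

  // action_input may be a plain string or a nested object.
  const raw = parsed.action_input;
  const input = typeof raw === "string" ? raw : raw === undefined ? "" : JSON.stringify(raw);

  if (parsed.action === "Final Answer") return { kind: "final", answer: input };
  if (!toolNames.includes(parsed.action)) throw new Error(`unknown tool: ${parsed.action}`);
  return { kind: "tool", tool: parsed.action, input };
}

const toolReply = [
  FENCE + "json",
  '{"action": "info-tool", "action_input": "https://info.bilibili.co/pages/viewpage.action?pageId=849684529, summarize"}',
  FENCE,
].join("\n");
const finalReply = [FENCE + "json", '{"action": "Final Answer", "action_input": "Done."}', FENCE].join("\n");

const toolDecision = parseAgentReply(toolReply, ["info-tool"]);
const finalDecision = parseAgentReply(finalReply, ["info-tool"]);
```

The throw paths are where a ReAct agent typically breaks down: any reply that drifts from the required format stops the loop.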

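Tool-call agent interaction

Clearly, ReAct makes our context very long, and after a few iterations the model can easily stop replying in the markdown code format, at which point the Agent can no longer proceed.

Tool-call solves this problem. We convert the unstructured tool descriptions in the prompt into structured API fields, which both shortens the prompt context and makes the behavior easier to control.

For example: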

// POST /chat/completions
{
  ...
  "tools": [
      {
        "type": "function",
        "function": {
          "name": "info-tool",
          "description": "Open one or more xxxx site pages that carry a pageId and fulfill the user's request",
          "parameters": {
            "type": "object",
            "properties": {
              "pageId": {
                "type": "number",
                "description": "Fill in the pageId from the URL; separate multiple values with commas"
              },
              "task": {
                "type": "string",
                "description": "Describe the request"
              }
            },
            "required": [
              "pageId",
              "task"
            ],
            "additionalProperties": false,
            "$schema": "http://json-schema.org/draft-07/schema#"
          }
        }
      },
      ...more tools
  ],
  ...
}

The model then also tells you, in structured form, which tool it wants to use:

// API Response
{
    ...
    "tool_calls": [
        {
            "index": 0,
            "id": "info-tool:0",
            "type": "function",
            "function": {
                "name": "info-tool",
                "arguments": "{\n    \"task\": \"Fetch the page content\",\n    \"pageId\": 845030990\n}"
            }
        }
    ],
    ...
}
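On the application side, each entry in `tool_calls` has to be matched to a local function, and its `arguments` string has to be parsed before use. The sketch below assumes an OpenAI-style response shape like the one above; `toolRegistry`, `runToolCalls`, and the handler body are hypothetical and only return a string instead of fetching a page.

```typescript
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

// Result message in the OpenAI-style convention: role "tool" plus the call id.
interface ToolResult {
  role: "tool";
  tool_call_id: string;
  content: string;
}

type ToolHandler = (args: Record<string, unknown>) => string;

// Hypothetical registry: tool name -> local implementation.
const toolRegistry: Record<string, ToolHandler> = {
  "info-tool": (args) =>
    `fetched page ${String(args["pageId"])} for task: ${String(args["task"])}`,
};

function runToolCalls(calls: ToolCall[]): ToolResult[] {
  return calls.map((call): ToolResult => {
    const handler = toolRegistry[call.function.name];
    if (!handler) throw new Error(`unknown tool: ${call.function.name}`);
    // The arguments field is a JSON string, not an object, so parse it first.
    const args = JSON.parse(call.function.arguments) as Record<string, unknown>;
    return { role: "tool", tool_call_id: call.id, content: handler(args) };
  });
}

const sampleCalls: ToolCall[] = [
  {
    id: "info-tool:0",
    type: "function",
    function: {
      name: "info-tool",
      arguments: '{\n    "task": "Fetch the page content",\n    "pageId": 845030990\n}',
    },
  },
];
const toolResults = runToolCalls(sampleCalls);
```

The results are sent back to the model as follow-up messages so it can produce the final answer.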

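III. Development Frameworks

Next, the technical frameworks we chose; later we will also cover their strengths and shortcomings.

Langchain

Langchain comes up in many articles about AI, and many open-source frameworks compare themselves against it. Langchain integrates commercial and open-source models and provides a full set of tools and features that simplify developing, integrating, and deploying applications based on language models.

- Components: abstractions for working with language models, along with a set of implementations for each abstraction. Components are modular and easy to use, whether or not you use the rest of the LangChain framework.
- Off-the-shelf chains: structured assemblies of components for accomplishing specific higher-level tasks.

Put simply, it provides a unified input and output specification for different models and different components.

A Chain can take [Prompt, Model, Tool, Memory (conversation history), OutputParser], and multiple models can be nested so that the output of one model becomes the input of the next PromptTemplate.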
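The idea of feeding one model's output into the next PromptTemplate can be illustrated without Langchain at all. The sketch below is plain function composition under stated assumptions: `pipe`, `promptTemplate`, `toVars`, and `fakeModel` are invented for this example, and the model is a synchronous stub, whereas real model calls are asynchronous.

```typescript
// A step is just a function; a chain is steps composed left to right.
type Step<I, O> = (input: I) => O;

function pipe<A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> {
  return (input) => second(first(input));
}

// Minimal prompt template: replaces {name} placeholders with variables.
function promptTemplate(template: string): Step<Record<string, string>, string> {
  return (vars) => template.replace(/\{(\w+)\}/g, (_m: string, key: string) => vars[key] ?? "");
}

// Stub model (an assumption): wraps the prompt so the data flow stays visible.
const fakeModel: Step<string, string> = (prompt) => `MODEL(${prompt})`;

// Output parser stand-in: stores the model text under a variable name.
const toVars = (key: string): Step<string, Record<string, string>> => (text) => ({ [key]: text });

const summarizeStep = pipe(promptTemplate("Summarize: {text}"), fakeModel);
const translateStep = pipe(promptTemplate("Translate to English: {summary}"), fakeModel);

// The first model's output becomes the {summary} variable of the second prompt.
const twoStepChain = pipe(pipe(summarizeStep, toVars("summary")), translateStep);
const chainOutput = twoStepChain({ text: "RAG" });
```

Because every step shares the same input/output contract, any step can be swapped for another model or parser without changing the rest of the chain.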

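Two language versions are officially available: Python and Nodejs.

Flowise

A comparable product is Dify, which offers a complete productized solution covering multi-model integration, RAG, task orchestration, and more.

Flowise is more like an unfurnished shell of a house: it provides the solution, but all of the productization still has to be built yourself. Understanding it will make your Langchain development far more efficient. Dify is more like a luxury villa: most features are already productized, it maintains its own API wrappers for models internally, and its main language is Python.

The packages in Flowise:

- Server: express; CRUD, and running the instances from the component library
- Component: JavaScript; makes Langchain classes visual and low-code
- UI: React; the canvas for AI workflow orchestration, plus some maintenance pages

Below is an orchestration in which an Agent lets the AI decide which tools to use. We redeveloped the Agent component to better fit our tool-call feature; in the Bili Agent main process, the component is responsible for consuming the tools connected to it.

[Image]

Partial code example: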

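// Import paths below are assumed from the LangChain JS packages of this period; they may differ in other versions.
import { AgentExecutor } from 'langchain/agents'
import { OpenAIToolsAgentOutputParser, type ToolsAgentStep } from 'langchain/agents/openai/output_parser'
import { formatToOpenAITool } from '@langchain/openai'
import { RunnableSequence } from '@langchain/core/runnables'
import type { AgentStep } from '@langchain/core/agents'
import type { BaseMessage } from '@langchain/core/messages'

// model, tools, prompt, memory, inputKey, memoryKey, flowObj, chatHistory and
// formatToolAgentSteps are defined elsewhere in the component and are not shown here.

// Convert each tool's configuration into the structured tools field of the model API.
// Because the interface follows the same spec, formatToOpenAITool can be used directly.
const modelWithTools = model.bind({
    tools: [...tools.map((tool: any) => formatToOpenAITool(tool))]
})

// Compose the steps in order
const runnableAgent = RunnableSequence.from([
    // Collects the user's instruction, the ToolMessages produced by formatting the
    // tool_calls in the model's messages, and the chat history.
    // All of these are fed into the prompt.
    {
        [inputKey]: (i: { input: string; steps: AgentStep[] }) => i.input,
        agent_scratchpad: (i: { input: string; steps: ToolsAgentStep[] }) => formatToolAgentSteps(i.steps),
        [memoryKey]: async (_: { input: string; steps: AgentStep[] }) => {
            const messages = (await memory.getChatMessages(flowObj?.sessionId, true, chatHistory)) as BaseMessage[]
            return messages ?? []
        }
    },
    prompt,
    modelWithTools,
    new OpenAIToolsAgentOutputParser()
])

const executor = AgentExecutor.fromAgentAndTools({
    agent: runnableAgent,
    tools,
    returnIntermediateSteps: true,
    maxIterations: 5
})

await executor.invoke({ input: 'What is the date tomorrow?' })

// Example of the tool_calls returned by the model (response data, not part of the program)
{
    "tool_calls": [
      {
        "index": 0,
        "id": "GetDate:0",
        "type": "function",
        "function": {
          "name": "GetDate",
          "arguments": "{\n    \"task\": \"Get tomorrow's date\"\n}"
        }
      }
    ]
}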
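What an executor does with `maxIterations` and the intermediate steps can be pictured as a simple loop. The sketch below is a simplified stand-in, not the AgentExecutor implementation: `runAgent`, the `ModelTurn` type, the stub model, the fixed date returned by the `GetDate` stub, and the stop message are all assumptions, and everything is synchronous for brevity.

```typescript
interface PlannedCall {
  name: string;
  args: Record<string, unknown>;
}
interface AgentStepRecord {
  call: PlannedCall;
  observation: string;
}
// Each model turn either requests tool calls or gives the final answer.
type ModelTurn = { toolCalls: PlannedCall[] } | { finalAnswer: string };
type AgentModel = (input: string, steps: AgentStepRecord[]) => ModelTurn;
type AgentTool = (args: Record<string, unknown>) => string;

function runAgent(model: AgentModel, tools: Record<string, AgentTool>, input: string, maxIterations: number) {
  const steps: AgentStepRecord[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const turn = model(input, steps);
    if ("finalAnswer" in turn) {
      return { output: turn.finalAnswer, intermediateSteps: steps };
    }
    // Run every requested tool and record the observation for the next turn.
    for (const call of turn.toolCalls) {
      const tool = tools[call.name];
      const observation = tool ? tool(call.args) : `unknown tool: ${call.name}`;
      steps.push({ call, observation });
    }
  }
  // The iteration cap prevents an endless tool-calling loop.
  return { output: "stopped: iteration limit reached", intermediateSteps: steps };
}

// Stub model: asks for GetDate once, then answers using the observation.
const stubModel: AgentModel = (_input, steps) => {
  const last = steps[steps.length - 1];
  return last
    ? { finalAnswer: `Tomorrow is ${last.observation}.` }
    : { toolCalls: [{ name: "GetDate", args: { task: "Get tomorrow's date" } }] };
};
// Stub tool returning a fixed, made-up date.
const stubTools: Record<string, AgentTool> = { GetDate: () => "2024-06-08" };

const agentResult = runAgent(stubModel, stubTools, "What is the date tomorrow?", 5);
```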

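Finally, through the Agent's configuration, the model can freely choose among the general domain, the private domain, and tool plugins when chatting.

[Image]

That is all for the basics. By now you can see why the article opened by congratulating front-end developers: yes, the entire technology stack above comes from the front-end world.

Editor: 武曉燕 Source: Bilibili Tech (嗶哩嗶哩技術)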

