從“無法找到答案”到“一問一個準”! Contextual Embedding讓chunk自帶上下文，精準召回，效果立竿見影！原創

發布于 2025-3-25 10:23

瀏覽

1收藏

背景

最近，公司的一個項目經理找我聊了個頭疼的問題：他們給外部交付的項目POC效果不太理想，他發現從向量庫中檢索不到想要的信息。起初，我建議他換個更好的embedding模型，別再用??text-embedding-ada-002???了。結果他反饋說，試了??text-embedding-3-large???和??bge-m3??，效果也沒啥顯著提升。

我仔細看了他們的數據，發現他們上傳了大量用戶的文檔，并對文檔進行了切分，分成一個個??chunk???，然后召回這些??chunk???送給LLM生成回答。問題就在于他們切分chunk的方式用的是RecursiveCharacterTextSplitter，單獨看一個切分后的??chunk???，根本不知道它在講什么。比如，有個??chunk???提到了??opening hours???，但因為遞歸切分的原因，缺少了主體信息。結果，即使召回了這個??chunk??，LLM也會回復“從提供的上下文中無法找到答案”。

我給了他一個建議：可以試試??contextual-embedding???。引入這個方案不需要太多開發成本，而且配合??prompt cache??，還能有效減少LLM調用的開銷。

什么是contextual embedding

在傳統的RAG中，文檔通常被分成更小的塊以進行有效的檢索。雖然這種方法對于許多應用程序都很有效，但當單個塊缺乏足夠的上下文時，它可能會導致問題。Contextual Embedding 通過使用LLM給每段chunk補充上下文信息，用戶更精準召回和更高質量的回答。

舉個簡單的例子，比如有段chunk的內容如下：

The company's revenue grew by 3% over the previous quarter.

當我們提問："What was the revenue growth for ACME Corp in Q2 2023?"，雖然這段chunk是真實答案，但是卻檢索不到。這是因為原始chunk，是兩個"The"對象，導致不管使用embedding還是BM25都抓不出來它。但是如果我們通過某種手段，給轉換成下面這種contextualized_chunk，把上下文信息給注入到chunk里:

This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter.

那么還是剛才的問題，就一問一個準了。

原理

從“無法找到答案”到“一問一個準”! Contextual Embedding讓chunk自帶上下文，精準召回，效果立竿見影！-AI.x社區

我們通過一個特定的提示生成每個分塊的簡潔上下文，生成的上下文通常為 50-100 個 token，然后索引之前將其添加到分塊中。對應的prompt示例:

system prompt

Here is the whole document: 
<document> 
{{WHOLE_DOCUMENT}} 
</document>

user prompt

Here is the chunk we want to situate within the whole document:
<chunk> 
{{CHUNK_CONTENT}} 
</chunk> 

Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.

實戰案例

這里我們還是以sentosa的一個網頁為例：https://www.sentosa.com.sg/en/places-to-stay/amara-sanctuary-sentosa/。

從這個網頁中，我們切分出了一段??chunk??，內容如下：

description: Bed: King 
Room Size: 37 sqm 
Maximum Occupants: 2 Adults or 2 Adults and 1 child age 11 and below 
Room Essentials  
Flat-screen TV with cable channel access  
Individually controlled air-conditioning  
Spacious bathroom with separate bathtub and shower facilities  
Luxury bathroom amenities  
Bathrobes and hair dryer  
Electronic safe  
Tea and coffee making facilities  
Iron and ironing board  
Baby cot is available on request (subject to availability)

name: Couple Suite

name: Courtyard Suite

name: Junior Suite

name: Verandah Suite

Opening Hours: 
Check in: from 3pm  
Check out: until 12pm

單獨看這段??chunk???，我們只能看出它在描述一些房間信息，但具體是哪些房間的信息，卻并不清楚。于是，我們使用??gpt-4o-mini???為這段??chunk??生成了上下文，結果如下：

This chunk provides detailed information about the room types and amenities available at Amara Sanctuary Sentosa, including the Deluxe Room specifications, other suite options, opening hours, accessibility features, and pet-friendly services, enhancing the overall description of the resort's accommodations.

接下來，我們將原始的??chunk???和生成的上下文結合起來(使用\n\n 連接chunk)，形成一個新的??chunk??：

description: Bed: King 
Room Size: 37 sqm 
Maximum Occupants: 2 Adults or 2 Adults and 1 child age 11 and below 
Room Essentials  
Flat-screen TV with cable channel access  
Individually controlled air-conditioning  
Spacious bathroom with separate bathtub and shower facilities  
Luxury bathroom amenities  
Bathrobes and hair dryer  
Electronic safe  
Tea and coffee making facilities  
Iron and ironing board  
Baby cot is available on request (subject to availability)

name: Couple Suite

name: Courtyard Suite

name: Junior Suite

name: Verandah Suite




Opening Hours: 
Check in: from 3pm  
Check out: until 12pm


This chunk provides detailed information about the room types and amenities available at Amara Sanctuary Sentosa, including the Deluxe Room specifications, other suite options, opening hours, accessibility features, and pet-friendly services, enhancing the overall description of the resort's accommodations.

這樣一來，當我們再詢問關于“Amara Sanctuary Sentosa的Deluxe Room”相關問題時，LLM就能準確回答上來了。這種方法不僅提升了信息的連貫性，還大大減少了LLM的誤判率。

prompt cache

對于OpenAI模型，當你的提示（prompt）長度超過1,024個token時，API調用將自動受益于Prompt Caching功能。(deepseek也支持prompt cache)如果你重復使用具有相同前綴的提示，系統會自動應用Prompt Caching折扣，而你無需對API集成做任何修改。緩存通常在5-10分鐘的不活動后被清除，并且無論如何都會在最后一次使用后的一小時內被移除。

當我們對某個文檔進行切分，生成多個??chunk???時，通常需要為每個??chunk??生成上下文信息。如果每次調用都傳入全部文檔信息，會導致重復計算，增加LLM的調用成本。這時，我們可以將全部文檔信息放在system prompt中，利用Prompt Cache來節省費用。

以下是我調用LLM的Response中的??usage??字段，展示了Prompt Cache的實際效果：

CompletionUsage(
    completion_tokens=24, 
    prompt_tokens=1584, 
    total_tokens=1608, 
    completion_tokens_details=CompletionTokensDetails(
        accepted_prediction_tokens=0, 
        audio_tokens=0, 
        reasoning_tokens=0, 
        rejected_prediction_tokens=0
    ), 
    prompt_tokens_details=PromptTokensDetails(
        audio_tokens=0, 
        cached_tokens=1536  # 這里顯示有1,536個token被緩存
    )
)

從上面的數據可以看出：

prompt_tokens: 1,584個token被用于提示。
cached_tokens: 1,536個token被緩存，這意味著這部分token的計算成本被節省了下來。
completion_tokens: 24個token用于生成回答。

通過將文檔信息放在system prompt中，我們成功利用Prompt Cache減少了重復計算，顯著降低了LLM的調用成本。

總結

傳統的文檔切分方法（如RecursiveCharacterTextSplitter）可能會導致chunk缺乏足夠的上下文信息，從而影響檢索效果。通過引入Contextual Embedding，我們能夠為每個chunk補充上下文信息，顯著提升檢索的精準度和回答的質量。

總的來說，Contextual Embedding和Prompt Cache的結合，為RAG系統提供了一種低成本、高效率的優化方案。尤其是在項目時間緊張、資源有限的情況下，這種方案能夠快速提升系統的表現。

本文轉載自公眾號AI 博物院作者：longyunfeigu

原文鏈接：??https://mp.weixin.qq.com/s/I8muNOkLenngFn9I9U2ZQg??

?著作權歸作者所有，如需轉載，請注明出處，否則將追究法律責任

標簽

RAG

人工智能

贊

回復

舉報

社區頭條

回復

成人免费xxxxx在线视频软件_久久精品久久久_亚洲国产精品久久久_天天色天天色_亚洲人成一区_欧美一级欧美三级在线观看

51CTO

51CTO博客

51CTO學堂

從“無法找到答案”到“一問一個準”! Contextual Embedding讓chunk自帶上下文，精準召回，效果立竿見影！原創

背景

什么是contextual embedding

原理

實戰案例

prompt cache

總結

目錄

成人免费xxxxx在线视频软件_久久精品久久久_亚洲国产精品久久久_天天色天天色_亚洲人成一区_欧美一级欧美三级在线观看

51CTO

51CTO博客

51CTO學堂

從“無法找到答案”到“一問一個準”! Contextual Embedding讓chunk自帶上下文，精準召回，效果立竿見影！ 原創

背景

什么是contextual embedding

原理

實戰案例

prompt cache

總結

目錄

從“無法找到答案”到“一問一個準”! Contextual Embedding讓chunk自帶上下文，精準召回，效果立竿見影！原創