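Hands-On Multimodal RAG Application Development (Original)

Published 2024-10-29 11:42

This article shows how to build a multimodal RAG application with contextual retrieval, based on advanced parsing, semantic and keyword search, and re-ranking.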

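Introduction

All large language models (LLMs) today have a knowledge cutoff date, meaning they cannot answer queries about specific data that is absent from their knowledge base. For example, an LLM cannot answer queries about the minutes of a company's meetings from last year. LLMs are also prone to hallucination, producing plausible-sounding but wrong answers.

To overcome this, retrieval-augmented generation (RAG) solutions are becoming increasingly popular. The main idea of RAG is to integrate external documents into the LLM and instruct it to answer questions only from that external knowledge base. Concretely, the documents are split into smaller chunks, an embedding (a numerical representation) is computed for each chunk, and the embeddings are stored as an index in a specialized vector database.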
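The chunk, embed, and index steps can be illustrated with a minimal, dependency-free sketch. The bag-of-words "embedding" and the in-memory list used as a vector store are stand-ins chosen for illustration only; a real system would call an embedding model and a vector database. All function names here are hypothetical.

```python
import math
import re
from collections import Counter

def chunk_text(text, max_words=12):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(text, max_words=12):
    """Return (chunk, embedding) pairs, i.e. a tiny in-memory vector store."""
    return [(c, embed(c)) for c in chunk_text(text, max_words)]

def retrieve(index, query, k=1):
    """Return the k chunks whose embeddings are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

document = (
    "The board met in March and approved the annual budget. "
    "In June the board discussed hiring plans for the engineering team. "
    "The September meeting focused on the office relocation schedule."
)
index = build_index(document, max_words=10)
top = retrieve(index, "What did the board decide about the budget?", k=1)
print(top[0])
```

The retrieved chunk would then be placed in the LLM prompt next to the question, which is the "augmented generation" half of RAG.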

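RAG workflow diagram: the query is converted to an embedding, matched against the vector database by the retrieval model, and combined with the retrieved data; the LLM then produces the response.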

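Contextual Retrieval RAG

Matching the user's query against small chunks in the vector database usually works well; however, it has the following problems:

The answer to a question may require several chunks that lie far apart from each other. Because context is lost, not all relevant chunks can be found. For example, consider a question about a legal document: "What are the conditions for terminating the partnership between Alpha and Beta?" One passage in the document may read "The agreement may be terminated under specific conditions." Since it lacks any contextual information (no company names), this chunk may not be selected during retrieval.
For some questions, traditional best-match search works better than semantic search, especially for exact matches. For example, in an e-commerce document, answering the query "What is product ID ZX-450?" with semantic search may bring back information about several products while missing the exact "ZX-450" product.
The information retrieved from the vector database is forwarded to the LLM, which generates the final answer to the query. In doing so, the LLM must determine the most suitable chunks for the answer. Too many retrieved chunks can lead to irrelevant information in the response. The retrieved chunks therefore need a ranking mechanism.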
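To make the exact-match point concrete, here is a minimal sketch of Okapi BM25 scoring written from scratch. It uses the common non-negative IDF variant `log((N - df + 0.5) / (df + 0.5) + 1)`, which is an assumption of this sketch and may differ in detail from the IDF used by a particular library. The product descriptions are invented for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the query with Okapi BM25."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(d) for d in corpus_tokens) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

# Invented example corpus: two near-identical products and one unrelated text.
corpus = [
    "Product ZX-450 is a cordless drill with a 20V battery",
    "Product ZX-455 is a cordless screwdriver with a 12V battery",
    "Our return policy covers all power tools for 30 days",
]
corpus_tokens = [doc.lower().split() for doc in corpus]
query_tokens = "what is product id zx-450".split()
scores = bm25_scores(query_tokens, corpus_tokens)
best = max(range(len(corpus)), key=scores.__getitem__)
print(corpus[best])
```

Because the rare token `zx-450` occurs in only one document, it carries a high IDF weight and lifts that document above its near-duplicate, which is the behavior semantic search alone may not guarantee.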

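To address these problems, Anthropic recently introduced a method that adds context to each chunk; it performs significantly better than plain RAG. After splitting a document into chunks, the method first sends each chunk to an LLM together with the whole document as context, so that the LLM assigns the chunk a short context. The context-augmented chunks are then saved to the vector database. Anthropic further combines contextual chunking with best-match search using a BM25 retriever, which searches the documents with the BM25 method, and with a re-ranking model that assigns each retrieved chunk a relevance score.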
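The core of the idea, prepending a document-aware context to each chunk before indexing, can be sketched as follows. The `fake_context` function is a hypothetical stand-in for the LLM call; a real system would prompt a model with the whole document plus the chunk.

```python
from typing import Callable, List

def contextualize_chunks(
    document: str,
    chunks: List[str],
    generate_context: Callable[[str, str], str],
) -> List[str]:
    """Prepend a short, document-aware context to every chunk before indexing."""
    return [f"{generate_context(document, chunk)}\n\n{chunk}" for chunk in chunks]

# Hypothetical stand-in for the LLM: it always returns a fixed context string.
def fake_context(document: str, chunk: str) -> str:
    return "From the partnership agreement between Alpha and Beta."

document = (
    "Partnership agreement between Alpha and Beta. "
    "The agreement may be terminated under specific conditions."
)
chunks = ["The agreement may be terminated under specific conditions."]
indexed = contextualize_chunks(document, chunks, fake_context)
print(indexed[0])
```

After this step the chunk carries the company names, so both embedding search and keyword search have something to match against for a query about Alpha and Beta.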

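Multimodal RAG with Contextual Retrieval

Despite the significant performance gains, Anthropic demonstrated these methods only on text data. Yet in many of today's documents, much of the information lives in images (charts, figures) and complex tables. If we parse only the text, we gain no insight into the other modalities in the document. Documents containing images and complex tables therefore need an efficient parsing method that not only extracts them correctly but also understands them.

Assigning context to every chunk with Anthropic's latest model (claude-3-5-sonnet-20240620) can be expensive for large documents, because the whole document is sent along with each chunk. Although prompt caching for Claude models can cut this cost significantly by caching frequently used context between API calls, the cost remains far higher than that of OpenAI's cost-efficient models such as gpt-4o-mini.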
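A back-of-the-envelope calculation shows why sending the whole document with every chunk adds up. The figures below (500 tokens per page, one chunk per page) are illustrative assumptions, not measurements from this article, and the function ignores any savings from prompt caching.

```python
def contextualization_input_tokens(doc_tokens: int, n_chunks: int, prompt_overhead: int = 0) -> int:
    """Input tokens when every chunk is sent together with the whole document.

    The chunks partition the document, so their token counts sum to doc_tokens.
    Each of the n_chunks calls additionally carries the full document and
    an optional fixed prompt overhead.
    """
    return n_chunks * (doc_tokens + prompt_overhead) + doc_tokens

# Illustrative assumption: a 60-page document, about 500 tokens per page,
# with one chunk per page.
doc_tokens = 60 * 500
total = contextualization_input_tokens(doc_tokens, n_chunks=60)
print(total, total / doc_tokens)
```

Under these assumptions the input volume is 61 times the size of the document itself, which is why the per-token price of the model and the availability of prompt caching matter so much for this step.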

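This article explores further extensions of Anthropic's method, as follows:

Use LlamaParse to extract all content, from text to tables to images, into well-structured markdown.
Parse documents into nodes with a node parser instead of splitting them into chunks with a text splitter. This involves not only splitting the text but also understanding the document's structure, semantics, and metadata.
Use OpenAI's highly cost-effective LLM gpt-4o-mini and embedding model text-embedding-3-small to assign context to each node, generate the final response, and compute node embeddings.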

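After reading Anthropic's blog post on contextual retrieval, I found a partial implementation using OpenAI models on GitHub. However, it uses traditional chunking and LlamaParse without the recently introduced premium mode. I found LlamaParse's premium mode very effective at extracting the different structures in a document.

An implementation of Anthropic's contextual retrieval that uses LlamaIndex abstractions can also be found on GitHub; however, it does not implement multimodal parsing. At the time of writing, LlamaIndex provides a more recent implementation that uses multimodal parsing and contextual retrieval. That implementation uses Anthropic's LLM (claude-3-5-sonnet-20240620) and Voyage's embedding model (voyage-3). However, it does not explore the BM25 (Best Matching 25) ranking algorithm or re-ranking as described in Anthropic's blog post.

The contextual retrieval implementation discussed in this article is a low-cost, multimodal RAG solution whose retrieval performance is improved by BM25 search and re-ranking. The performance of this contextual-retrieval-based multimodal RAG (CMRAG) is also compared with a basic RAG and with LlamaIndex's contextual retrieval implementation.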

Some functions were reused from the following four links, with the necessary modifications.

1. https://colab.research.google.com/drive/1PcuVqUQjacMt18p8LwODnjbsXOFMurwa?usp=sharing#scrollTo=s-bxSMSa-qJe

2. https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/cookbooks/contextual_retrieval.ipynb

3. https://github.com/run-llama/llama_parse/blob/main/examples/multimodal/multimodal_contextual_retrieval_rag.ipynb

4. https://github.com/lesteroliver911/contextual-doc-retrieval-opneai-reranker?tab=readme-ov-file

The source code of this implementation is available on GitHub.

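The overall approach used in this article to implement the contextual-retrieval-based multimodal RAG (hereafter "CMRAG") is shown in the following diagram:

Parsed nodes are assigned context before being saved to the vector database. Contextual retrieval combines embeddings (semantic search) with TF-IDF vectors (best-match search), followed by re-ranking with a re-ranker model and, finally, response generation by the LLM.

Next, let's walk through the step-by-step implementation of CMRAG.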

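Multimodal Parsing

First, install the following dependencies to run the code discussed in this article.

!pip install llama-index ipython cohere rank-bm25 pydantic nest-asyncio python-dotenv openai llama-parse

The notebook on GitHub also lists all the imports needed to run the whole code. For this article I used Finland's Key Figures on Immigration report (reusable under the CC BY 4.0 license), which contains several charts, images, and text.

LlamaParse offers multimodal parsing that uses a vendor multimodal model (such as gpt-4o) to handle document extraction.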

parser = LlamaParse(
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="openai-gpt-4o",
    vendor_multimodal_api_key="sk-proj-xxxxxx",
)

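In this mode, a screenshot is taken of every page of the document and sent to the multimodal model with instructions to extract markdown. The markdown of each page is merged into the final output.

The recent LlamaParse premium mode offers advanced multimodal document parsing: it extracts text, tables, and images into well-structured markdown while significantly reducing missing content and hallucinations. To use it, create a free account on the Llama Cloud platform and obtain an API key. The free plan allows parsing 1,000 pages per day.

LlamaParse premium mode is used as follows: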

import os
from typing import List

from llama_parse import LlamaParse

# Return the paths of all files in the given directory
def read_docs(data_dir) -> List[str]:
    files = []
    for f in os.listdir(data_dir):
        fname = os.path.join(data_dir, f)
        if os.path.isfile(fname):
            files.append(fname)
    return files

parser = LlamaParse(
    result_type="markdown",
    premium_mode=True,
    api_key=os.getenv("LLAMA_CLOUD_API_KEY")
)

files = read_docs(data_dir=DATA_DIR)

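In the code above, we first read the documents from the given directory. The documents are then parsed with the parser's get_json_result() method, and the image dictionaries are fetched with its get_images() method. After that, nodes are extracted and sent to the LLM by the retrieve_nodes() function, which assigns each node a context based on the whole document. Parsing this 60-page document, including fetching the image dictionaries, took 5 minutes 34 seconds (a one-time process).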

print("Parsing...")
json_results = parser.get_json_result(files)
print("Getting image dictionaries...")
images = parser.get_images(json_results, download_path=image_dir)
print("Retrieving nodes...")

Page 4 of the report (source: Finland's Key Figures on Immigration)

json_results[0]["pages"][3]

The fourth page of the report as represented in the first element of the JSON results (image by author)

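Contextual Retrieval

The retrieve_nodes() function extracts the individual nodes and their associated images (screenshots) from the parsed json_results. Each node is sent to the _assign_context() function together with the full document text (the document_text variable in the code below). _assign_context() uses the prompt template CONTEXT_PROMPT_TMPL (adapted from the linked source, with modifications) to add a concise context to each node. In this way, the metadata, markdown text, context, and raw text are all integrated into the node.

The following code shows the implementation of retrieve_nodes(). Two helper functions, _get_sorted_image_files() and get_img_page_number(), return the image files sorted by page and the page number of an image, respectively. The overall goal is not to rely on raw text alone to generate the final answer, as simple RAG does, but to also take into account the metadata, markdown text, context, and the full page image (screenshot) of each retrieved node (linked in the node metadata).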

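import re
from pathlib import Path
from typing import List

from llama_index.core import PromptTemplate
from llama_index.core.schema import TextNode

# Extract the page number of an image from its file name with a regular expression
def get_img_page_number(file_name):
    match = re.search(r"-page-(\d+)\.jpg$", str(file_name))
    if match:
        return int(match.group(1))
    return 0

# Return the image files sorted by page number
def _get_sorted_image_files(image_dir):
    raw_files = [f for f in list(Path(image_dir).iterdir()) if f.is_file()]
    sorted_files = sorted(raw_files, key=get_img_page_number)
    return sorted_files

# Prompt template for generating the context of a chunk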
CONTEXT_PROMPT_TMPL = """
You are an AI assistant specializing in document analysis. Your task is to provide brief, relevant context for a chunk of text from the given document.
Here is the document:
<document>
{document}
</document>

Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>

Provide a concise context (2-3 sentences) for this chunk, considering the following guidelines:
1. Identify the main topic or concept discussed in the chunk.
2. Mention any relevant information or comparisons from the broader document context.
3. If applicable, note how this information relates to the overall theme or purpose of the document.
4. Include any key figures, dates, or percentages that provide important context.
5. Do not use phrases like "This chunk discusses" or "This section provides". Instead, directly state the context.

Please give a short succinct context to situate this chunk within the overall document to improve search retrieval of the chunk. 
Answer only with the succinct context and nothing else.

Context:
"""

CONTEXT_PROMPT = PromptTemplate(CONTEXT_PROMPT_TMPL)

# Generate a context for a single chunk
def _assign_context(document: str, chunk: str, llm) -> str:
    prompt = CONTEXT_PROMPT.format(document=document, chunk=chunk)
    response = llm.complete(prompt)
    context = response.text.strip()
    return context

# Build text nodes enriched with context
def retrieve_nodes(json_results, image_dir, llm) -> List[TextNode]:
    nodes = []
    for result in json_results:
        json_dicts = result["pages"]
        document_name = result["file_path"].split('/')[-1]
        docs = [doc["md"] for doc in json_dicts]  # markdown text of each page
        image_files = _get_sorted_image_files(image_dir)  # page screenshots, sorted by page
        # Join all pages to form the full text of the document
        document_text = "\n\n".join(docs)
        for idx, doc in enumerate(docs):
            # Generate a context for each chunk (page)
            context = _assign_context(document_text, doc, llm)
            # Combine the context with the original chunk
            contextualized_content = f"{context}\n\n{doc}"
            # Create the text node from the contextualized content
            chunk_metadata = {"page_num": idx + 1}
            chunk_metadata["image_path"] = str(image_files[idx])
            chunk_metadata["parsed_text_markdown"] = docs[idx]

            node = TextNode(
                text=contextualized_content,
                metadata=chunk_metadata,
            )
            nodes.append(node)
    return nodes

# Get the text nodes
text_node_with_context = retrieve_nodes(json_results, image_dir, llm)

First page of the report (image by author)

Shown below is the node corresponding to the first page of the report.

Node with added context and metadata (image by author)
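The page-number helper matters because each node is paired with its screenshot by position. A small sketch, using hypothetical file names that follow the `-page-N.jpg` pattern matched by the regular expression, shows how plain alphabetical sorting would misalign pages once the document has ten or more of them:

```python
import re
from pathlib import Path

def page_number(file_name) -> int:
    """Extract N from names ending in '-page-N.jpg'; return 0 if absent."""
    match = re.search(r"-page-(\d+)\.jpg$", str(file_name))
    return int(match.group(1)) if match else 0

# Hypothetical file names following the '-page-N.jpg' pattern.
files = [Path("report-page-10.jpg"), Path("report-page-2.jpg"), Path("report-page-1.jpg")]

# Alphabetical order puts page 10 before page 2.
lexicographic = sorted(f.name for f in files)
# Sorting by the extracted integer restores the true page order.
numeric = [f.name for f in sorted(files, key=page_number)]

print(lexicographic)
print(numeric)
```

With the numeric key, index `idx` in the sorted list lines up with page `idx + 1`, which is the assumption behind pairing each page's markdown with `image_files[idx]`.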

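Enhancing Contextual Retrieval with BM25 and Re-ranking

All nodes, with their metadata, raw text, markdown text, and context, are indexed into the vector database. A BM25 index is built over the nodes, and the tokenized documents behind it are saved to a pickle file for use at query time. The processed nodes are also saved for later use (text_node_with_context.pkl).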

import os
import pickle

from llama_index.core import VectorStoreIndex
from rank_bm25 import BM25Okapi

# Create the vector store index
index = VectorStoreIndex(text_node_with_context, embed_model=embed_model)
index.storage_context.persist(persist_dir=output_dir)
# Build the BM25 index
documents = [node.text for node in text_node_with_context]
tokenized_documents = [doc.split() for doc in documents]
bm25 = BM25Okapi(tokenized_documents)
# Save the tokenized documents and text_node_with_context
with open(os.path.join(output_dir, 'tokenized_documents.pkl'), 'wb') as f:
    pickle.dump(tokenized_documents, f)
with open(os.path.join(output_dir, 'text_node_with_context.pkl'), 'wb') as f:
    pickle.dump(text_node_with_context, f)
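One detail worth noting about the tokenization step: splitting on whitespace is case-sensitive and leaves punctuation attached to words, so a query token such as `ZX-450?` will not match `ZX-450` in a document. The sketch below is a possible refinement, not part of the original implementation; the `tokenize` helper is hypothetical and would have to be applied to both the indexed documents and the query.

```python
import re

def tokenize(text: str):
    """Lowercase and keep alphanumeric runs, allowing inner hyphens (e.g. 'zx-450')."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

query = "What is product ID ZX-450?"
doc = "Product ZX-450 is a cordless drill."

# Plain whitespace splitting, as used when building the BM25 index above.
naive_overlap = set(query.split()) & set(doc.split())
# Normalized tokens: case-folded, punctuation stripped, hyphenated IDs preserved.
norm_overlap = set(tokenize(query)) & set(tokenize(doc))

print(naive_overlap)
print(norm_overlap)
```

With whitespace splitting only `is` is shared between query and document; with normalization the product ID and the word `product` match as well, which is exactly the exact-match signal BM25 is meant to contribute.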

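We can now initialize a query engine that runs queries through the pipeline below. Before that, we define the following prompt to guide the LLM when it generates the final response, and initialize the multimodal LLM (gpt-4o-mini) that produces it. The prompt can be adjusted as needed.

# Define the QA prompt template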
RAG_PROMPT = """\
Below we give parsed text from documents in two different formats, as well as the image.

---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query. Generate the answer by analyzing parsed markdown, raw text and the related
image. Especially, carefully analyze the images to look for the required information.
Format the answer in proper format as deems suitable (bulleted lists, sections/sub-sections, tables, etc.)
Give the page's number and the document name where you find the response based on the Context.

Query: {query_str}
Answer: """

PROMPT = PromptTemplate(RAG_PROMPT)

# Initialize the multimodal LLM
MM_LLM = OpenAIMultiModal(model="gpt-4o-mini", temperature=0.0, max_tokens=16000)

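Integrating the Whole Pipeline in a Query Engine

The QueryEngine class presented in this section implements the complete workflow described above. The number of nodes returned by BM25 search (top_n_bm25) and the number of results kept by the re-ranker (top_n) can be adjusted as needed. BM25 search and re-ranking can each be switched on or off by toggling the best_match_25 and re_ranking variables in the GitHub code.

The overall workflow implemented by the QueryEngine class is as follows:

1. Compute the embedding of the query.

2. Retrieve nodes from the vector database with vector-based retrieval.

3. Retrieve nodes with BM25 search (if selected).

4. Combine the nodes from BM25 and vector-based retrieval, and keep only the unique nodes (remove duplicates).

5. Re-rank the combined results (if selected). Here we use Cohere's rerank-english-v2.0 re-ranking model. You can create an account on Cohere's website to obtain a trial API key.

6. Create image nodes from the images associated with the nodes.

7. Create the context string from the parsed markdown text.

8. Send the node images to the multimodal LLM for interpretation.

9. Generate the final response by sending the text nodes, image node descriptions, and metadata to the LLM.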
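Steps 4 and 5 can be sketched in isolation as follows. The `Node` dataclass and the word-overlap scorer are simplified stand-ins introduced for illustration; in the actual pipeline the nodes are LlamaIndex text nodes and the scores come from a re-ranking model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass(frozen=True)
class Node:
    node_id: str
    text: str

def merge_unique(vector_nodes: List[Node], bm25_nodes: List[Node]) -> List[Node]:
    """Concatenate both result lists and keep one node per node_id, in first-seen order."""
    unique = {}
    for node in vector_nodes + bm25_nodes:
        unique.setdefault(node.node_id, node)
    return list(unique.values())

def rerank(query: str, nodes: List[Node],
           score_fn: Optional[Callable[[str, str], float]], top_n: int = 3) -> List[Node]:
    """Order nodes by score_fn(query, text); without a scorer, keep the first top_n."""
    if score_fn is None:
        return nodes[:top_n]
    return sorted(nodes, key=lambda n: score_fn(query, n.text), reverse=True)[:top_n]

# Toy scorer standing in for a re-ranking model: counts shared lowercase words.
def overlap_score(query: str, text: str) -> float:
    return float(len(set(query.lower().split()) & set(text.lower().split())))

vector_hits = [Node("a", "residence permits issued in 2023"),
               Node("b", "asylum applications by year")]
bm25_hits = [Node("b", "asylum applications by year"),
             Node("c", "first residence permits by country in 2023")]

candidates = merge_unique(vector_hits, bm25_hits)
top = rerank("first residence permits 2023", candidates, overlap_score, top_n=2)
print([n.node_id for n in candidates], [n.node_id for n in top])
```

Passing `score_fn=None` mirrors the fallback used when the re-ranker is unavailable: the first `top_n` merged candidates are kept in their existing order.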

# Define the QueryEngine class, which ties all the steps together.
# The imports and the globals best_match_25, re_ranking, and cohere_client
# are expected to be defined beforehand (see the GitHub notebook).
class QueryEngine(CustomQueryEngine):
    # Public fields
    qa_prompt: PromptTemplate
    multi_modal_llm: OpenAIMultiModal
    node_postprocessors: Optional[List[BaseNodePostprocessor]] = None

    # Private attributes defined with PrivateAttr
    _bm25: BM25Okapi = PrivateAttr()
    _llm: OpenAI = PrivateAttr()
    _text_node_with_context: List[TextNode] = PrivateAttr()
    _vector_index: VectorStoreIndex = PrivateAttr()

    def __init__(
        self,
        qa_prompt: PromptTemplate,
        bm25: BM25Okapi,
        multi_modal_llm: OpenAIMultiModal,
        vector_index: VectorStoreIndex,
        node_postprocessors: Optional[List[BaseNodePostprocessor]] = None,
        llm: OpenAI = None,
        text_node_with_context: List[TextNode] = None,
    ):
        super().__init__(
            qa_prompt=qa_prompt,
            retriever=None,
            multi_modal_llm=multi_modal_llm,
            node_postprocessors=node_postprocessors
        )
        self._bm25 = bm25
        self._llm = llm
        self._text_node_with_context = text_node_with_context
        self._vector_index = vector_index

    def custom_query(self, query_str: str):
        # Prepare the query bundle
        query_bundle = QueryBundle(query_str)

        bm25_nodes = []
        if best_match_25 == 1:  # if BM25 search is selected
            # Retrieve nodes with BM25
            query_tokens = query_str.split()
            bm25_scores = self._bm25.get_scores(query_tokens)
            top_n_bm25 = 5  # number of top nodes to retrieve; adjust as needed
            # Indices of the highest BM25 scores, best first
            top_indices_bm25 = bm25_scores.argsort()[-top_n_bm25:][::-1]
            bm25_nodes = [self._text_node_with_context[i] for i in top_indices_bm25]
            logging.info(f"BM25 nodes retrieved: {len(bm25_nodes)}")
        else:
            logging.info("BM25 not selected.")

        # Retrieve nodes from the vector store with vector-based retrieval
        vector_retriever = self._vector_index.as_query_engine().retriever
        vector_nodes_with_scores = vector_retriever.retrieve(query_bundle)
        # Number of top vector results to keep
        top_n_vectors = 5  # adjust as needed
        # Keep only the top 'n' nodes
        top_vector_nodes_with_scores = vector_nodes_with_scores[:top_n_vectors]
        vector_nodes = [node.node for node in top_vector_nodes_with_scores]
        logging.info(f"Vector nodes retrieved: {len(vector_nodes)}")

        # Combine the nodes and remove duplicates
        all_nodes = vector_nodes + bm25_nodes
        unique_nodes_dict = {node.node_id: node for node in all_nodes}
        unique_nodes = list(unique_nodes_dict.values())
        logging.info(f"Unique nodes after deduplication: {len(unique_nodes)}")

        nodes = unique_nodes

        if re_ranking == 1:  # if re-ranking is selected
            # Re-rank the combined results with Cohere's re-ranker
            documents = [node.get_content() for node in nodes]
            max_retries = 3
            for attempt in range(max_retries):
                try:
                    reranked = cohere_client.rerank(
                        model="rerank-english-v2.0",
                        query=query_str,
                        documents=documents,
                        top_n=3  # keep the top 3 re-ranked nodes
                    )
                    break
                except CohereError as e:
                    if attempt < max_retries - 1:
                        logging.warning(f"Error occurred: {str(e)}. Waiting for 60 seconds before retry {attempt + 1}/{max_retries}")
                        time.sleep(60)  # wait before retrying
                    else:
                        logging.error("Error occurred. Max retries reached. Proceeding without re-ranking.")
                        reranked = None
                        break

            if reranked:
                reranked_indices = [result.index for result in reranked.results]
                nodes = [nodes[i] for i in reranked_indices]
            else:
                nodes = nodes[:3]  # fall back to the first 3 nodes
            logging.info(f"Nodes after re-ranking: {len(nodes)}")
        else:
            logging.info("Re-ranking not selected.")

        # Limit and filter the node content used for the context string
        max_context_length = 16000  # adjust as needed
        current_length = 0
        filtered_nodes = []

        # Initialize the tokenizer
        from transformers import GPT2TokenizerFast
        tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

        for node in nodes:
            content = node.get_content(metadata_mode=MetadataMode.LLM).strip()
            node_length = len(tokenizer.encode(content))
            logging.info(f"Node ID: {node.node_id}, Content Length (tokens): {node_length}")
            if not content:
                logging.warning(f"Node ID: {node.node_id} has empty content. Skipping.")
                continue
            if current_length + node_length <= max_context_length:
                filtered_nodes.append(node)
                current_length += node_length
            else:
                logging.info(f"Reached max context length with Node ID: {node.node_id}")
                break
        logging.info(f"Filtered nodes for context: {len(filtered_nodes)}")

        # Create the context string
        ctx_str = "\n\n".join(
            [n.get_content(metadata_mode=MetadataMode.LLM).strip() for n in filtered_nodes]
        )

        # Create image nodes from the images associated with the nodes
        image_nodes = []
        for n in filtered_nodes:
            if "image_path" in n.metadata:
                image_nodes.append(
                    NodeWithScore(node=ImageNode(image_path=n.metadata["image_path"]))
                )
            else:
                logging.warning(f"Node ID: {n.node_id} lacks 'image_path' metadata.")
        logging.info(f"Image nodes created: {len(image_nodes)}")

        # Prepare the prompt for the LLM
        fmt_prompt = self.qa_prompt.format(context_str=ctx_str, query_str=query_str)

        # Use the multimodal LLM to interpret the images and generate the response
        llm_response = self.multi_modal_llm.complete(
            prompt=fmt_prompt,
            image_documents=[image_node.node for image_node in image_nodes],
            max_tokens=16000
        )

        logging.info("LLM response generated.")

        # Return the response
        return Response(
            response=str(llm_response),
            source_nodes=filtered_nodes,
            metadata={
                "text_node_with_context": self._text_node_with_context,
                "image_nodes": image_nodes,
            },
        )

# Initialize the query engine with BM25, Cohere re-ranking, and query expansion
query_engine = QueryEngine(
    qa_prompt=PROMPT,
    bm25=bm25,
    multi_modal_llm=MM_LLM,
    vector_index=index,
    node_postprocessors=[],
    llm=llm,
    text_node_with_context=text_node_with_context
)
print("All done")
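The context-length guard inside custom_query can be looked at on its own. The sketch below is a simplified, assumed equivalent: the token counter is passed in as a function, and a whitespace word count is used here as a rough, hypothetical stand-in for a real tokenizer.

```python
from typing import Callable, List

def fit_to_budget(texts: List[str], max_tokens: int,
                  count_tokens: Callable[[str], int]) -> List[str]:
    """Keep texts in order until adding the next one would exceed max_tokens."""
    kept, used = [], 0
    for text in texts:
        text = text.strip()
        if not text:
            continue  # skip empty content
        n = count_tokens(text)
        if used + n > max_tokens:
            break  # stop at the first text that no longer fits
        kept.append(text)
        used += n
    return kept

# Rough stand-in for a tokenizer: count whitespace-separated words.
def word_count(text: str) -> int:
    return len(text.split())

kept = fit_to_budget(
    ["one two three", "", "four five", "six seven eight nine"],
    max_tokens=6,
    count_tokens=word_count,
)
print(kept)
```

Because the nodes arrive in re-ranked order, stopping at the first node that does not fit drops the least relevant content first.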

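One advantage of using OpenAI's models, gpt-4o-mini in particular, is that context assignment and query inference cost far less, and context assignment takes far less time. Although the basic tiers of both OpenAI and Anthropic quickly hit the API rate limit, the retry times on Anthropic's basic tier vary and can be too long. Running context assignment on just the first 20 pages of this document with claude-3-5-sonnet-20240620 took about 170 seconds with prompt caching and cost 20 cents (input plus output tokens). gpt-4o-mini, by contrast, is about 20 times cheaper on input tokens and about 25 times cheaper on output tokens than Claude 3.5 Sonnet. OpenAI states that it applies prompt caching to repeated content automatically on all API calls.

For comparison, assigning context to the nodes of the whole document (60 pages) with gpt-4o-mini finished in about 193 seconds without any retry requests.

With the QueryEngine class in place, we can run query inference as follows: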

original_query = """What are the top countries to whose citizens the Finnish Immigration Service issued the highest number of first residence permits in 2023?
Which of these countries received the highest number of first residence permits?"""
response = query_engine.query(original_query)
display(Markdown(str(response)))

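This is the markdown response to the query.

Response to the query (image by author)

The page cited in the query response is shown below:

A page cited in the response to the above query (page 9). The extracted information is marked by the red rectangle (source: Key Figures on Immigration)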

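Now let's compare the gpt-4o-mini-based RAG (LlamaParse premium mode + contextual retrieval + BM25 + re-ranking) with the Claude-based RAG. I also implemented a simple baseline RAG, which can be found in the GitHub notebook. The three RAGs compared are:

1. A simple RAG in LlamaIndex that splits the document into chunks with SentenceSplitter (chunk_size=800, chunk_overlap=400) and builds a vector index with vector retrieval.

2. CMRAG (claude-3-5-sonnet-20240620, voyage-3): LlamaParse premium mode + contextual retrieval.

3. CMRAG (gpt-4o-mini, text-embedding-3-small): LlamaParse premium mode + contextual retrieval + BM25 + re-ranking.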
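To give a feel for what chunk_size=800 with chunk_overlap=400 means in the first configuration, here is a simplified sliding-window sketch. It is an approximation under stated assumptions: it works on a plain token list and ignores sentence boundaries, which a sentence-aware splitter such as SentenceSplitter takes into account.

```python
def sliding_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks

tokens = list(range(2000))  # stand-in for 2000 tokens of text
chunks = sliding_chunks(tokens, chunk_size=800, chunk_overlap=400)
print(len(chunks), [(c[0], c[-1]) for c in chunks])
```

With a 50% overlap every token away from the document edges appears in two chunks, so the index holds roughly twice the text, in exchange for fewer answers being cut in half at a chunk boundary.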

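For simplicity, we refer to these as RAG0, RAG1, and RAG2, respectively. Below are three pages from the report; I asked each RAG three questions, one per page. The areas highlighted by red rectangles show the ground truth, that is, the source of the correct answer.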

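Page 4 of the document (source: Key Figures on Immigration)

Page 12 of the document (source: Key Figures on Immigration)

Page 20 of the document (source: Key Figures on Immigration)

The responses of the three RAGs to each question are given below.

Comparison of the basic RAG, the Claude-based CMRAG, and the gpt-4o-mini-based CMRAG (image by author)

As can be seen, RAG2 performs very well. For the first question, RAG0 gives a wrong answer because the question concerns information contained in an image. RAG1 and RAG2 both answer it correctly. For the other two questions, RAG0 cannot provide any answer, whereas RAG1 and RAG2 both answer correctly.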

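Summary

Overall, thanks to the integration of BM25, re-ranking, and better prompting, RAG2 performs on par with RAG1 in many cases, or even better. It offers a cost-effective solution for contextual, multimodal RAG. Techniques that could be integrated into this pipeline include hypothetical document embeddings (HyDE) and query expansion. Likewise, open-source embedding models (such as all-MiniLM-L6-v2) and/or lightweight LLMs (such as gemma2 or phi3-small) could be explored to make it even more cost-effective.

For the complete source code of the examples in this article, see my GitHub repository: https://github.com/umairalipathan1980/Multimodal-contextual-RAG.git?source=post_page-----d1965b8ab00c--------------------------------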

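About the Translator

Zhu Xianzhong is a 51CTO community editor, a 51CTO expert blogger and lecturer, a computer science teacher at a university in Weifang, and a veteran of the freelance programming world.

Original title: Integrating Multimodal Data into a Large Language Model, by Umair Ali Khan

Copyright belongs to the author. Reproduction must credit the source; otherwise legal action will be pursued.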


Last modified 2024-10-29 11:50:07
