LLMLingua: Integrating with LlamaIndex to Compress Prompts for Efficient LLM Inference

Author: Anonymous 2023-11-27 15:06:24

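The arrival of large language models (LLMs) has spurred innovation across many fields. But strategies such as chain-of-thought (CoT) prompting and in-context learning (ICL) keep making prompts longer and more complex, and that creates a computational problem: long prompts need substantial resources at inference time, so efficient solutions are needed. This article shows how to integrate LLMLingua with LlamaIndex to run inference more efficiently.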

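LLMLingua is a prompt compression method published by Microsoft researchers at EMNLP 2023. Its follow-up, LongLLMLingua, uses prompt compression to improve how well LLMs perceive key information in long-context scenarios.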

How LLMLingua and LlamaIndex work together

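LLMLingua addresses the problem of long prompts in LLM applications. It compresses prompts while aiming to preserve their meaning, which shortens inference time. It combines several compression strategies (for example, dropping whole context segments and pruning individual tokens) to balance prompt length against computational cost.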
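The token-pruning idea can be illustrated with a toy sketch: give each token an importance score, then keep the highest-scoring tokens up to a budget while preserving their original order. The scores here are hypothetical numbers supplied by the caller. LLMLingua itself estimates importance with a small language model (tokens the model finds easy to predict carry little information), which this sketch does not reproduce; `compress_tokens` is an illustrative name, not part of the llmlingua API.

```python
from typing import List, Sequence


def compress_tokens(tokens: Sequence[str], scores: Sequence[float], budget: int) -> List[str]:
    """Keep the `budget` highest-scoring tokens, in their original order.

    `scores` are hypothetical importance values (higher = more informative);
    ties are broken in favour of the earlier position.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if budget <= 0:
        return []
    # Rank positions by descending score, earlier position first on ties.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    keep = sorted(ranked[:budget])  # restore the original token order
    return [tokens[i] for i in keep]


tokens = ["The", "author", "went", "to", "RISD", "for", "art", "school", "."]
scores = [0.1, 0.6, 0.5, 0.1, 0.9, 0.2, 0.7, 0.8, 0.05]
print(" ".join(compress_tokens(tokens, scores, budget=5)))  # author went RISD art school
```

Keeping the surviving tokens in their original order matters: the compressed prompt is still read left to right by the LLM, even if it is no longer grammatical.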

Integrating LLMLingua with LlamaIndex offers the following advantages:

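The integration is an important step for prompt optimization in LLM applications. LlamaIndex is a data framework for building LLM applications over your own data: it indexes documents and retrieves the chunks most relevant to a query. Through the integration, LLMLingua works directly on the domain-specific context that LlamaIndex retrieves and compresses it before it reaches the LLM.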

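Combining LlamaIndex's retrieval with LLMLingua's prompt compression makes LLM applications more efficient. Because LongLLMLingua conditions its compression on the question, it can keep the domain-specific details that matter for the answer while shortening the prompt. This speeds up inference while preserving the key nuances of the domain.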

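The integration also extends LLMLingua's reach to large-scale LLM applications: compressing the retrieved context reduces the computational burden of long prompts, accelerating inference while retaining the key domain-specific information.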

Workflow of LLMLingua with LlamaIndex

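Using LLMLingua with LlamaIndex follows a structured process: LlamaIndex supplies the relevant context and LLMLingua compresses it, which shortens prompts and speeds up inference.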

1. Framework integration

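First, connect LLMLingua and LlamaIndex. This covers installing the packages, configuring access and API keys (in this example, the OpenAI key used for embeddings and generation), and wiring LLMLingua into the LlamaIndex pipeline so that retrieved context can be compressed.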
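For the API-configuration part of this step, a minimal sketch that reads the key from the environment instead of hard-coding it in the notebook, assuming the key is exported as `OPENAI_API_KEY`. The helper name `get_openai_key` is hypothetical.

```python
import os


def get_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, failing fast if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running the pipeline")
    return key
```

Usage, after exporting the variable in the shell: `openai.api_key = get_openai_key()`. Failing early with a clear message is easier to debug than an authentication error surfacing midway through indexing.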

2. Retrieving relevant context

LlamaIndex indexes the source documents and, for each query, retrieves the most relevant chunks. LLMLingua takes these domain-specific chunks as the input it compresses.
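Behind `similarity_top_k`, the retriever compares the query embedding with each chunk's embedding and returns the k closest chunks. A dependency-free toy sketch of that idea, assuming cosine similarity and hand-made three-dimensional vectors (real embeddings come from an embedding model and have far more dimensions). The chunk ids and vectors are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: Sequence[float], chunk_vecs: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


chunks = {
    "art_school": [0.9, 0.1, 0.0],
    "startup": [0.1, 0.9, 0.2],
    "painting": [0.7, 0.3, 0.1],
}
print(top_k([1.0, 0.0, 0.0], chunks, k=2))
```

With `similarity_top_k=10`, as in the code later in the article, the same ranking simply keeps ten chunks instead of two.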

3. Prompt compression techniques

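LLMLingua applies its compression methods to the retrieved text. The goal is to shorten long prompts while preserving their meaning, so inference gets faster without losing context or relevance.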

4. Tuning the compression strategy

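The compression strategy is tuned to the retrieved context and the question: LongLLMLingua ranks the retrieved chunks by their relevance to the question and compresses them accordingly, so domain-specific nuances are preserved while the prompt shrinks. The knobs that control this (target token budget, ranking method, document reordering) are set when the postprocessor is created.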
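To make this step concrete: LongLLMLingua conditions compression on the question, ranking retrieved documents by relevance and, with `reorder_context="sort"`, placing them in that ranked order. The toy sketch below imitates only that coarse-grained, document-level stage. Word overlap with the question is a hypothetical stand-in for LongLLMLingua's language-model-based relevance score, and the budget is counted in words rather than model tokens; `select_contexts` is an illustrative name.

```python
import re
from typing import List


def words(text: str) -> List[str]:
    """Lower-cased word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def select_contexts(contexts: List[str], question: str, budget_words: int) -> List[str]:
    """Rank contexts by word overlap with the question, then keep them in
    ranked order while the total word count stays within `budget_words`."""
    q = set(words(question))
    # sorted() is stable, so equally relevant contexts keep their original order.
    ranked = sorted(contexts, key=lambda c: len(q & set(words(c))), reverse=True)
    kept, used = [], 0
    for ctx in ranked:
        n = len(words(ctx))
        if used + n > budget_words:
            continue  # skip a context that would overflow the budget
        kept.append(ctx)
        used += n
    return kept


contexts = [
    "I started painting again after YC.",
    "The author went to art school at RISD.",
    "Interleaf made software for creating documents.",
]
question = "Where did the author go for art school?"
print(select_contexts(contexts, question, budget_words=14))
```

In the real pipeline a fine-grained, token-level pass then runs on whatever survives this stage.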

5. Execution and inference

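Once LLMLingua has compressed the retrieved context, the compressed prompt can be used for LLM inference. In this stage the compressed prompt is sent to the LLM, giving efficient, context-aware inference.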
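A sketch of how a final prompt might be assembled before it is sent to the LLM. It mirrors the article's own `"\n\n".join(context_list + [question])` and prepends an instruction like the `instruction_str` passed to LongLLMLinguaPostprocessor. The article does not show the exact layout LlamaIndex uses internally, so the instruction-first ordering and the `build_prompt` helper are assumptions for illustration.

```python
from typing import List


def build_prompt(instruction: str, contexts: List[str], question: str) -> str:
    """Join instruction, contexts and question with blank lines.

    Empty parts are skipped. Instruction first and question last is an
    assumed layout, not LlamaIndex's documented format.
    """
    parts = [part for part in [instruction, *contexts, question] if part]
    return "\n\n".join(parts)


prompt = build_prompt(
    "Given the context, please answer the final question",
    ["<compressed chunk 1>", "<compressed chunk 2>"],
    "Where did the author go for art school?",
)
print(prompt)
```

Placing the question last keeps it adjacent to the point where the model starts generating, which is the same choice the article makes for its uncompressed prompt.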

6. Iterative refinement and enhancement

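The implementation can be refined iteratively: adjusting the compression parameters, improving retrieval from LlamaIndex and tuning the integration, so that the compressed prompts stay consistent and LLM inference performance improves.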
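One way to make iterative refinement concrete, sketched under assumptions: sweep a compression parameter such as `target_token` over a few values and keep the smallest one whose answer still passes a check. `run` is a hypothetical callable you would supply (for example, rebuilding the postprocessor with a given `target_token`, compressing and synthesizing an answer); here it is replaced by a stub.

```python
from typing import Callable, Iterable, Optional


def smallest_passing_budget(
    budgets: Iterable[int],
    run: Callable[[int], str],
    passes: Callable[[str], bool],
) -> Optional[int]:
    """Return the smallest budget whose output passes the check, or None.

    `run` maps a token budget to a model answer (hypothetical: in practice it
    would compress the retrieved context with that budget and query the LLM).
    """
    for budget in sorted(budgets):
        if passes(run(budget)):
            return budget
    return None


def fake_run(budget: int) -> str:
    """Stub standing in for a real compress-and-answer run."""
    return "The author went to RISD." if budget >= 300 else "I don't know."


best = smallest_passing_budget([500, 100, 300, 200], fake_run, lambda ans: "RISD" in ans)
print(best)  # 300
```

Trying budgets from smallest to largest stops at the cheapest setting that still works, which keeps the number of LLM calls in the sweep low.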

7. Testing and validation

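Where needed, test and validate the integration to evaluate its efficiency and effectiveness. Measure metrics such as token counts, compression ratio, latency and answer accuracy to confirm that the compressed prompts keep their meaning and speed up inference without hurting accuracy.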
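A minimal sketch of two checks for this step. It assumes token counts are already available (the article obtains them from LLMLingua's tokenizer further down) and that a substring match against a ground-truth answer is an acceptable accuracy proxy. Both function names are hypothetical.

```python
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Original / compressed token count; the 1e-5 guard against division
    by zero mirrors the article's own print statement."""
    return original_tokens / (compressed_tokens + 1e-5)


def answer_retained(response: str, ground_truth: str) -> bool:
    """Loose check: does the ground-truth string appear in the response
    (case-insensitive)?"""
    return ground_truth.lower() in response.lower()


print(f"{compression_ratio(2000, 400):.2f}x")  # 5.00x
print(answer_retained("The author went to RISD for art school.", "RISD"))  # True
```

A substring match is deliberately lenient: it accepts "RISD" and "Rhode Island School of Design (RISD)" alike, but it would also accept a response that mentions RISD while answering wrongly, so larger evaluations usually need a stricter metric.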

Code implementation

Below we walk through the code that integrates LLMLingua with LlamaIndex.

Install the packages:

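# Install dependencies (notebook syntax). The code in this article uses the
# pre-0.10 llama_index package layout, so the version is pinned accordingly.
!pip install llmlingua "llama-index<0.10" openai tiktoken -q

# Use the OpenAI API
import os
import openai
os.environ["OPENAI_API_KEY"] = "<insert_openai_key>"
openai.api_key = os.environ["OPENAI_API_KEY"]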

Download the data:

!wget "https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1" -O paul_graham_essay.txt

Load the documents:

from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    load_index_from_storage,
    StorageContext,
)

# Load the essay as LlamaIndex documents
documents = SimpleDirectoryReader(
    input_files=["paul_graham_essay.txt"]
).load_data()

Build the vector index and retrieve context:

# Embed the documents into an in-memory vector index
index = VectorStoreIndex.from_documents(documents)

# Retrieve the 10 chunks most similar to the question
retriever = index.as_retriever(similarity_top_k=10)

question = "Where did the author go for art school?"

# Ground-truth answer
answer = "RISD"

contexts = retriever.retrieve(question)

context_list = [n.get_content() for n in contexts]
len(context_list)

# Output
# 10

The original prompt and its response:

# The response to the original (uncompressed) prompt
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-16k")
prompt = "\n\n".join(context_list + [question])

response = llm.complete(prompt)
print(str(response))

# Output
# The author went to the Rhode Island School of Design (RISD) for art school.

Set up LLMLingua:

from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers import CompactAndRefine
from llama_index.indices.postprocessor import LongLLMLinguaPostprocessor

node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=300,  # target length of the compressed context, in tokens
    rank_method="longllmlingua",  # question-aware ranking of the retrieved chunks
    additional_compress_kwargs={
        "condition_compare": True,
        "condition_in_question": "after",
        "context_budget": "+100",
        "reorder_context": "sort",  # enable document reordering
        "dynamic_context_compression_ratio": 0.3,
    },
)

Compress with LLMLingua:

from llama_index.indices.query.schema import QueryBundle

retrieved_nodes = retriever.retrieve(question)
synthesizer = CompactAndRefine()

# Post-process (compress) the retrieved nodes; synthesis happens later
new_retrieved_nodes = node_postprocessor.postprocess_nodes(
    retrieved_nodes, query_bundle=QueryBundle(query_str=question)
)

original_contexts = "\n\n".join([n.get_content() for n in retrieved_nodes])
compressed_contexts = "\n\n".join([n.get_content() for n in new_retrieved_nodes])

# Count tokens with the compressor's own tokenizer (via a private attribute)
original_tokens = node_postprocessor._llm_lingua.get_token_length(original_contexts)
compressed_tokens = node_postprocessor._llm_lingua.get_token_length(compressed_contexts)

Print both results for comparison:

print(compressed_contexts)
print()
print("Original Tokens:", original_tokens)
print("Compressed Tokens:", compressed_tokens)
print("Compressed Ratio:", f"{original_tokens/(compressed_tokens + 1e-5):.2f}x")

The printed output is shown below. The compressed context is meant to be fed to an LLM rather than read by people, so it is not grammatical:

next Rtm's advice hadn' included anything that. I wanted to do something completely different, so I decided I'd paint. I wanted to how good I could get if I focused on it. the day after stopped on YC, I painting. I was rusty and it took a while to get back into shape, but it was at least completely engaging.1]

I wanted to back RISD, was now broke and RISD was very expensive so decided job for a year and return RISD the fall. I got one at Interleaf, which made software for creating documents. You like Microsoft Word? Exactly That was I low end software tends to high. Interleaf still had a few years to live yet. []

the Accademia wasn't, and my money was running out, end year back to the
lot the color class I tookD, but otherwise I was basically myself to do that for in993 I dropped I aroundidence bit then my friend Par did me a big A rent-partment building New York. Did I want it Itt more my place, and York be where the artists. wanted [For when you that ofs you big painting of this type hanging in the apartment of a hedge fund manager, you know he paid millions of dollars for it. That's not always why artists have a signature style, but it's usually why buyers pay a lot for such work. [6]

Original Tokens: 10719
Compressed Tokens: 308
Compressed Ratio: 34.80x
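The printed ratio is the original token count divided by the compressed token count (the 1e-5 only guards against division by zero). Worked out for this run:

```latex
\text{ratio} = \frac{10719}{308 + 10^{-5}} \approx 34.80,
\qquad
\frac{308}{10719} \approx 0.0287
```

In other words, roughly 2.9% of the retrieved-context tokens survive compression, close to the `target_token=300` requested from the postprocessor.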

Check the final answer:

# Synthesize an answer from the compressed nodes
response = synthesizer.synthesize(question, new_retrieved_nodes)
print(str(response))

# Output
# The author went to RISD for art school.

Summary

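The integration of LLMLingua with LlamaIndex shows how combining the two tools can make large language model (LLM) applications more efficient. It changes how prompts are compressed and how efficiently inference runs, paving the way for context-aware, streamlined LLM applications.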

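The integration speeds up inference while keeping the compressed prompt semantically intact. Because LongLLMLingua's compression is conditioned on the question and applied to the domain-specific context that LlamaIndex retrieves, it balances a shorter prompt against retaining the essential context. In the example above, the answer produced from the compressed prompt (RISD) still matches the ground truth while using about 1/35 of the original context tokens.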

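In essence, the integration goes beyond traditional prompt compression approaches and lays the groundwork for future LLM applications that are optimized, contextually accurate and efficiently tailored to different domains. It points toward a new level of efficiency and refinement in LLM applications.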

Editor: Hua Xuan | Source: DeepHub IMBA

