Multimodal RAG Application Development: A Hands-On Walkthrough

Published 2024-10-29 11:42

This article shows how to build a multimodal RAG application that supports contextual retrieval, based on advanced parsing, semantic and keyword search, and re-ranking.

Introduction

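Every large language model (LLM) has a knowledge cutoff: it cannot answer queries about specific data that is absent from the knowledge it was trained on. For example, an LLM cannot answer questions about the minutes of a company's meetings from last year. LLMs are also prone to hallucination, producing plausible-sounding but incorrect answers.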

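To overcome this, retrieval-augmented generation (RAG) solutions have become increasingly popular. The main idea of RAG is to integrate external documents with the LLM and instruct it to answer only from that external knowledge base. Concretely, documents are split into smaller chunks, an embedding (a numerical representation) is computed for each chunk, and the embeddings are stored as an index in a specialized vector database.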
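The chunk, embed, and index steps can be illustrated with a toy example. This is a minimal sketch and not part of the original implementation: the bag-of-words `embed` function stands in for a real embedding model, an in-memory list stands in for the vector database, and all function names are hypothetical.

```python
import math
from collections import Counter


def chunk(text: str, size: int) -> list:
    """Split text into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: a word-count vector (a real system would call an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, index: list, k: int = 1) -> list:
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


document = (
    "The meeting was held in March. "
    "The budget was approved in April. "
    "Hiring starts in June."
)
# The "vector database": a list of (chunk, vector) pairs
index = [(c, embed(c)) for c in chunk(document, 6)]
print(retrieve("when was the budget approved", index, k=1))
```

In a real pipeline the retrieved chunks would then be placed into the LLM prompt together with the query.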

Multimodal RAG Application Development: A Hands-On Walkthrough - AI.x Community

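Schematic of the RAG workflow: the query is converted to an embedding, matched against the vector database by a retrieval model, combined with the retrieved data, and finally passed through the LLM to produce a response.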

Contextual Retrieval RAG

Matching the user's query against small chunks in the vector database usually works well; however, it has the following problems:

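The answer to a question may require several chunks that lie far apart in the document. Because context is lost, not all relevant chunks can be found. For example, consider a question about a legal document: "What are the conditions for terminating the partnership between Alpha and Beta?" One chunk of the document may read "The agreement may be terminated under certain conditions." Lacking any contextual information (no company names), this chunk cannot be selected during retrieval.
For some questions, traditional best-match search is more effective than semantic search, especially for exact matches. For example, in an e-commerce document, a semantic search for the query "What is product ID ZX-450?" may return information about several products while missing the exact "ZX-450" product.
The information retrieved from the vector database is forwarded to the LLM, which generates the final answer based on the query. In doing so, the LLM has to decide which chunks are most appropriate for the answer. Too many retrieved chunks may lead to irrelevant information in the response, so the pipeline needs a ranking mechanism for the chunks.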

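To address these problems, Anthropic recently introduced a method that adds context to each chunk; it improves performance significantly over naive RAG. After the document is split into chunks, the method sends each chunk to the LLM together with the whole document as context, and the LLM assigns a short context to the chunk. The context-augmented chunks are then saved to the vector database. Anthropic further combines contextual chunking with best-match search using a BM25 retriever, which searches the documents with the BM25 method, and with a re-ranking model that assigns a relevance score to each retrieved chunk.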
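To make the best-match idea concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is an illustration only: the implementation later in this article relies on the rank_bm25 package instead, the sample product descriptions are made up, and the IDF formula is the commonly used non-negative variant, which may differ in detail from a given library.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = 1 - b + b * len(d) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical e-commerce snippets
docs = [
    "Product ZX-450 is a cordless drill with two batteries",
    "Product ZX-300 is a corded drill",
    "Our return policy covers all drill products",
]
scores = bm25_scores("ZX-450", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(best, scores)
```

Only the document that contains the literal token "ZX-450" receives a non-zero score, which is exactly the exact-match behavior that pure semantic search can miss.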

Multimodal RAG with Contextual Retrieval

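Despite the significant performance gains, Anthropic demonstrated these methods only on text. In practice, much of the rich information in documents comes from images (graphs, figures) and complex tables. If we parse only the text of a document, we gain no insight into its other modalities. Documents containing images and complex tables therefore need efficient parsing methods that not only extract them correctly but also understand them.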

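Assigning context to every chunk with Anthropic's latest model (claude-3-5-sonnet-20240620) can be costly for large documents, because the whole document is sent along with each chunk. Although prompt caching for Claude models can reduce this cost considerably by caching frequently used context between API calls, the cost is still much higher than that of OpenAI's cost-efficient models such as gpt-4o-mini.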

This article explores the following extensions of Anthropic's method:

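Use LlamaParse to extract all content, from text to tables to images, into well-structured markdown.
Parse the document into nodes with a node parser instead of splitting it into chunks with a text splitter. This involves not only splitting text but also understanding the structure, semantics, and metadata of the document.
Use OpenAI's highly cost-efficient LLM gpt-4o-mini and embedding model text-embedding-3-small to assign context to each node, generate the final response, and compute node embeddings.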

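After going through Anthropic's blog post on contextual retrieval, I found a partial implementation with OpenAI models on GitHub. However, it uses traditional chunking and LlamaParse without the recently introduced premium mode. I found LlamaParse's premium mode to be remarkably effective at extracting the different structures in a document.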

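An implementation of Anthropic's contextual retrieval that uses LlamaIndex abstractions can also be found on GitHub; however, it does not implement multimodal parsing. At the time of writing, LlamaIndex offers a more recent implementation that uses multimodal parsing and contextual retrieval. That implementation uses Anthropic's LLM (claude-3-5-sonnet-20240620) and Voyage AI's embedding model (voyage-3). However, it does not explore the BM25 (Best Matching 25) ranking algorithm or re-ranking, both of which are mentioned in Anthropic's blog post.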

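The contextual retrieval implementation discussed in this article is a low-cost, multimodal RAG solution whose retrieval performance is improved by BM25 search and re-ranking. The performance of this contextual-retrieval-based multimodal RAG (CMRAG) is also compared with a basic RAG and with LlamaIndex's implementation of contextual retrieval.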

Some functions are reused from the following four links, with the necessary modifications.

1. https://colab.research.google.com/drive/1PcuVqUQjacMt18p8LwODnjbsXOFMurwa?usp=sharing#scrollTo=s-bxSMSa-qJe

2. https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/cookbooks/contextual_retrieval.ipynb

3. https://github.com/run-llama/llama_parse/blob/main/examples/multimodal/multimodal_contextual_retrieval_rag.ipynb

4. https://github.com/lesteroliver911/contextual-doc-retrieval-opneai-reranker?tab=readme-ov-file

The source code of this implementation is available on GitHub.

The overall approach used in this article to implement the contextual-retrieval-based multimodal RAG (hereafter "CMRAG") is shown in the following diagram:

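The parsed nodes are assigned context before being saved to the vector database. Contextual retrieval combines embeddings (semantic search) and TF-IDF vectors (best-match search), followed by re-ranking with a re-ranker model and, finally, response generation by the LLM.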

Let us now go through the step-by-step implementation of CMRAG.

Multimodal Parsing

First, install the following dependencies, which are needed to run the code discussed in this article.

!pip install llama-index ipython cohere rank-bm25 pydantic nest-asyncio python-dotenv openai llama-parse

The GitHub notebook also lists all the imports needed to run the whole code. For this article, I used Key Figures on Immigration in Finland (licensed under CC BY 4.0, reuse permitted), which contains several charts, images, and text.

LlamaParse offers multimodal parsing, in which a vendor multimodal model (such as gpt-4o) handles the document extraction.

from llama_parse import LlamaParse

# Replace the placeholder with your own OpenAI API key
parser = LlamaParse(
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="openai-gpt-4o",
    vendor_multimodal_api_key="sk-proj-xxxxxx",
)

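In this mode, a screenshot is taken of every page of the document and sent to the multimodal model together with instructions to extract markdown. The markdown results of the individual pages are merged into the final output.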

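The recent LlamaParse premium mode offers advanced multimodal document parsing: it extracts text, tables, and images into well-structured markdown while considerably reducing missing content and hallucinations. It can be used by creating a free account on LlamaCloud and obtaining an API key. The free plan allows 1,000 pages to be parsed per day.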

LlamaParse premium mode is used as follows:

import os
from typing import List

from llama_parse import LlamaParse

# Return the paths of all files in the given directory
def read_docs(data_dir) -> List[str]:
    files = []
    for f in os.listdir(data_dir):
        fname = os.path.join(data_dir, f)
        if os.path.isfile(fname):
            files.append(fname)
    return files

parser = LlamaParse(
    result_type="markdown",
    premium_mode=True,
    api_key=os.getenv("LLAMA_CLOUD_API_KEY")
)

# DATA_DIR is the directory that holds the input documents
files = read_docs(data_dir = DATA_DIR)

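Having read the documents from the specified directory, we parse them with the parser's get_json_result() method and obtain the image dictionaries with its get_images() method. The nodes are then extracted and sent to the LLM by the retrieve_nodes() function, which assigns each node a context based on the whole document. Parsing this document (60 pages), including fetching the image dictionaries, took 5 minutes and 34 seconds (a one-time process).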

print("Parsing...")
json_results = parser.get_json_result(files)
print("Getting image dictionaries...")
images = parser.get_images(json_results, download_path=image_dir)
print("Retrieving nodes...")

Page 4 of the report (source: Key Figures on Immigration in Finland)

json_results[0]["pages"][3]

Page 4 of the report as represented by the first node of the JSON results (image by author)

Contextual Retrieval

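The retrieve_nodes() function extracts the individual nodes and their associated images (screenshots) from the parsed json_results. Each node is sent to the _assign_context() function together with the text of all nodes (the document_text variable in the code below). _assign_context() uses the prompt template CONTEXT_PROMPT_TMPL (adopted from the linked source and modified) to add a concise context to each node. In this way, we integrate metadata, markdown text, context, and raw text into the node.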

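The following code shows the implementation of retrieve_nodes(). Two helper functions, _get_sorted_image_files() and get_img_page_number(), return the image files sorted by page and the page number of an image, respectively. The overall aim is not to rely on raw text alone for the final answer, as simple RAG does, but to take into account the metadata, markdown text, context, and raw text, as well as the full-page images (screenshots) of the retrieved nodes (linked in the node metadata).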

import re
from pathlib import Path

# Extract the page number of an image from its file name with a regular expression
def get_img_page_number(file_name):
    match = re.search(r"-page-(\d+)\.jpg$", str(file_name))
    if match:
        return int(match.group(1))
    return 0

# Get the image files sorted by page number
def _get_sorted_image_files(image_dir):
    raw_files = [f for f in list(Path(image_dir).iterdir()) if f.is_file()]
    sorted_files = sorted(raw_files, key=get_img_page_number)
    return sorted_files

# Prompt template for generating the context of a chunk
CONTEXT_PROMPT_TMPL = """
You are an AI assistant specializing in document analysis. Your task is to provide brief, relevant context for a chunk of text from the given document.
Here is the document:
<document>
{document}
</document>

Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>

Provide a concise context (2-3 sentences) for this chunk, considering the following guidelines:
1. Identify the main topic or concept discussed in the chunk.
2. Mention any relevant information or comparisons from the broader document context.
3. If applicable, note how this information relates to the overall theme or purpose of the document.
4. Include any key figures, dates, or percentages that provide important context.
5. Do not use phrases like "This chunk discusses" or "This section provides". Instead, directly state the context.

Please give a short succinct context to situate this chunk within the overall document to improve search retrieval of the chunk. 
Answer only with the succinct context and nothing else.

Context:
"""

from typing import List

from llama_index.core import PromptTemplate
from llama_index.core.schema import TextNode

CONTEXT_PROMPT = PromptTemplate(CONTEXT_PROMPT_TMPL)

# Generate a context for a single chunk
def _assign_context(document: str, chunk: str, llm) -> str:
    prompt = CONTEXT_PROMPT.format(document=document, chunk=chunk)
    response = llm.complete(prompt)
    context = response.text.strip()
    return context

# Build context-augmented text nodes
def retrieve_nodes(json_results, image_dir, llm) -> List[TextNode]:
    nodes = []
    for result in json_results:
        json_dicts = result["pages"]
        document_name = result["file_path"].split('/')[-1]
        docs = [doc["md"] for doc in json_dicts]  # markdown text of each page
        image_files = _get_sorted_image_files(image_dir)  # page screenshots
        # Join all pages to form the full text of the document
        document_text = "\n\n".join(docs)
        for idx, doc in enumerate(docs):
            # Generate a context for each chunk (page)
            context = _assign_context(document_text, doc, llm)
            # Prepend the context to the original chunk
            contextualized_content = f"{context}\n\n{doc}"
            # Build a text node from the contextualized content
            chunk_metadata = {"page_num": idx + 1}
            chunk_metadata["image_path"] = str(image_files[idx])
            chunk_metadata["parsed_text_markdown"] = docs[idx]

            node = TextNode(
                text=contextualized_content,
                metadata=chunk_metadata,
            )
            nodes.append(node)
    return nodes

# Build the text nodes
text_node_with_context = retrieve_nodes(json_results, image_dir, llm)

First page of the report (image by author)

Below is the node corresponding to the first page of the report.

Node with added context and metadata (image by author)

Enhancing Contextual Retrieval with BM25 and Re-ranking

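All nodes, with their metadata, raw text, markdown text, and context, are indexed into the vector database. A BM25 index over the nodes is created, and its tokenized documents are saved to a pickle file for use at query time. The processed nodes are also saved for later use (text_node_with_context.pkl).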

import os
import pickle

from llama_index.core import VectorStoreIndex
from rank_bm25 import BM25Okapi

# Create the vector store index
index = VectorStoreIndex(text_node_with_context, embed_model=embed_model)
index.storage_context.persist(persist_dir=output_dir)
# Build the BM25 index
documents = [node.text for node in text_node_with_context]
tokenized_documents = [doc.split() for doc in documents]
bm25 = BM25Okapi(tokenized_documents)
# Save the tokenized documents (used to rebuild BM25) and text_node_with_context
with open(os.path.join(output_dir, 'tokenized_documents.pkl'), 'wb') as f:
    pickle.dump(tokenized_documents, f)
with open(os.path.join(output_dir, 'text_node_with_context.pkl'), 'wb') as f:
    pickle.dump(text_node_with_context, f)
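At query time, the saved artifacts can be read back instead of being recomputed. The following is a minimal sketch, assuming the two file names used above; load_artifacts is a hypothetical helper that does not appear in the original code.

```python
import os
import pickle


def load_artifacts(output_dir: str):
    """Load the tokenized documents and the contextualized nodes saved earlier."""
    with open(os.path.join(output_dir, "tokenized_documents.pkl"), "rb") as f:
        tokenized_documents = pickle.load(f)
    with open(os.path.join(output_dir, "text_node_with_context.pkl"), "rb") as f:
        text_node_with_context = pickle.load(f)
    return tokenized_documents, text_node_with_context


# Usage (the BM25 index is rebuilt from the tokens, as in the code above):
# tokenized_documents, text_node_with_context = load_artifacts(output_dir)
# bm25 = BM25Okapi(tokenized_documents)
```

Unpickling the nodes requires the same llama-index classes to be importable, and pickle files should only be loaded from a trusted source.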

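We can now initialize a query engine that runs queries through the pipeline below. Before that, we set the following prompt to guide the LLM in generating the final response, and initialize the multimodal LLM (gpt-4o-mini) that produces it. The prompt can be adjusted as needed.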

# Define the QA prompt template
RAG_PROMPT = """\
Below we give parsed text from documents in two different formats, as well as the image.

---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query. Generate the answer by analyzing parsed markdown, raw text and the related
image. Especially, carefully analyze the images to look for the required information.
Format the answer in proper format as deems suitable (bulleted lists, sections/sub-sections, tables, etc.)
Give the page's number and the document name where you find the response based on the Context.

Query: {query_str}
Answer: """

PROMPT = PromptTemplate(RAG_PROMPT)

# Initialize the multimodal LLM
MM_LLM = OpenAIMultiModal(model="gpt-4o-mini", temperature=0.0, max_tokens=16000)

Integrating the Whole Pipeline in a Query Engine

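The QueryEngine class introduced in this section implements the complete workflow described above. The number of nodes taken from BM25 search (top_n_bm25) and the number of results returned by the re-ranker (top_n) can be adjusted as needed. BM25 search and re-ranking can be switched on or off through the best_match_25 and re_ranking variables in the GitHub code.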

The overall workflow implemented by the QueryEngine class is as follows:

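1. Compute the query embedding.

2. Retrieve nodes from the vector database with vector-based retrieval.

3. Retrieve nodes with BM25 search (if selected).

4. Combine the nodes from BM25 and vector-based retrieval, keeping only unique nodes (duplicates are removed).

5. Re-rank the combined results (if selected). Here we use Cohere's rerank-english-v2.0 re-ranking model. You can create an account on Cohere's website to obtain a trial API key.

6. Create image nodes from the images associated with the nodes.

7. Build the context string from the parsed markdown text.

8. Send the node images to the multimodal LLM for interpretation.

9. Generate the final response by sending the text nodes, image node descriptions, and metadata to the LLM.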
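Steps 4 and 5 can be sketched independently of any library. The following is an illustrative sketch only: Node is a stand-in for a LlamaIndex text node, overlap_scorer is a toy replacement for a re-ranking model such as Cohere's, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    node_id: str
    text: str


def merge_unique(vector_nodes: List[Node], bm25_nodes: List[Node]) -> List[Node]:
    """Combine both result lists, keeping the first occurrence of each node_id."""
    unique = {}
    for node in vector_nodes + bm25_nodes:
        unique.setdefault(node.node_id, node)
    return list(unique.values())


def rerank(query: str, nodes: List[Node],
           scorer: Optional[Callable[[str, str], float]], top_n: int = 3) -> List[Node]:
    """Order nodes by relevance score; without a scorer, fall back to the first top_n."""
    if scorer is None:
        return nodes[:top_n]
    ranked = sorted(nodes, key=lambda n: scorer(query, n.text), reverse=True)
    return ranked[:top_n]


def overlap_scorer(query: str, text: str) -> float:
    """Toy relevance score: number of distinct words shared by query and text."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


vector_hits = [Node("a", "Residence permits issued in 2023 by country"),
               Node("b", "Asylum applications by year")]
bm25_hits = [Node("b", "Asylum applications by year"),
             Node("c", "First residence permits for work in 2023")]

merged = merge_unique(vector_hits, bm25_hits)
top = rerank("first residence permits 2023", merged, overlap_scorer, top_n=2)
print([n.node_id for n in merged], [n.node_id for n in top])
```

Passing scorer=None mirrors the fallback used when re-ranking is unavailable: the merged list is simply truncated to the first top_n nodes.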

# Define the QueryEngine class, which ties all the steps together
class QueryEngine(CustomQueryEngine):
    # Public fields
    qa_prompt: PromptTemplate
    multi_modal_llm: OpenAIMultiModal
    node_postprocessors: Optional[List[BaseNodePostprocessor]] = None

    # Private attributes, declared with PrivateAttr
    _bm25: BM25Okapi = PrivateAttr()
    _llm: OpenAI = PrivateAttr()
    _text_node_with_context: List[TextNode] = PrivateAttr()
    _vector_index: VectorStoreIndex = PrivateAttr()

    def __init__(
        self,
        qa_prompt: PromptTemplate,
        bm25: BM25Okapi,
        multi_modal_llm: OpenAIMultiModal,
        vector_index: VectorStoreIndex,
        node_postprocessors: Optional[List[BaseNodePostprocessor]] = None,
        llm: OpenAI = None,
        text_node_with_context: List[TextNode] = None,
    ):
        super().__init__(
            qa_prompt=qa_prompt,
            retriever=None,
            multi_modal_llm=multi_modal_llm,
            node_postprocessors=node_postprocessors
        )
        self._bm25 = bm25
        self._llm = llm
        self._text_node_with_context = text_node_with_context
        self._vector_index = vector_index

    def custom_query(self, query_str: str):
        # Prepare the query bundle
        query_bundle = QueryBundle(query_str)

        bm25_nodes = []
        if best_match_25 == 1:  # BM25 search is selected
            # Retrieve nodes with BM25
            query_tokens = query_str.split()
            bm25_scores = self._bm25.get_scores(query_tokens)
            top_n_bm25 = 5  # number of top nodes to retrieve; adjust as needed
            # Indices of the highest BM25 scores, best first
            top_indices_bm25 = bm25_scores.argsort()[-top_n_bm25:][::-1]
            bm25_nodes = [self._text_node_with_context[i] for i in top_indices_bm25]
            logging.info(f"BM25 nodes retrieved: {len(bm25_nodes)}")
        else:
            logging.info("BM25 not selected.")

        # Retrieve nodes from the vector store with vector-based retrieval
        top_n_vectors = 5  # number of top vector nodes; adjust as needed
        # Pass the count to the retriever; otherwise its default top-k applies
        vector_retriever = self._vector_index.as_retriever(similarity_top_k=top_n_vectors)
        vector_nodes_with_scores = vector_retriever.retrieve(query_bundle)
        # Keep only the top 'n' nodes
        top_vector_nodes_with_scores = vector_nodes_with_scores[:top_n_vectors]
        vector_nodes = [node.node for node in top_vector_nodes_with_scores]
        logging.info(f"Vector nodes retrieved: {len(vector_nodes)}")

        # Combine the nodes and remove duplicates
        all_nodes = vector_nodes + bm25_nodes
        unique_nodes_dict = {node.node_id: node for node in all_nodes}
        unique_nodes = list(unique_nodes_dict.values())
        logging.info(f"Unique nodes after deduplication: {len(unique_nodes)}")

        nodes = unique_nodes

        if re_ranking == 1:  # re-ranking is selected
            # Re-rank the combined results with Cohere's re-ranker
            documents = [node.get_content() for node in nodes]
            max_retries = 3
            for attempt in range(max_retries):
                try:
                    reranked = cohere_client.rerank(
                        model="rerank-english-v2.0",
                        query=query_str,
                        documents=documents,
                        top_n=3  # keep the top 3 re-ranked nodes
                    )
                    break
                except CohereError as e:
                    if attempt < max_retries - 1:
                        logging.warning(f"Error occurred: {str(e)}. Waiting for 60 seconds before retry {attempt + 1}/{max_retries}")
                        time.sleep(60)  # wait before retrying
                    else:
                        logging.error("Error occurred. Max retries reached. Proceeding without re-ranking.")
                        reranked = None
                        break

            if reranked:
                reranked_indices = [result.index for result in reranked.results]
                nodes = [nodes[i] for i in reranked_indices]
            else:
                nodes = nodes[:3]  # fall back to the top 3 nodes
            logging.info(f"Nodes after re-ranking: {len(nodes)}")
        else:
            logging.info("Re-ranking not selected.")

        # Limit and filter the node content used for the context string
        max_context_length = 16000  # adjust as needed
        current_length = 0
        filtered_nodes = []

        # Initialize the tokenizer
        from transformers import GPT2TokenizerFast
        tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

        for node in nodes:
            content = node.get_content(metadata_mode=MetadataMode.LLM).strip()
            node_length = len(tokenizer.encode(content))
            logging.info(f"Node ID: {node.node_id}, Content Length (tokens): {node_length}")
            if not content:
                logging.warning(f"Node ID: {node.node_id} has empty content. Skipping.")
                continue
            if current_length + node_length <= max_context_length:
                filtered_nodes.append(node)
                current_length += node_length
            else:
                logging.info(f"Reached max context length with Node ID: {node.node_id}")
                break
        logging.info(f"Filtered nodes for context: {len(filtered_nodes)}")

        # Build the context string
        ctx_str = "\n\n".join(
            [n.get_content(metadata_mode=MetadataMode.LLM).strip() for n in filtered_nodes]
        )

        # Create image nodes from the images associated with the nodes
        image_nodes = []
        for n in filtered_nodes:
            if "image_path" in n.metadata:
                image_nodes.append(
                    NodeWithScore(node=ImageNode(image_path=n.metadata["image_path"]))
                )
            else:
                logging.warning(f"Node ID: {n.node_id} lacks 'image_path' metadata.")
        logging.info(f"Image nodes created: {len(image_nodes)}")

        # Prepare the prompt for the LLM
        fmt_prompt = self.qa_prompt.format(context_str=ctx_str, query_str=query_str)

        # Use the multimodal LLM to interpret the images and generate the response
        llm_response = self.multi_modal_llm.complete(
            prompt=fmt_prompt,
            image_documents=[image_node.node for image_node in image_nodes],
            max_tokens=16000
        )

        logging.info("LLM response generated.")

        # Return the response
        return Response(
            response=str(llm_response),
            source_nodes=filtered_nodes,
            metadata={
                "text_node_with_context": self._text_node_with_context,
                "image_nodes": image_nodes,
            },
        )

# Initialize the query engine with BM25 and Cohere re-ranking
query_engine = QueryEngine(
    qa_prompt=PROMPT,
    bm25=bm25,
    multi_modal_llm=MM_LLM,
    vector_index=index,
    node_postprocessors=[],
    llm=llm,
    text_node_with_context=text_node_with_context
)
print("All done")

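One advantage of using OpenAI models, gpt-4o-mini in particular, is that context assignment and query inference cost much less, and context assignment also takes much less time. Although the basic tiers of both OpenAI and Anthropic quickly hit the API rate limit, the retry time on Anthropic's basic tier varies and can be too long. Assigning context to only the first 20 pages of this document with claude-3-5-sonnet-20240620 took about 170 seconds with prompt caching and cost 20 cents (input + output tokens). By comparison, gpt-4o-mini is roughly 20 times cheaper than Claude 3.5 Sonnet for input tokens and roughly 25 times cheaper for output tokens. OpenAI states that it applies prompt caching to repeated content automatically on all API calls.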

In comparison, assigning context to the nodes of the whole document (60 pages) with gpt-4o-mini completed in about 193 seconds, without any retry requests.
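The price ratios quoted above can be checked with a few lines of arithmetic. This sketch assumes the list prices per million tokens that the two vendors published around the time of writing (USD 3.00 input / 15.00 output for Claude 3.5 Sonnet, USD 0.15 input / 0.60 output for gpt-4o-mini). These figures are not taken from the article and may have changed, and the token counts in the usage line are invented for illustration.

```python
# Assumed list prices in USD per million tokens (not from the article; may be outdated)
PRICES_USD_PER_MTOK = {
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one workload, ignoring prompt-caching discounts."""
    price = PRICES_USD_PER_MTOK[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def price_ratio(kind: str) -> float:
    """How many times cheaper gpt-4o-mini is for the given token kind."""
    return PRICES_USD_PER_MTOK["claude-3-5-sonnet"][kind] / PRICES_USD_PER_MTOK["gpt-4o-mini"][kind]


print(round(price_ratio("input")), round(price_ratio("output")))
# Hypothetical workload: 1,000,000 input tokens and 10,000 output tokens
print(cost_usd("claude-3-5-sonnet", 1_000_000, 10_000), cost_usd("gpt-4o-mini", 1_000_000, 10_000))
```

Because contextual retrieval resends the whole document with every chunk, input tokens dominate the bill, so the input-price ratio matters most.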

With the QueryEngine class in place, we can run query inference as follows:

original_query = """What are the top countries to whose citizens the Finnish Immigration Service issued the highest number of first residence permits in 2023?
Which of these countries received the highest number of first residence permits?"""
response = query_engine.query(original_query)
display(Markdown(str(response)))

Here is the markdown response to this query.

Response to the query (image by author)

The page cited in the query response is shown below:

The page cited in the above query (page 9). The extracted information is shown in the red rectangle (source: Key Figures on Immigration)

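Now let us compare the gpt-4o-mini-based RAG (LlamaParse premium mode + contextual retrieval + BM25 + re-ranking) with the Claude-based RAG. I also implemented a simple baseline RAG, which can be found in the GitHub notebook. The three RAGs compared are: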

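1. Simple RAG in LlamaIndex: uses SentenceSplitter to split the document into chunks (chunk_size=800, chunk_overlap=400), then builds a vector index and uses vector retrieval.

2. CMRAG (claude-3-5-sonnet-20240620, voyage-3): LlamaParse premium mode + contextual retrieval.

3. CMRAG (gpt-4o-mini, text-embedding-3-small): LlamaParse premium mode + contextual retrieval + BM25 + re-ranking.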
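To make the chunk_size and chunk_overlap settings of the first pipeline concrete, here is a simplified sliding-window sketch. It is an approximation under stated assumptions: the real SentenceSplitter counts tokens and prefers sentence boundaries, whereas this hypothetical sliding_chunks helper works on an arbitrary list of items.

```python
from typing import List, Sequence


def sliding_chunks(tokens: Sequence, chunk_size: int, chunk_overlap: int) -> List[Sequence]:
    """Split tokens into windows of chunk_size that share chunk_overlap items."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks


tokens = list(range(2000))  # stand-in for 2,000 tokens of text
chunks = sliding_chunks(tokens, chunk_size=800, chunk_overlap=400)
print(len(chunks), [c[0] for c in chunks])
```

With an overlap of half the chunk size, every token apart from those at the two ends of the document appears in two chunks, which roughly doubles the amount of text that is embedded.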

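For simplicity, we refer to these RAGs as RAG0, RAG1, and RAG2, respectively. Below are three pages from the report; I asked each RAG three questions (one question per page). The regions highlighted by red rectangles show the ground truth, that is, the source of the correct answer.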

Page 4 of the document (source: Key Figures on Immigration)

Page 12 of the document (source: Key Figures on Immigration)

Page 20 of the document (source: Key Figures on Immigration)

Below are the responses of the three RAGs to each question.

Comparison of basic RAG, Claude-based CMRAG, and gpt-4o-mini-based CMRAG (image by author)

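As can be seen, RAG2 performs very well. For the first question, RAG0 gave a wrong answer because the question was based on an image, whereas RAG1 and RAG2 both answered it correctly. For the other two questions, RAG0 could not provide any answer, while RAG1 and RAG2 both answered correctly.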

Conclusion

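Overall, thanks to the integration of BM25, re-ranking, and better prompting, RAG2 performs on par with RAG1 in many cases, and sometimes better. It offers a cost-effective solution for contextual, multimodal RAG. Techniques that could be integrated into this pipeline include hypothetical document embeddings (HyDE) and query expansion. Likewise, open-source embedding models (such as all-MiniLM-L6-v2) and/or lightweight LLMs (such as gemma2 or phi3-small) could be explored to make it even more cost-effective.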

For the complete source code of the examples in this article, see my GitHub repository: https://github.com/umairalipathan1980/Multimodal-contextual-RAG.git?source=post_page-----d1965b8ab00c--------------------------------