2025's Top RAG Rerankers: Say Goodbye to "Information Noise" and Get Sharper AI Answers!
If you've spent any time in the world of RAG (Retrieval-Augmented Generation), you've probably hit this puzzle: you've already retrieved a "mountain" of relevant documents, so why are the LLM's (large language model's) answers still underwhelming? Behind this lies an easily overlooked but critical issue: the "noise" in first-pass retrieval!
Why Is First-Pass Retrieval So Often Underwhelming?

You might think RAG's first step, finding documents relevant to the user's query, sounds simple enough. The usual methods, such as keyword search or vector similarity matching, can indeed pull up a pile of documents in no time. But that is exactly where the problem lies: they are a little *too* good at finding things. They return a big batch, of which only a handful are genuinely useful, mixed in with plenty of irrelevant junk.
Why does this happen?
- Embedding models don't fully "get" you: the embedding models we rely on do understand semantics, but when faced with very fine-grained or specialized questions they often fall short, failing to capture the subtle nuances, so the retrieval results lack precision.
- The pitfalls of short queries and jargon: vector search is great, but it tends to get confused by short queries or highly specialized terminology. Search for "latest treatment options for myocardial infarction", and it may hand you a pile of general heart-disease explainers rather than cutting-edge clinical research.
- LLMs have limited "memory": large language models are powerful, but their context windows are finite! Dump a heap of documents on them, even loosely related ones, and they get "indigestion": attention gets diluted and answer quality suffers. It's like handing an expert a stack of unfiltered material; the noise distracts them from the key points.
So you see, noisy retrieval stuffs the LLM's "brain" with cluttered information. It not only dilutes the model's focus but can also send it off into flights of fancy, what we commonly call "hallucination". We need a "cleanup crew" to give those first-pass results a proper reshuffle!
The Savior Arrives: Enter the Rerankers!

Ladies and gentlemen, it's time to bring out today's star: the reranker!
A reranker, as the name suggests, is a tool that re-sorts search results. Think of it as a sharp-eyed "information detective": after first-pass retrieval drags in a pile of documents, the reranker steps in with more sophisticated algorithms, analyzes in depth how each document relates to the user's query, and hoists the most relevant ones right to the top.
In the RAG pipeline, the reranker plays the role of quality gatekeeper. It scrutinizes the first batch of results and prioritizes documents by how well they match the query and how informative they are. The goal is blunt and simple: push the most valuable information to the top!
You can picture the reranker as a professional proofreader: it double-checks the initial search results and, with a deeper understanding of language, finds the best possible fit between documents and question.
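To make this concrete, here is a minimal sketch of second-pass scoring with a cross-encoder, using the open-source sentence-transformers library; the model name, query, and candidate documents are illustrative stand-ins, not tied to any specific vendor discussed below:
from sentence_transformers import CrossEncoder

# A cross-encoder reads the query and each candidate *together*, so it can
# model fine-grained interactions that single-vector similarity misses.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "latest treatment options for myocardial infarction"
candidates = [
    "A general introduction to heart disease for patients.",
    "2024 clinical trial results for new myocardial infarction therapies.",
]

# predict() returns one relevance score per (query, document) pair;
# sorting by that score gives the reranked order.
scores = model.predict([(query, doc) for doc in candidates])
reranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
print(reranked[0][0])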
How Do Rerankers Transform RAG?

Adding a reranker is not just icing on the cake; it can produce a qualitative leap in RAG quality!
- Precision soars: a reranker does more than keyword matching; it digs into the semantic relationship between the user's question and each document. That deeper understanding lets it pick out the most useful information and ensures the context handed to the LLM is highly precise.
- Answers hit the mark: when the LLM receives a smaller, more refined, higher-quality set of documents, it naturally produces more accurate, more direct answers. Give a chef top-grade ingredients and you get a better dish. The reranker computes a score reflecting how semantically close each document is to the query, yielding a better final ordering. Even without exact keyword matches, it can still surface relevant gems.
- Goodbye to "confident nonsense" (hallucination): as noted earlier, a major cause of LLM hallucination is impure input. Documents filtered and vetted by a reranker give the LLM a firmer foundation, greatly reducing the odds of it "talking nonsense with a straight face" and making the final output more trustworthy.
So, the standard RAG pipeline is "retrieve" then "generate". An enhanced RAG pipeline inserts a "rerank" step in between:
- Retrieve: pull an initial batch of candidate documents.
- Rerank: use a reranking model to re-sort these documents by relevance to the query.
- Generate: feed only the top-ranked, most relevant documents to the LLM to generate the answer.
This two-stage approach lets the initial retrieval "cast a wide net" (favoring recall) while the reranker "hand-picks the catch" (favoring precision). That division of labor significantly improves both the efficiency and the quality of the whole RAG pipeline, giving the LLM the best possible input.
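To see the division of labor in one place, here is a schematic, self-contained toy of the retrieve-rerank-generate flow; the retriever, scorer, and "LLM" below are trivial stand-ins for real components:
CORPUS = [
    "The capital of France is Paris.",
    "Paris is known for the Eiffel Tower.",
    "Bananas are rich in potassium.",
]

def retrieve(query, k=3):
    # Stage 1 stand-in: grab a broad candidate set (recall-oriented).
    return CORPUS[:k]

def rerank(query, docs, top_n=2):
    # Stage 2 stand-in: score by word overlap; a real system would call a
    # cross-encoder or a reranking API here (precision-oriented).
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:top_n]

def generate(query, context):
    # Stage 3 stand-in: a real system would prompt an LLM with the context.
    return f"Answering {query!r} from {len(context)} filtered documents."

docs = retrieve("capital of France")
top_docs = rerank("capital of France", docs)
print(generate("capital of France", top_docs))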
Which Reranker Models Are Worth Watching in 2025?

Given how useful rerankers are, which models on the market are worth reaching for? In 2025, a number of top reranking models have emerged, each with its own strengths:
| Reranker | Type | Source | Strengths | Weaknesses | Best suited for |
| --- | --- | --- | --- | --- | --- |
| Cohere | Cross-encoder (API) | Commercial | High accuracy, multilingual, easy to use, fast (Nimble variant) | Cost (per API call), closed-source | General RAG, enterprise search, multilingual apps, ease of use |
| bge-reranker | Cross-encoder | Open-source | High accuracy, open-source, runs on mid-range hardware | Self-hosting required | General RAG, open-source preference, tight budgets, comfortable self-hosting |
| Voyage | Cross-encoder (API) | Commercial | Top-tier relevance/accuracy | Cost (per API call), potentially higher latency (top model) | Very high accuracy needs (finance, legal), relevance-critical apps |
| Jina | Cross-encoder / ColBERT variant | Mixed | Balanced performance, cost-effective, long-document support (Jina-ColBERT) | May not reach the very highest accuracy | General RAG, long documents, balancing cost and performance |
| FlashRank | Lightweight cross-encoder | Open-source | Extremely fast, low resource use, easy integration | Lower accuracy than large models | Speed-critical apps, resource-constrained environments |
| ColBERT | Multi-vector (late interaction) | Open-source | Efficient retrieval at scale, efficient on large datasets | Indexing is compute/storage intensive | Very large document sets, efficiency at scale |
| MixedBread (mxbai-rerank-v2) | Cross-encoder | Open-source | Claimed SOTA performance, fast inference, multilingual, long context, versatile | Self-hosting required, relatively new | High-performance RAG, multilingual, long docs/code/JSON, open-source preference, LLM tool selection |
Next, let's take a closer look at a few representative models:
1. Cohere Rerank
Cohere Rerank is a powerful reranking model from Cohere, built on an advanced neural network, most likely a Transformer-based cross-encoder. It processes the query and the document together, which lets it judge their relevance precisely. It is a closed-source commercial model served via API.
- Key features: its headline capability is support for 100+ languages, making it a natural fit for international applications. As a managed service it is very easy to integrate. Cohere also offers a "Rerank 3 Nimble" variant, which keeps accuracy high while significantly boosting speed in production.
- Performance: Cohere Rerank delivers consistently high accuracy regardless of which embedding model handled the initial retrieval. The Nimble variant sharply reduces response times. Pricing is, naturally, per API call.
- Pros: simple API integration, strong and reliable performance, excellent multilingual support, plus the speed-optimized Nimble variant.
- Cons: closed-source commercial service, pay-per-use, no way to modify the model yourself.
- Best for: general RAG applications, enterprise search platforms, customer-service chatbots, and any scenario needing broad language support without managing model infrastructure.
Example code:
First, install the Cohere library:
%pip install --upgrade --quiet cohere
Then set up Cohere and the ContextualCompressionRetriever:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
from langchain_community.llms import Cohere
from langchain.chains import RetrievalQA

llm = Cohere(temperature=0)
compressor = CohereRerank(model="rerank-english-v3.0")
# Assumes `retriever` has already been defined
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
chain = RetrievalQA.from_chain_type(
    llm=llm, retriever=compression_retriever
)
# chain.invoke({'query': 'What did the president say about Ketanji Brown Jackson'})
Sample output:
{'query': 'What did the president say about Ketanji Brown Jackson',
'result': " The president speaks highly of Ketanji Brown Jackson, stating that she
is one of the nation's top legal minds, and will continue the legacy of excellence
of Justice Breyer. The president also mentions that he worked with her family and
that she comes from a family of public school educators and police officers. Since
her nomination, she has received support from various groups, including the
Fraternal Order of Police and judges from both major political parties. \n\nWould
you like me to extract another sentence from the provided text? "}
2. bge-reranker (Base/Large)
The bge-reranker family of models comes from the Beijing Academy of Artificial Intelligence (BAAI) and is open-source (Apache 2.0 licensed). The models are Transformer-based, most likely cross-encoders, purpose-built for reranking. The family comes in several sizes, such as Base and Large.
- Key features: as open-source models, they give users the freedom to deploy and modify. For example, bge-reranker-v2-m3 has under 600M parameters and runs efficiently on ordinary hardware, including consumer GPUs.
- Performance: these models perform remarkably well; the Large variant in particular often comes close to top commercial models. They post strong Mean Reciprocal Rank (MRR) scores. The main cost is the compute required for self-hosting.
- Pros: no licensing fees (open-source), high accuracy, flexible self-hosting, good performance even on mid-range hardware.
- Cons: you manage deployment, infrastructure, and updates yourself; performance depends on your hosting hardware.
- Best for: general RAG tasks, research projects, teams that prefer open-source tooling, budget-sensitive applications, and developers comfortable with a self-hosted stack.
Example code:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
# Assumes `retriever` has already been defined
model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
compressor = CrossEncoderReranker(model=model, top_n=3)
compression_retriever = ContextualCompressionRetriever(
base_compressor=compressor, base_retriever=retriever
)
# compressed_docs = compression_retriever.invoke("What is the plan for the economy?")
# pretty_print_docs(compressed_docs)
Sample output:
Document 1:
More infrastructure and innovation in America.
More goods moving faster and cheaper in America.
More jobs where you can earn a good living in America.
And instead of relying on foreign supply chains, let’s make it in America.
Economists call it “increasing the productive capacity of our economy.”
I call it building a better America.
My plan to fight inflation will lower your costs and lower the deficit.
----------------------------------------------------------------------------------------------------
Document 2:
Second – cut energy costs for families an average of $500 a year by combatting
climate change.
Let’s provide investments and tax credits to weatherize your homes and businesses to
be energy efficient and you get a tax credit; double America’s clean energy
production in solar, wind, and so much more; lower the price of electric vehicles,
saving you another $80 a month because you’ll never have to pay at the gas pump
again.
----------------------------------------------------------------------------------------------------
Document 3:
Look at cars.
Last year, there weren’t enough semiconductors to make all the cars that people
wanted to buy.
And guess what, prices of automobiles went up.
So—we have a choice.
One way to fight inflation is to drive down wages and make Americans poorer.
I have a better plan to fight inflation.
Lower your costs, not your wages.
Make more cars and semiconductors in America.
More infrastructure and innovation in America.
More goods moving faster and cheaper in America.
3. Voyage Rerank
Voyage AI offers proprietary neural models (voyage-rerank-2, voyage-rerank-2-lite) accessed via API. They are most likely advanced cross-encoders carefully tuned to maximize relevance scoring.
- Key features: their main distinction is achieving top relevance scores on benchmarks. Voyage provides a simple Python client library for easy integration. The lite variant balances quality against speed and cost.
- Performance: voyage-rerank-2 generally leads benchmarks in raw relevance accuracy. The lite model is competitive with other strong contenders. The high-accuracy rerank-2 model may carry slightly higher latency than some rivals. Costs scale with API usage.
- Pros: state-of-the-art relevance, arguably the most accurate option available today; easy to use via the Python client.
- Cons: proprietary API service with associated costs; the most accurate model can be a bit slower than others.
- Best for: applications where maximizing relevance is paramount, such as financial analysis, legal document review, or other critical Q&A scenarios where accuracy matters more than small speed differences.
Example code:
First, install the Voyage libraries:
%pip install --upgrade --quiet voyageai
%pip install --upgrade --quiet langchain-voyageai
Then set up the components:
import os
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain.retrievers import ContextualCompressionRetriever
from langchain_openai import OpenAI
from langchain_voyageai import VoyageAIRerank
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_voyageai import VoyageAIEmbeddings
# Assumes the State of the Union text file is at the correct path
# documents = TextLoader("../../how_to/state_of_the_union.txt").load()
# text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
# texts = text_splitter.split_documents(documents)
# retriever = FAISS.from_documents(
# texts, VoyageAIEmbeddings(model="voyage-law-2")
# ).as_retriever(search_kwargs={"k": 20})
# llm = OpenAI(temperature=0)
# compressor = VoyageAIRerank(
# model="rerank-lite-1", voyageai_api_key=os.environ["VOYAGE_API_KEY"], top_k=3
# )
# compression_retriever = ContextualCompressionRetriever(
# base_compressor=compressor, base_retriever=retriever
# )
# compressed_docs = compression_retriever.invoke("What did the president say about Ketanji Jackson Brown")
# pretty_print_docs(compressed_docs)
Sample output:
Document 1:
One of the most serious constitutional responsibilities a President has is
nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji
Brown Jackson. One of our nation’s top legal minds, who will continue Justice
Breyer’s legacy of excellence.
----------------------------------------------------------------------------------------------------
Document 2:
So let’s not abandon our streets. Or choose between safety and equal justice.
Let’s come together to protect our communities, restore trust, and hold law
enforcement accountable.
That’s why the Justice Department required body cameras, banned chokeholds, and
restricted no-knock warrants for its officers.
----------------------------------------------------------------------------------------------------
Document 3:
I spoke with their families and told them that we are forever in debt for their
sacrifice, and we will carry on their mission to restore the trust and safety every
community deserves.
I’ve worked on these issues a long time.
I know what works: Investing in crime prevention and community police officers
who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and
safety.
So let’s not abandon our streets. Or choose between safety and equal justice.
4. Jina Reranker
Jina offers reranking solutions including Jina Reranker v2 and Jina-ColBERT. Jina Reranker v2 is most likely a cross-encoder model, while Jina-ColBERT implements the ColBERT architecture (explained later) on top of Jina's base models.
- Key features: Jina offers cost-effective options with solid performance. A standout feature is that Jina-ColBERT can handle very long documents, supporting context lengths up to 8,000 tokens. That greatly reduces the need for aggressive chunking of long texts. Jina's ecosystem also includes open-source components.
- Performance: Jina Reranker v2 strikes a good balance between speed, cost, and relevance. Jina-ColBERT excels on long source documents. Pricing is generally competitive.
- Pros: balanced performance, cost-effective, excellent long-document handling via Jina-ColBERT, and flexibility from its open-source components.
- Cons: the standard Jina reranker may not reach the absolute peak accuracy of specialists like Voyage.
- Best for: general RAG systems, applications working with long documents (technical manuals, research papers, books), and projects that need a good balance of cost and performance.
Example code:
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import JinaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Assumes the State of the Union text file is at the correct path
# documents = TextLoader(
# "../../how_to/state_of_the_union.txt",
# ).load()
# text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
# texts = text_splitter.split_documents(documents)
# embedding = JinaEmbeddings(model_name="jina-embeddings-v2-base-en")
# retriever = FAISS.from_documents(texts, embedding).as_retriever(search_kwargs={"k": 20})
# query = "What did the president say about Ketanji Brown Jackson"
# docs = retriever.get_relevant_documents(query)
# Doing Reranking with Jina
from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.document_compressors import JinaRerank
# compressor = JinaRerank()
# compression_retriever = ContextualCompressionRetriever(
# base_compressor=compressor, base_retriever=retriever
# )
# compressed_docs = compression_retriever.get_relevant_documents(
# "What did the president say about Ketanji Jackson Brown"
# )
# pretty_print_docs(compressed_docs)
Sample output:
Document 1:
So let’s not abandon our streets. Or choose between safety and equal justice.
Let’s come together to protect our communities, restore trust, and hold law
enforcement accountable.
That’s why the Justice Department required body cameras, banned chokeholds, and
restricted no-knock warrants for its officers.
----------------------------------------------------------------------------------------------------
Document 2:
I spoke with their families and told them that we are forever in debt for their
sacrifice, and we will carry on their mission to restore the trust and safety every
community deserves.
I’ve worked on these issues a long time.
I know what works: Investing in crime prevention and community police officers
who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and
safety.
So let’s not abandon our streets. Or choose between safety and equal justice.
5. ColBERT
ColBERT (Contextualized Late Interaction over BERT) is a multi-vector model. Instead of representing an entire document with a single vector as traditional models do, it creates multiple context-aware vectors, one per token (or phrase). It uses a "late interaction" mechanism: only after encoding are the query vectors compared against the many document vectors. This means document vectors can be precomputed and indexed.
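As a rough illustration of late interaction, here is a toy MaxSim computation in PyTorch; the random tensors stand in for real ColBERT token embeddings, and the shapes are purely illustrative:
import torch

# Query and document are encoded *separately* into per-token vectors;
# document vectors can therefore be precomputed and indexed offline.
query_emb = torch.randn(4, 128)   # 4 query tokens, 128-dim embeddings
doc_emb = torch.randn(60, 128)    # 60 document tokens, precomputed at index time

# Late interaction (MaxSim): for each query token, take its best-matching
# document token, then sum those maxima to get the document's score.
sim = query_emb @ doc_emb.T            # (4, 60) token-to-token similarities
score = sim.max(dim=1).values.sum()
print(score.item())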
- Key features: the architecture enables very efficient retrieval from large collections once documents are indexed. The multi-vector approach allows fine-grained comparison between query terms and document content. It is an open-source approach.
- Performance: ColBERT strikes a strong balance between retrieval efficiency and effectiveness, especially at scale. After the initial indexing step, retrieval latency is low. The main cost is the compute for indexing and self-hosting.
- Pros: efficient on large document sets, scalable retrieval, open-source flexibility.
- Cons: the initial indexing process can be compute-intensive and storage-hungry.
- Best for: large-scale RAG applications, systems that must retrieve quickly from millions or billions of documents, and scenarios where precomputation time is acceptable.
Example code:
Install the RAGatouille library to use the ColBERT reranker:
pip install -U ragatouille
Now set up the ColBERT reranker:
from ragatouille import RAGPretrainedModel
from langchain.retrievers import ContextualCompressionRetriever
# Assumes `retriever` has already been defined
# RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
# compression_retriever = ContextualCompressionRetriever(
# base_compressor=RAG.as_langchain_document_compressor(), base_retriever=retriever
# )
# compressed_docs = compression_retriever.invoke(
# "What animation studio did Miyazaki found"
# )
# print(compressed_docs[0])
Sample output:
Document(page_content='In June 1985, Miyazaki, Takahata, Tokuma and Suzuki founded
the animation production company Studio Ghibli, with funding from Tokuma Shoten.
Studio Ghibli\'s first film, Laputa: Castle in the Sky (1986), employed the same
production crew of Nausica?. Miyazaki\'s designs for the film\'s setting were
inspired by Greek architecture and "European urbanistic templates". Some of the
architecture in the film was also inspired by a Welsh mining town; Miyazaki
witnessed the mining strike upon his first', metadata={'relevance_score':
26.5194149017334})
6. FlashRank
FlashRank is designed as a very lightweight, fast reranking library, typically leveraging smaller, optimized Transformer models (often distilled or pruned versions of larger ones). It aims to deliver a meaningful relevance boost over plain similarity search at minimal computational cost. It behaves like a cross-encoder but uses techniques that accelerate processing. It is usually distributed as an open-source Python library.
- Key features: its defining traits are speed and efficiency. It is designed to be easy to integrate and light on resources (CPU or modest GPU). Adopting it usually takes only a few lines of code.
- Performance: while it won't match the peak accuracy of the largest cross-encoders like Cohere or Voyage, FlashRank targets a clear improvement over no reranking or basic bi-encoder reranking. Its speed suits real-time or high-throughput scenarios. Costs are minimal (just the compute for self-hosting).
- Pros: extremely fast inference, low compute requirements, easy integration, open-source.
- Cons: accuracy may trail larger, more sophisticated rerankers; model selection may be more limited than in broader frameworks.
- Best for: applications needing fast reranking on constrained hardware (CPUs or edge devices), latency-critical high-volume search systems, and projects wanting a simple "better than nothing" reranking step with minimal complexity.
Example code:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank
from langchain_openai import ChatOpenAI
# Assumes `retriever` has already been defined
# llm = ChatOpenAI(temperature=0)
# compressor = FlashrankRerank()
# compression_retriever = ContextualCompressionRetriever(
# base_compressor=compressor, base_retriever=retriever
# )
# compressed_docs = compression_retriever.invoke(
# "What did the president say about Ketanji Jackson Brown"
# )
# print([doc.metadata["id"] for doc in compressed_docs])
# pretty_print_docs(compressed_docs)
This snippet uses FlashrankRerank inside a ContextualCompressionRetriever to improve the relevance of retrieved documents. It re-orders the documents obtained by the base retriever (represented by retriever) according to their relevance to the query "What did the president say about Ketanji Jackson Brown", then prints the document IDs and the compressed, reranked documents.
Sample output:
[0, 5, 3]
Document 1:
One of the most serious constitutional responsibilities a President has is
nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji
Brown Jackson. One of our nation’s top legal minds, who will continue Justice
Breyer’s legacy of excellence.
----------------------------------------------------------------------------------------------------
Document 2:
He met the Ukrainian people.
From President Zelenskyy to every Ukrainian, their fearlessness, their courage,
their determination, inspires the world.
Groups of citizens blocking tanks with their bodies. Everyone from students to
retirees teachers turned soldiers defending their homeland.
In this struggle as President Zelenskyy said in his speech to the European
Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United
States is here tonight.
----------------------------------------------------------------------------------------------------
Document 3:
And tonight, I’m announcing that the Justice Department will name a chief prosecutor
for pandemic fraud.
By the end of this year, the deficit will be down to less than half what it was
before I took office.
The only president ever to cut the deficit by more than one trillion dollars in a
single year.
Lowering your costs also means demanding more competition.
I’m a capitalist, but capitalism without competition isn’t capitalism
It’s exploitation—and it drives up prices.
The output shows the retrieved chunks re-ordered by relevance.
7. MixedBread
Offered by Mixedbread AI, this family includes mxbai-rerank-base-v2 (0.5B parameters) and mxbai-rerank-large-v2 (1.5B parameters). They are open-source (Apache 2.0 licensed) cross-encoders built on the Qwen-2.5 architecture. A key differentiator is their training process, which layers a three-stage reinforcement learning (RL) approach (GRPO, contrastive learning, preference learning) on top of initial training.
- Key features: claimed state-of-the-art performance on benchmarks such as BEIR. Supports 100+ languages. Handles long contexts up to 8k tokens (with compatibility up to 32k). Designed to work well with varied data types, including text, code, and JSON, as well as for LLM tool selection. Available via Hugging Face and a Python library.
- Performance: Mixedbread's published benchmarks show these models beating other top open- and closed-source rivals (such as Cohere and Voyage) on BEIR, with the large model scoring 57.49 and the base model 55.57. They also show notable speed advantages in latency tests, with the 1.5B-parameter model far faster than other large open-source rerankers. Cost is the compute for self-hosting.
- Pros: strong benchmark performance (claimed SOTA), open-source license, fast inference relative to its accuracy, broad language support, very long context window, suited to multiple data types (code, JSON).
- Cons: requires self-hosting and infrastructure management; as relatively new models, long-term performance and community validation are still accruing.
- Best for: general RAG needing top-tier performance, multilingual applications, systems handling code, JSON, or long documents, LLM tool/function-call selection, and teams that prefer high-performance open-source models.
Example code:
!pip install mxbai_rerank
from mxbai_rerank import MxbaiRerankV2
# Load the model, here we use our base sized model
model = MxbaiRerankV2("mixedbread-ai/mxbai-rerank-base-v2")
# Example query and documents
query = "Who wrote To Kill a Mockingbird?"
documents = [
    "To Kill a Mockingbird is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel Moby-Dick was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The Harry Potter series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "The Great Gatsby, a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan.",
]
# Calculate the scores
results = model.rank(query, documents)
print(results)
Sample output:
[RankResult(index=0, score=9.847987174987793, document='To Kill a Mockingbird is a
novel by Harper Lee published in 1960. It was immediately successful, winning the
Pulitzer Prize, and has become a classic of modern American literature.'),
RankResult(index=2, score=8.258672714233398, document='Harper Lee, an American
novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in
Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.'),
RankResult(index=3, score=3.579845428466797, document='Jane Austen was an English
novelist known primarily for her six major novels, which interpret, critique and
comment upon the British landed gentry at the end of the 18th century.'),
RankResult(index=4, score=2.716982841491699, document='The Harry Potter series,
which consists of seven fantasy novels written by British author J.K. Rowling, is
among the most popular and critically acclaimed books of the modern era.'),
RankResult(index=1, score=2.233165740966797, document='The novel Moby-Dick was
written by Herman Melville and first published in 1851. It is considered a
masterpiece of American literature and deals with complex themes of obsession,
revenge, and the conflict between good and evil.'),
RankResult(index=5, score=1.8150043487548828, document='The Great Gatsby, a novel
written by American author F. Scott Fitzgerald, was published in 1925. The story is
set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit
of Daisy Buchanan.')]
How Do You Know Whether Your Reranker Is Actually "Delivering"?
Once you've added a reranker, you'll want to know whether it is actually making a difference. Evaluating a reranker matters, and these are the "hard" metrics typically used (a small worked example follows the list below):
- Accuracy@k: how often a relevant document appears among the top k results.
- Precision@k: the proportion of the top k results that are relevant.
- Recall@k: the proportion of all relevant documents that are found in the top k results.
- Normalized Discounted Cumulative Gain (NDCG): measures ranking quality, accounting for both relevance and position; relevant documents ranked higher contribute more to the score. It is normalized (between 0 and 1) for easy comparison.
- Mean Reciprocal Rank (MRR): focuses on the rank of the first relevant document, averaged across queries. Very useful when you need one good result quickly.
- F1-score: the harmonic mean of precision and recall, giving a balanced view.
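As promised above, here is a small self-contained example computing several of these metrics for a single query; the document IDs and relevance judgments are made up for illustration:
import math

relevant = {"d1", "d4"}                      # ground-truth relevant documents
ranked = ["d3", "d1", "d2", "d4", "d5"]      # the reranker's output order
k = 3

top_k = ranked[:k]
hits = [doc for doc in top_k if doc in relevant]
precision_at_k = len(hits) / k               # share of top-k that is relevant
recall_at_k = len(hits) / len(relevant)      # share of relevant docs found in top-k

# MRR: reciprocal rank of the first relevant document (ranks are 1-based).
mrr = next((1 / (i + 1) for i, doc in enumerate(ranked) if doc in relevant), 0.0)

# NDCG@k with binary relevance: discount each hit by log2(rank + 1),
# then normalize by the DCG of an ideal ordering.
dcg = sum(1 / math.log2(i + 2) for i, doc in enumerate(top_k) if doc in relevant)
idcg = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
ndcg_at_k = dcg / idcg

print(f"P@{k}={precision_at_k:.2f}, R@{k}={recall_at_k:.2f}, MRR={mrr:.2f}, NDCG@{k}={ndcg_at_k:.2f}")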
How Do You Choose the Right Reranker for Your RAG?
Picking the best reranker is a bit like choosing a partner: you have to weigh several factors together:
- Relevance requirements: how accurate do results need to be for your application? "Close enough", or "not a hair out of place"?
- Latency: how quickly must the reranker return results? Hard real-time (say, a live support chatbot), or tolerant of some delay (say, offline analytics)?
- Cost: free and open-source, or paid commercial API?
- Deployment complexity: can you deploy and maintain a model yourself, or do you want a managed service that just works?
- Data type and length: are your documents plain text, or more complex data like code and JSON? How long is a typical document?
In short, rerankers play an increasingly important role in RAG systems. Think of one as your AI assistant's "secret weapon", making your model's answers more accurate, more reliable, and more on point. Still plagued by first-pass retrieval "noise"? Go put a reranker in place!
This article is reprinted from Halo咯咯. Author: 基咯咯