
Building a Self-Improving Agentic RAG System

Published 2025-11-24 00:11

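An Agentic RAG system can be viewed as a high-dimensional vector space in which each dimension corresponds to a design decision, such as prompt engineering, agent coordination, or retrieval strategy. Tuning these dimensions by hand to find the right combination is very difficult, and unseen data after deployment often breaks a configuration that worked during testing.

A better approach is to let the system learn to optimize itself. A typical self-evolving Agentic RAG pipeline follows this thought process: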

Building a Self-Improving Agentic RAG System - AI.x Community

Self Improving Agentic RAG System (Created by Fareed Khan)

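  • A collaborative team of specialist agents carries out the task. Starting from a high-level concept, it follows the current SOP (standard operating procedure) to produce a complete, multi-source document.
  • A multi-dimensional evaluation system scores the team's output against several objectives, such as accuracy, feasibility, and compliance, yielding a performance vector.
  • A performance diagnostician agent analyzes that vector, identifies the main weaknesses in the process the way a consultant would, and traces them back to root causes.
  • An SOP architect agent updates the process based on the diagnostic insights, proposing new variants designed specifically to fix the weaknesses.
  • Each new SOP version is tested by having the team repeat the task, and each output is evaluated again to produce its own performance vector.
  • The system identifies the Pareto front, the set of best trade-offs among all tested SOPs, and presents these optimized strategies to a human decision-maker, closing the evolution loop.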
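
The Pareto-front step in the loop above can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's implementation: the SOP names, the three objectives, and the scores are made-up values, and every objective is assumed to be "higher is better".

```python
from typing import Dict, List

# Hypothetical performance vector: one score per objective, higher is better.
PerformanceVector = Dict[str, float]


def dominates(a: PerformanceVector, b: PerformanceVector) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly better on one."""
    at_least_as_good = all(a[k] >= b[k] for k in a)
    strictly_better = any(a[k] > b[k] for k in a)
    return at_least_as_good and strictly_better


def pareto_front(candidates: Dict[str, PerformanceVector]) -> List[str]:
    """Return the names of all candidates that no other candidate dominates."""
    front = []
    for name, vector in candidates.items():
        dominated = any(
            dominates(other, vector)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Illustrative (invented) scores for three SOP versions.
sop_scores = {
    "sop_v1": {"accuracy": 0.70, "feasibility": 0.60, "compliance": 0.80},
    "sop_v2": {"accuracy": 0.85, "feasibility": 0.55, "compliance": 0.80},
    "sop_v3": {"accuracy": 0.65, "feasibility": 0.50, "compliance": 0.75},
}

print(pareto_front(sop_scores))  # ['sop_v1', 'sop_v2']
```

Here sop_v3 is dropped because sop_v1 beats it on every objective, while sop_v1 and sop_v2 both stay: each wins on a different objective, which is exactly the kind of trade-off left to the human decision-maker.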

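In this post we target the healthcare domain. The challenge there is that many possibilities have to be considered for an input query or knowledge base, while the final decision remains in human hands.

We will build an end-to-end, self-improving Agentic RAG pipeline that generates different design patterns for the RAG system.

The full code is available in my GitHub repository: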

GitHub - FareedKhan-dev/autonomous-agentic-rag: Self improving agentic rag pipeline

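Table of Contents

  • Knowledge Infrastructure for Medical AI
      ◦ Installing the Open-Source Stack
      ◦ Environment Configuration and Imports
      ◦ Configuring Local LLMs
      ◦ Preparing the Knowledge Base
  • Building the Internal Clinical Trial Design Network
      ◦ Defining the Guild SOP
      ◦ Defining the Specialist Agents
      ◦ Orchestrating the Guild with LangGraph
      ◦ Full Run of the Guild Graph
  • Multi-Dimensional Evaluation System
      ◦ Building a Custom Evaluator for Each Parameter
      ◦ Creating the Aggregate LangSmith Evaluator
  • The Outer Loop of the Evolution Engine
      ◦ Managing Configurations
      ◦ Building the Director-Level Agents
      ◦ Running the Full Evolution Loop
  • Five-Dimensional Pareto Analysis
      ◦ Identifying the Pareto Front
      ◦ Visualizing the Front and Making a Decision
  • Understanding the Cognitive Workflow
      ◦ Visualizing the Agent Workflow Timeline
      ◦ Profiling the Output with a Radar Chart
  • Turning It into an Autonomous Strategy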

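Knowledge Infrastructure for Medical AI

Before writing the self-evolving agentic RAG system, we need to set up a suitable knowledge database and the tools required to build the architecture.

A production-grade RAG system usually draws on a variety of databases, including both sensitive internal organizational data and open-source data, to improve retrieval quality and to compensate for outdated or incomplete information. This foundational step is arguably the most critical, because the quality of the data sources directly determines the quality of the final output.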


Sourcing the knowledge base (Created by Fareed Khan)

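In this section we assemble the components of the overall architecture. The plan:

  • Install the open-source stack: set up the environment and install the required libraries, staying local and open-source-first.
  • Configure secure observability: load API keys securely and configure LangSmith so that complex agent interactions can be traced and debugged from the start.
  • Set up a local LLM foundry: use Ollama to assemble a lineup of open-source models and assign different models to different tasks, balancing performance and cost.
  • Fetch and process multimodal data: download and prepare four kinds of real data sources, namely PubMed scientific literature, FDA regulatory guidance, ethical principles, and a large structured clinical dataset (MIMIC-III).
  • Index the knowledge stores: finally, turn the raw data into efficiently searchable databases, using FAISS vector stores for unstructured text and DuckDB for the structured clinical data.

Installing the Open-Source Stack

The first step is to install the required Python libraries. A reproducible environment is the foundation of any serious project. We choose an industry-standard open-source stack so that we keep full control over the system: langchain and langgraph for the core agentic framework, ollama for talking to local LLMs, and specialist libraries such as biopython for accessing PubMed and duckdb for high-performance analysis of clinical data.

Let us install the modules we need: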

# We use pip's "quiet" (-q) and "upgrade" (-U) flags to install all the required packages.
# - langchain, langgraph, etc.: These form the core of our agentic framework for building and orchestrating agents.
# - ollama: This is the client library that allows our Python code to communicate with a locally running Ollama server.
# - duckdb: An incredibly fast, in-process analytical database perfect for handling our structured MIMIC data without a heavy server setup.
# - faiss-cpu: Facebook AI's library for efficient similarity search, which will power the vector stores for our RAG agents.
# - sentence-transformers: A library for easy access to state-of-the-art models for creating text embeddings.
# - biopython, pypdf, beautifulsoup4: A suite of powerful utilities for downloading and parsing our diverse, real-world data sources.
%pip install -U langchain langgraph langchain_community langchain_openai langchain_core ollama pandas duckdb faiss-cpu sentence-transformers biopython pypdf pydantic lxml html2text beautifulsoup4 matplotlib python-dotenv requests -qqq

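We prepare all the tools and building materials in one go. Each library has its job, from langgraph orchestrating the agent workflow to duckdb handling data analysis.

With the modules installed, let us initialize them one by one.

Environment Configuration and Imports

We need to configure the environment securely. Hard-coding API keys in a notebook is a security risk and makes the code harder to share.

We use a .env file to manage sensitive values, chiefly the LangSmith API key. Configuring LangSmith from the start is non-negotiable: it gives us the deep observability needed to trace, debug, and understand the interactions between agents. Here is the code: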

import os
import getpass
from dotenv import load_dotenv

# This function from the python-dotenv library searches for a .env file and loads its key-value pairs
# into the operating system's environment variables, making them accessible to our script.
load_dotenv()

# This is a critical check. We verify that our script can access the necessary API keys from the environment.
if "LANGCHAIN_API_KEY" not in os.environ or "ENTREZ_EMAIL" not in os.environ:
    # If the keys are missing, we print an error; later steps that depend on them will fail until they are set.
    print("Required environment variables not set. Please set them in your .env file or environment.")
else:
    # This confirmation tells us our secrets have been loaded securely and are ready for use.
    print("Environment variables loaded successfully.")

# We explicitly set the LangSmith project name. This is a best practice that ensures all traces
# generated by this project are automatically grouped together in the LangSmith user interface for easy analysis.
os.environ["LANGCHAIN_PROJECT"] = "AI_Clinical_Trials_Architect"

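load_dotenv() acts as a secure bridge between sensitive credentials and the code. It reads the .env file (which must never be committed to version control) and injects the keys into the environment.

From this point on, every operation that goes through LangChain or LangGraph is captured and sent to our LangSmith project, provided tracing is enabled in the environment (for example, LANGCHAIN_TRACING_V2=true in the same .env file).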
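
The check in the snippet above only prints a message when a key is missing. If a hard stop is preferred, a fail-fast variant could look like the sketch below. The helper names missing_env_vars and require_env are hypothetical, introduced here for illustration; only the two variable names come from the article.

```python
import os
from typing import Iterable, List, Mapping

# The two variables the article's check expects.
REQUIRED_VARS = ("LANGCHAIN_API_KEY", "ENTREZ_EMAIL")


def missing_env_vars(required: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]


def require_env(required: Iterable[str], env: Mapping[str, str] = os.environ) -> None:
    """Raise immediately if any required variable is missing, naming all of them."""
    missing = missing_env_vars(required, env)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")


# Example with an explicit mapping instead of the real environment.
print(missing_env_vars(REQUIRED_VARS, {"LANGCHAIN_API_KEY": "dummy"}))  # ['ENTREZ_EMAIL']
```

Calling require_env(REQUIRED_VARS) right after load_dotenv() would stop the notebook before any agent code runs with incomplete credentials.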

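Configuring Local LLMs

In a production-grade agentic system, a one-size-fits-all model strategy is rarely optimal. A large SOTA model is computationally expensive and slow, so using it for simple tasks wastes resources (especially when self-hosted on GPUs). A small model is fast but may lack the deep reasoning needed for critical decisions.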


Configuring Local LLMs (Created by Fareed Khan)

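The key is to put the right model in the right place in the system. We will build a multi-model lineup, all served locally through Ollama for privacy, control, and cost-effectiveness, with each model playing the role it is best suited to.

We start by defining a configuration dictionary that holds the client for every selected model in one place, which makes models easy to swap and manage.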

from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings

# This dictionary will act as our central registry, or "foundry," for all LLM and embedding model clients.
llm_config = {
    # For the 'planner', we use Llama 3.1 8B. It's a modern, highly capable model that excels at instruction-following.
    # We set `format='json'` to leverage Ollama's built-in JSON mode, ensuring reliable structured output for this critical task.
    "planner": ChatOllama(model="llama3.1:8b-instruct", temperature=0.0, format='json'),
    
    # For the 'drafter' and 'sql_coder', we use Qwen2 7B. It's a nimble and fast model, perfect for
    # tasks like text generation and code completion where speed is valuable.
    "drafter": ChatOllama(model="qwen2:7b", temperature=0.2),
    "sql_coder": ChatOllama(model="qwen2:7b", temperature=0.0),
    
    # For the 'director', the highest-level strategic agent, we use the powerful Llama 3 70B model.
    # This high-stakes task of diagnosing performance and evolving the system's own procedures
    # justifies the use of a larger, more powerful model.
    "director": ChatOllama(model="llama3:70b", temperature=0.0, format='json'),
    # For embeddings, we use 'nomic-embed-text', a top-tier, efficient open-source model.
    "embedding_model": OllamaEmbeddings(model="nomic-embed-text")
}

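We have just created the llm_config dictionary, the central hub for all model initialization. Assigning different models to different roles gives us a hierarchy tuned for the cost-performance trade-off.

  • Fast and nimble (7B–8B): planner, drafter, and sql_coder handle frequent, well-defined tasks. Qwen2 7B and Llama 3.1 8B keep latency and cost low while following instructions well enough to generate plans, write text, or produce SQL.
  • Deep strategy (70B): the director has to analyze multi-dimensional performance data and rewrite the entire SOP, which demands strong causal reasoning and a global view. Assigning Llama 3 70B to this low-frequency, high-stakes task is justified.

Print the configuration to confirm: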

# Print the configuration to confirm the clients are initialized and their parameters are set correctly.
print("LLM clients configured:")
print(f"Planner ({llm_config['planner'].model}): {llm_config['planner']}")
print(f"Drafter ({llm_config['drafter'].model}): {llm_config['drafter']}")
print(f"SQL Coder ({llm_config['sql_coder'].model}): {llm_config['sql_coder']}")
print(f"Director ({llm_config['director'].model}): {llm_config['director']}")
print(f"Embedding Model ({llm_config['embedding_model'].model}): {llm_config['embedding_model']}")

Sample output:

#### OUTPUT ####
LLM clients configured:
Planner (llama3.1:8b-instruct): ChatOllama(model='llama3.1:8b-instruct', temperature=0.0, format='json')
Drafter (qwen2:7b): ChatOllama(model='qwen2:7b', temperature=0.2)
SQL Coder (qwen2:7b): ChatOllama(model='qwen2:7b', temperature=0.0)
Director (llama3:70b): ChatOllama(model='llama3:70b', temperature=0.0, format='json')
Embedding Model (nomic-embed-text): OllamaEmbeddings(model='nomic-embed-text')

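This shows that the ChatOllama and OllamaEmbeddings clients were initialized with the specified models and parameters. Next, we connect the knowledge base.

Preparing the Knowledge Base

The soul of RAG is a rich, multimodal knowledge foundation. For a specialized task such as clinical trial design, generic web search is nowhere near enough. We need authoritative, domain-specific information as our grounding.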


Knowledge store creation (Created by Fareed Khan)

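To that end, we will build a comprehensive knowledge base by collecting, downloading, and processing content from four kinds of real-world data. Combining several sources is essential for helping the agents synthesize information, and it makes the final output more complete and more reliable.

First, create the data directories: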

import os

# A dictionary to hold the paths for our different data types. This keeps our file management clean and centralized.
data_paths = {
    "base": "./data",
    "pubmed": "./data/pubmed_articles",
    "fda": "./data/fda_guidelines",
    "ethics": "./data/ethical_guidelines",
    "mimic": "./data/mimic_db"
}
# This loop iterates through our defined paths and uses os.makedirs() to create any directories that don't already exist.
# This prevents errors in later steps when we try to save files to these locations.
for path in data_paths.values():
    if not os.path.exists(path):
        os.makedirs(path)
        print(f"Created directory: {path}")

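This ensures the project has a clean, well-organized file structure from the start.

Next, we fetch real literature from PubMed to give the Medical Researcher its core knowledge: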

from Bio import Entrez
from Bio import Medline

def download_pubmed_articles(query, max_articles=20):
    """Fetches abstracts from PubMed for a given query and saves them as text files."""
    # The NCBI API requires an email address for identification. We fetch it from our environment variables.
    Entrez.email = os.environ.get("ENTREZ_EMAIL")
    print(f"Fetching PubMed articles for query: {query}")
    
    # Step 1: Use Entrez.esearch to find the PubMed IDs (PMIDs) for articles matching our query.
    handle = Entrez.esearch(db="pubmed", term=query, retmax=max_articles, sort="relevance")
    record = Entrez.read(handle)
    id_list = record["IdList"]
    print(f"Found {len(id_list)} article IDs.")
    
    print("Downloading articles...")
    # Step 2: Use Entrez.efetch to retrieve the full records (in MEDLINE format) for the list of PMIDs.
    handle = Entrez.efetch(db="pubmed", id=id_list, rettype="medline", retmode="text")
    records = Medline.parse(handle)
    
    count = 0
    # Step 3: Iterate through the retrieved records, parse them, and save each abstract to a file.
    for i, record in enumerate(records):
        pmid = record.get("PMID", "")
        title = record.get("TI", "No Title")
        abstract = record.get("AB", "No Abstract")
        if pmid:
            # We name the file after the PMID for easy reference and to avoid duplicates.
            filepath = os.path.join(data_paths["pubmed"], f"{pmid}.txt")
            with open(filepath, "w", encoding="utf-8") as f:
                f.write(f"Title: {title}\n\nAbstract: {abstract}")
            print(f"[{i+1}/{len(id_list)}] Fetching PMID: {pmid}... Saved to {filepath}")
            count += 1
    return count

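The function connects to NCBI in three steps: it searches for PMIDs matching the boolean query, fetches the MEDLINE records, and saves each title and abstract to a local text file.

Run it: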

# We define a specific, boolean query to find articles highly relevant to our trial concept.
pubmed_query = "(SGLT2 inhibitor) AND (type 2 diabetes) AND (renal impairment)"
num_downloaded = download_pubmed_articles(pubmed_query)
print(f"PubMed download complete. {num_downloaded} articles saved.")

Sample output:

#### OUTPUT ####
Fetching PubMed articles for query: (SGLT2 inhibitor) AND (type 2 diabetes) AND (renal impairment)
Found 20 article IDs.
Downloading articles...
[1/20] Fetching PMID: 38810260... Saved to ./data/pubmed_articles/38810260.txt
[2/20] Fetching PMID: 38788484... Saved to ./data/pubmed_articles/38788484.txt
...
PubMed download complete. 20 articles saved.

The Medical Researcher now has a solid, up-to-date, domain-specific scientific basis.
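
The boolean query passed to PubMed above is written by hand. If queries need to be assembled from a list of concepts (for example when testing several trial ideas), a small helper along these lines may be convenient. The function name build_pubmed_query is hypothetical, and the sketch only covers the simple "(term) AND (term)" shape used in the article, not the full PubMed query syntax.

```python
from typing import Sequence


def build_pubmed_query(terms: Sequence[str], operator: str = "AND") -> str:
    """Join search terms into a parenthesized boolean query string."""
    if operator not in {"AND", "OR"}:
        raise ValueError("operator must be 'AND' or 'OR'")
    # Drop blank entries and surrounding whitespace before joining.
    cleaned = [term.strip() for term in terms if term.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty search term is required")
    return f" {operator} ".join(f"({term})" for term in cleaned)


query = build_pubmed_query(["SGLT2 inhibitor", "type 2 diabetes", "renal impairment"])
print(query)  # (SGLT2 inhibitor) AND (type 2 diabetes) AND (renal impairment)
```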

Next, we fetch regulatory documents for the Regulatory Specialist:

import requests
from pypdf import PdfReader
import io

def download_and_extract_text_from_pdf(url, output_path):
    """Downloads a PDF from a URL, saves it, and also extracts its text content to a separate .txt file."""
    print(f"Downloading FDA Guideline: {url}")
    try:
        # We use the 'requests' library to perform the HTTP GET request to download the file.
        response = requests.get(url)
        response.raise_for_status() # This is a good practice that will raise an error if the download fails (e.g., a 404 error).
        
        # We save the raw PDF file, which is useful for archival purposes.
        with open(output_path, 'wb') as f:
            f.write(response.content)
        print(f"Successfully downloaded and saved to {output_path}")
        
        # We then use pypdf to read the PDF content directly from the in-memory response.
        reader = PdfReader(io.BytesIO(response.content))
        text = ""
        # We loop through each page of the PDF and append its extracted text.
        for page in reader.pages:
            text += page.extract_text() + "\n\n"
        
        # Finally, we save the clean, extracted text to a .txt file. This is the file our RAG system will actually use.
        txt_output_path = os.path.splitext(output_path)[0] + '.txt'
        with open(txt_output_path, 'w', encoding='utf-8') as f:
            f.write(text)
        return True
    except requests.exceptions.RequestException as e:
        print(f"Error downloading file: {e}")
        return False

Run it to download the FDA guidance and extract its text:

# This URL points to a real FDA guidance document for developing drugs for diabetes.
fda_url = "https://www.fda.gov/media/71185/download"
fda_pdf_path = os.path.join(data_paths["fda"], "fda_diabetes_guidance.pdf")
download_and_extract_text_from_pdf(fda_url, fda_pdf_path)

#### OUTPUT ####
Downloading FDA Guideline: https://www.fda.gov/media/71185/download
Successfully downloaded and saved to ./data/fda_guidelines/fda_diabetes_guidance.pdf

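The Regulatory Specialist now has a base corpus of legal and regulatory text.

Next, we prepare a concise document for the Ethics Specialist (a summary of the core principles of the Belmont Report) so that its reasoning rests on the most important concepts: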

# This multi-line string contains a curated summary of the three core principles of the Belmont Report,
# which is the foundational document for ethics in human subject research in the United States.
ethics_content = """
Title: Summary of the Belmont Report Principles for Clinical Research
1. Respect for Persons: This principle requires that individuals be treated as autonomous agents and that persons with diminished autonomy are entitled to protection. This translates to robust informed consent processes. Inclusion/exclusion criteria must not unduly target or coerce vulnerable populations, such as economically disadvantaged individuals, prisoners, or those with severe cognitive impairments, unless the research is directly intended to benefit that population.
2. Beneficence: This principle involves two complementary rules: (1) do not harm and (2) maximize possible benefits and minimize possible harms. The criteria must be designed to select a population that is most likely to benefit and least likely to be harmed by the intervention. The risks to subjects must be reasonable in relation to anticipated benefits.
3. Justice: This principle concerns the fairness of distribution of the burdens and benefits of research. The selection of research subjects must be equitable. Criteria should not be designed to exclude certain groups without a sound scientific or safety-related justification. For example, excluding participants based on race, gender, or socioeconomic status is unjust unless there is a clear rationale related to the drug's mechanism or risk profile.
"""

# We define the path where our ethics document will be saved.
ethics_path = os.path.join(data_paths["ethics"], "belmont_summary.txt")

# We open the file in write mode and save the content.
with open(ethics_path, "w") as f:
    f.write(ethics_content)
print(f"Created ethics guideline file: {ethics_path}")

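Finally, the most complex data source: structured clinical data from MIMIC-III, which gives the Patient Cohort Analyst real-world population data for assessing recruitment feasibility.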

import duckdb
import pandas as pd
import os


def load_real_mimic_data():
    """Loads real MIMIC-III CSVs into a persistent DuckDB database file, processing the massive LABEVENTS table efficiently."""
    print("Attempting to load real MIMIC-III data from local CSVs...")
    db_path = os.path.join(data_paths["mimic"], "mimic3_real.db")
    csv_dir = os.path.join(data_paths["mimic"], "mimiciii_csvs")
    
    # Define the paths to the required compressed CSV files.
    required_files = {
        "patients": os.path.join(csv_dir, "PATIENTS.csv.gz"),
        "diagnoses": os.path.join(csv_dir, "DIAGNOSES_ICD.csv.gz"),
        "labevents": os.path.join(csv_dir, "LABEVENTS.csv.gz"),
    }
    
    # Before starting, we check if all the necessary source files are present.
    missing_files = [path for path in required_files.values() if not os.path.exists(path)]
    if missing_files:
        print("ERROR: The following MIMIC-III files were not found:")
        for f in missing_files: print(f"- {f}")
        print("\nPlease download them as instructed and place them in the correct directory.")
        return None
    
    print("Required files found. Proceeding with database creation.")
    # Remove any old database file to ensure we are building from scratch.
    if os.path.exists(db_path):
        os.remove(db_path)
    # Connect to DuckDB. If the database file doesn't exist, it will be created.
    con = duckdb.connect(db_path)
    
    # Use DuckDB's powerful `read_csv_auto` to directly load data from the gzipped CSVs into SQL tables.
    print(f"Loading {required_files['patients']} into DuckDB...")
    con.execute(f"CREATE TABLE patients AS SELECT SUBJECT_ID, GENDER, DOB, DOD FROM read_csv_auto('{required_files['patients']}')")
    
    print(f"Loading {required_files['diagnoses']} into DuckDB...")
    con.execute(f"CREATE TABLE diagnoses_icd AS SELECT SUBJECT_ID, ICD9_CODE FROM read_csv_auto('{required_files['diagnoses']}')")
    
    # The LABEVENTS table is enormous. To handle it robustly, we use a two-stage process.
    print(f"Loading and processing {required_files['labevents']} (this may take several minutes)...")
    # 1. Load the data into a temporary 'staging' table, treating all columns as text (`all_varchar=True`).
    #    This prevents parsing errors with mixed data types. We also filter for only the lab item IDs we
    #    care about (50912 for Creatinine, 50852 for HbA1c) and use a regex to ensure VALUENUM is numeric.
    con.execute(f"""CREATE TABLE labevents_staging AS 
                   SELECT SUBJECT_ID, ITEMID, VALUENUM 
                   FROM read_csv_auto('{required_files['labevents']}', all_varchar=True) 
                   WHERE ITEMID IN ('50912', '50852') AND VALUENUM IS NOT NULL AND VALUENUM ~ '^[0-9]+(\\.[0-9]+)?$'
                """)
    # 2. Create the final, clean table by selecting from the staging table and casting the columns to their correct numeric types.
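    # SUBJECT_ID is cast as well so that it has the same integer type as in the other tables when joining.
    con.execute("CREATE TABLE labevents AS SELECT CAST(SUBJECT_ID AS INTEGER) AS SUBJECT_ID, CAST(ITEMID AS INTEGER) AS ITEMID, CAST(VALUENUM AS DOUBLE) AS VALUENUM FROM labevents_staging")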
    # 3. Drop the temporary staging table to save space.
    con.execute("DROP TABLE labevents_staging")
    con.close()
    return db_path

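Here DuckDB processes the large CSVs directly from disk instead of pandas reading them entirely into memory. LABEVENTS goes through a two-stage cleanup (first filter with all_varchar, then cast to the proper types), which copes robustly with data-quality problems and yields a clean, efficient table to query.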
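
To make the two-stage idea concrete without needing the MIMIC-III files, the sketch below applies the same logic to a handful of in-memory rows using only the standard library. It is an illustration of the pattern (treat everything as text, filter, then cast), not a replacement for the DuckDB code; the sample rows are invented, and the function names stage_rows and cast_rows are hypothetical.

```python
import re
from typing import Dict, List

# Same numeric pattern and lab item IDs as in the DuckDB query above.
NUMERIC = re.compile(r"[0-9]+(\.[0-9]+)?")
WANTED_ITEMIDS = {"50912", "50852"}  # creatinine, HbA1c


def stage_rows(raw_rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Stage 1: keep every field as text; keep only wanted items with numeric-looking values."""
    return [
        row for row in raw_rows
        if row["ITEMID"] in WANTED_ITEMIDS
        and row.get("VALUENUM")
        and NUMERIC.fullmatch(row["VALUENUM"])
    ]


def cast_rows(staged: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Stage 2: cast the surviving text fields to numeric types; safe because stage 1 filtered them."""
    return [
        {
            "SUBJECT_ID": int(row["SUBJECT_ID"]),
            "ITEMID": int(row["ITEMID"]),
            "VALUENUM": float(row["VALUENUM"]),
        }
        for row in staged
    ]


# Invented rows showing typical problems: free-text values, unrelated items, blanks.
raw = [
    {"SUBJECT_ID": "3", "ITEMID": "50912", "VALUENUM": "1.8"},
    {"SUBJECT_ID": "3", "ITEMID": "50852", "VALUENUM": ">10"},
    {"SUBJECT_ID": "4", "ITEMID": "50931", "VALUENUM": "95"},
    {"SUBJECT_ID": "5", "ITEMID": "50852", "VALUENUM": ""},
    {"SUBJECT_ID": "6", "ITEMID": "50852", "VALUENUM": "8.4"},
]

clean = cast_rows(stage_rows(raw))
print(clean)
```

Only the first and last rows survive. Casting directly, without the text-level filter, would fail on the ">10" value, which is the failure mode the staging table is meant to avoid.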

Run and inspect:

# Execute the function to build the database.
db_path = load_real_mimic_data()

# If the database was created successfully, connect to it and inspect the schema and some sample data.
if db_path:
    print(f"\nReal MIMIC-III database created at: {db_path}")
    print("\nTesting database connection and schema...")
    con = duckdb.connect(db_path)
    print(f"Tables in DB: {con.execute('SHOW TABLES').df()['name'].tolist()}")
    print("\nSample of 'patients' table:")
    print(con.execute("SELECT * FROM patients LIMIT 5").df())
    print("\nSample of 'diagnoses_icd' table:")
    print(con.execute("SELECT * FROM diagnoses_icd LIMIT 5").df())
    con.close()

Sample output omitted; it shows that all three tables were created and can be queried.


Pre-processing Step (Created by Fareed Khan)

Finally, we index all the unstructured text into searchable vector stores for RAG to use:

from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

def create_vector_store(folder_path: str, embedding_model, store_name: str):
    """Loads all .txt files from a folder, splits them into chunks, and creates an in-memory FAISS vector store."""
    print(f"--- Creating {store_name} Vector Store ---")
    # Use DirectoryLoader to efficiently load all .txt files from the specified folder.
    loader = DirectoryLoader(folder_path, glob="**/*.txt", loader_cls=TextLoader, show_progress=True)
    documents = loader.load()
    
    if not documents:
        print(f"No documents found in {folder_path}, skipping vector store creation.")
        return None
    
    # Use RecursiveCharacterTextSplitter to break large documents into smaller, 1000-character chunks with a 100-character overlap.
    # The overlap helps maintain context between chunks.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    texts = text_splitter.split_documents(documents)
    
    print(f"Loaded {len(documents)} documents, split into {len(texts)} chunks.")
    print("Generating embeddings and indexing into FAISS... (This may take a moment)")
    # FAISS.from_documents is a convenient function that handles both embedding the text chunks
    # and building the efficient FAISS index in one step.
    db = FAISS.from_documents(texts, embedding_model)
    print(f"{store_name} Vector Store created successfully.")
    return db

def create_retrievers(embedding_model):
    """Creates vector store retrievers for all unstructured data sources and consolidates all knowledge stores."""
    # Create a separate, specialized vector store for each type of document.
    pubmed_db = create_vector_store(data_paths["pubmed"], embedding_model, "PubMed")
    fda_db = create_vector_store(data_paths["fda"], embedding_model, "FDA")
    ethics_db = create_vector_store(data_paths["ethics"], embedding_model, "Ethics")
    
    # Return a single dictionary containing all configured data access tools.
    # The 'as_retriever' method converts the vector store into a standard LangChain Retriever object.
    # The 'k' parameter in 'search_kwargs' controls how many top documents are returned by a search.
    return {
        "pubmed_retriever": pubmed_db.as_retriever(search_kwargs={"k": 3}) if pubmed_db else None,
        "fda_retriever": fda_db.as_retriever(search_kwargs={"k": 3}) if fda_db else None,
        "ethics_retriever": ethics_db.as_retriever(search_kwargs={"k": 2}) if ethics_db else None,
        "mimic_db_path": db_path # We also include the file path to our structured DuckDB database.
    }

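create_vector_store wraps the standard RAG build flow of load -> split -> embed -> index; create_retrievers builds a separate vector store for each corpus and returns a dictionary of retrievers. We use per-domain vector stores rather than one unified store, so that each agent retrieves only from its own relevant source (for example, the Regulatory Specialist uses only fda_retriever).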
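
The "split" step is where chunk_size and chunk_overlap come in. The sketch below shows what those two parameters mean using a plain fixed-window splitter. It is deliberately simpler than RecursiveCharacterTextSplitter, which also tries to break on paragraph and sentence boundaries; the function name chunk_text is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows where neighbours share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Tiny example so the overlap is visible: each chunk repeats the last character of the previous one.
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

The repeated characters at the chunk borders are what lets a sentence that straddles a boundary still appear intact in at least one chunk.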

Run the creation step:

# Execute the function to create all our retrievers.
knowledge_stores = create_retrievers(llm_config["embedding_model"])

print("\nKnowledge stores and retrievers created successfully.")

# Print the final dictionary to confirm all components are present.
for name, store in knowledge_stores.items():
    print(f"{name}: {store}")

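The output shows that each retriever was created successfully.

At this point the data (downloaded, processed, indexed) and the LLMs (configured) are ready, and we can start building the system's first major component: the Trial Design Guild.

Building the Internal Clinical Trial Design Network

With the knowledge base in place, we now build the core of the system. This is not a simple linear RAG chain but a collaborative multi-agent workflow built on LangGraph: a team of AI specialists that together turn a high-level trial concept into a detailed, data-backed criteria document.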


Main Inner Loop RAG (Created by Fareed Khan)

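The behavior of the whole architecture is not hard-coded. It is governed by a dynamic configuration object: the Standard Operating Procedure (GuildSOP).

This SOP is the genome of our RAG pipeline, and it is what the outer-loop AI Research Director will evolve and optimize.

The plan for this section:

  • Define the RAG genome: create the Pydantic model GuildSOP, which drives the entire workflow architecture.
  • Design the shared workbench: define GuildState, the central space where agents share the plan and their findings.
  • Build the specialist agents: implement the Planner, Researchers, SQL Analyst, and Synthesizer as Python functions that act as nodes in the graph.
  • Orchestrate the collaboration: use LangGraph to wire these agent nodes into a complete end-to-end workflow.
  • Run a full test: invoke the complete Guild graph with the baseline SOP, watch it run, and produce the first version of the criteria document.

Defining the Guild SOP

We first define the structure that controls the behavior of the overall flow. We create GuildSOP as a Pydantic BaseModel. Strong typing, validation, and self-documentation keep the SOP stable as it evolves.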


Guild SOP Design (Created by Fareed Khan)

from pydantic import BaseModel, Field
from typing import Literal

class GuildSOP(BaseModel):
    """Standard Operating Procedures for the Trial Design Guild. This object acts as the dynamic configuration for the entire RAG workflow."""
    
    # This field holds the system prompt for the Planner Agent, dictating its strategy.
    planner_prompt: str = Field(description="The system prompt for the Planner Agent.")
    
    # This parameter controls how many documents the Medical Researcher retrieves, allowing us to tune the breadth of its search.
    researcher_retriever_k: int = Field(description="Number of documents for the Medical Researcher to retrieve.", default=3)
    
    # This is the system prompt for the final writer, the Synthesizer Agent.
    synthesizer_prompt: str = Field(description="The system prompt for the Criteria Synthesizer Agent.")
    
    # This allows us to dynamically change the model used for the final drafting stage, trading off speed vs. quality.
    synthesizer_model: Literal["qwen2:7b", "llama3.1:8b-instruct"] = Field(description="The LLM to use for the Synthesizer.", default="qwen2:7b")
    
    # These booleans act as "feature flags," allowing the Director to turn entire agent capabilities on or off.
    use_sql_analyst: bool = Field(description="Whether to use the Patient Cohort Analyst agent.", default=True)
    use_ethics_specialist: bool = Field(description="Whether to use the Ethics Specialist agent.", default=True)

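GuildSOP exposes the key parameters (the prompts, researcher_retriever_k, and the agent switches) so that the outer-loop AI Director can pull these strategy levers to tune overall performance. synthesizer_model uses Literal to restrict the set of allowed values, which keeps it type-safe.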
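
Treating the SOP as a genome means a new variant is just a copy of an existing SOP with one or two levers changed. The sketch below shows that idea with a standard-library dataclass standing in for the Pydantic model, so it runs without extra dependencies. SopSketch and make_variant are hypothetical names, and the validation rules are assumptions made for the example rather than the author's mutation logic.

```python
from dataclasses import dataclass, replace

# Allowed values mirror the Literal in GuildSOP.
ALLOWED_SYNTHESIZER_MODELS = {"qwen2:7b", "llama3.1:8b-instruct"}


@dataclass(frozen=True)
class SopSketch:
    """Standard-library stand-in for GuildSOP (the article itself uses Pydantic)."""
    planner_prompt: str
    synthesizer_prompt: str
    researcher_retriever_k: int = 3
    synthesizer_model: str = "qwen2:7b"
    use_sql_analyst: bool = True
    use_ethics_specialist: bool = True


def make_variant(base: SopSketch, **changes) -> SopSketch:
    """Return a copy of `base` with the given fields changed, leaving `base` untouched."""
    variant = replace(base, **changes)
    if variant.synthesizer_model not in ALLOWED_SYNTHESIZER_MODELS:
        raise ValueError(f"Unsupported synthesizer model: {variant.synthesizer_model}")
    if variant.researcher_retriever_k < 1:
        raise ValueError("researcher_retriever_k must be at least 1")
    return variant


baseline = SopSketch(
    planner_prompt="You are a master planner for clinical trial design...",
    synthesizer_prompt="You are an expert medical writer...",
)

# One lever changed per variant keeps the effect of each mutation easy to attribute.
wider_search = make_variant(baseline, researcher_retriever_k=5)
stronger_writer = make_variant(baseline, synthesizer_model="llama3.1:8b-instruct")

print(wider_search.researcher_retriever_k, stronger_writer.synthesizer_model)
```

Because the dataclass is frozen, the baseline cannot be modified by accident, so every variant can always be compared against the same reference configuration.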

Build the baseline version:

import json

baseline_sop = GuildSOP(
    planner_prompt="""You are a master planner for clinical trial design...""",
    synthesizer_prompt="""You are an expert medical writer...""",
    researcher_retriever_k=3,
    synthesizer_model="qwen2:7b",
    use_sql_analyst=True,
    use_ethics_specialist=True
)

Print it:

print("Baseline GuildSOP (v1.0):")
print(json.dumps(baseline_sop.dict(), indent=4))

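The output shows the full configuration of the baseline SOP, our hand-engineered best first guess, which the AI Director will later optimize and surpass.

Defining the Specialist Agents

With the rulebook (the SOP) in place, we define the agents. In LangGraph, an agent is a node (a Python function) that takes the current graph state as input and returns a state update.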
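
The "state in, update out" contract can be illustrated without LangGraph at all. The sketch below uses plain dictionaries and a tiny sequential runner that merges each node's update into the state. It is a stand-in to show the idea, not LangGraph's API; plan_node, draft_node, and run_pipeline are hypothetical names.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def plan_node(state: State) -> State:
    """Reads the request and returns only the keys it wants to add or change."""
    return {"plan": [f"research: {state['initial_request']}"]}


def draft_node(state: State) -> State:
    """Reads the plan written by the previous node and adds the final output."""
    return {"final_criteria": f"Draft based on {len(state['plan'])} planned task(s)."}


def run_pipeline(nodes: List[Node], state: State) -> State:
    """Run nodes in order, merging each returned update into a new state dictionary."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


result = run_pipeline([plan_node, draft_node], {"initial_request": "SGLT2 inhibitor trial"})
print(result["plan"])
print(result["final_criteria"])
```

Each node stays a small, testable function that knows nothing about the others; the shared state is the only channel between them, which is the role GuildState plays in the article.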


Specialist Agents (Created by Fareed Khan)

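We first define the shared state, GuildState, which serves as the collaborative workbench: it holds the initial request, the plan produced by the planner, each specialist's findings, and the final output.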

from typing import List, Dict, Any, Optional
from pydantic import BaseModel
from typing_extensions import TypedDict

class AgentOutput(BaseModel):
    """A structured output for each agent's findings."""
    agent_name: str
    findings: Any

class GuildState(TypedDict):
    """The state of the Trial Design Guild's workflow, passed between all nodes."""
    initial_request: str
    plan: Optional[Dict[str, Any]]
    agent_outputs: List[AgentOutput]
    final_criteria: Optional[str]
    sop: GuildSOP

Next we implement planner_agent, which reads planner_prompt from the SOP and produces a structured plan (JSON) to guide the downstream agents:

def planner_agent(state: GuildState) -> GuildState:
    """Receives the initial request and creates a structured plan for the specialist agents."""
    print("--- EXECUTING PLANNER AGENT ---")

    sop = state['sop']

    planner_llm = llm_config['planner'].with_structured_output(schema={"plan": []})
    
    prompt = f"{sop.planner_prompt}\n\nTrial Concept: '{state['initial_request']}'"
    print(f"Planner Prompt:\n{prompt}")
    
    response = planner_llm.invoke(prompt)
    print(f"Generated Plan:\n{json.dumps(response, indent=2)}")
    
    return {**state, "plan": response}

Then we implement a generic retrieval agent function, retrieval_agent, which the Medical Researcher, Regulatory Specialist, and Ethics Specialist all reuse:

def retrieval_agent(task_description: str, state: GuildState, retriever_name: str, agent_name: str) -> AgentOutput:
    """A generic agent function that performs retrieval from a specified vector store based on a task description."""
    print(f"--- EXECUTING {agent_name.upper()} ---")
    print(f"Task: {task_description}")
    
    retriever = knowledge_stores[retriever_name]
    
    if agent_name == "Medical Researcher":
        retriever.search_kwargs['k'] = state['sop'].researcher_retriever_k
        print(f"Using k={state['sop'].researcher_retriever_k} for retrieval.")

    retrieved_docs = retriever.invoke(task_description)
    
    findings = "\n\n---\n\n".join([f"Source: {doc.metadata.get('source', 'N/A')}\n\n{doc.page_content}" for doc in retrieved_docs])
    print(f"Retrieved {len(retrieved_docs)} documents.")
    print(f"Sample Finding:\n{findings[:500]}...")
    
    return AgentOutput(agent_name=agent_name, findings=findings)

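The Patient Cohort Analyst is the most complex agent. It performs Text-to-SQL: it turns natural language into valid SQL, runs it on DuckDB, and returns an estimate of the recruitable population: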

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

def patient_cohort_analyst(task_description: str, state: GuildState) -> AgentOutput:
    """Estimates cohort size by generating and then executing a SQL query against the MIMIC database."""
    print("--- EXECUTING PATIENT COHORT ANALYST ---")
    
    if not state['sop'].use_sql_analyst:
        print("SQL Analyst skipped as per SOP.")
        return AgentOutput(agent_name="Patient Cohort Analyst", findings="Analysis skipped as per SOP.")
    
    con = duckdb.connect(knowledge_stores['mimic_db_path'])
    schema_query = """
    SELECT table_name, column_name, data_type 
    FROM information_schema.columns 
    WHERE table_schema = 'main' ORDER BY table_name, column_name;
    """
    schema = con.execute(schema_query).df()
    con.close()
    
    sql_generation_prompt = ChatPromptTemplate.from_messages([
        ("system", f"You are an expert SQL writer specializing in DuckDB. ... schema:\n{schema.to_string()}\n\nIMPORTANT: All column names ...\n\nKey Mappings:\n- T2DM ... ICD9_CODE '25000'.\n- Moderate renal impairment ... creatinine ... ITEMID 50912 ... VALUENUM 1.5-3.0.\n- Uncontrolled T2D ... HbA1c ... ITEMID 50852 ... VALUENUM > 8.0."),
        ("human", "Please write a SQL query to count the number of unique patients who meet the following criteria: {task}")
    ])
    
    sql_chain = sql_generation_prompt | llm_config['sql_coder'] | StrOutputParser()
    
    print(f"Generating SQL for task: {task_description}")
    sql_query = sql_chain.invoke({"task": task_description})
    sql_query = sql_query.strip().replace("```sql", "").replace("```", "")
    print(f"Generated SQL Query:\n{sql_query}")
    try:
        con = duckdb.connect(knowledge_stores['mimic_db_path'])
        result = con.execute(sql_query).fetchone()
        patient_count = result[0] if result else 0
        con.close()
        
        findings = f"Generated SQL Query:\n{sql_query}\n\nEstimated eligible patient count from the database: {patient_count}."
        print(f"Query executed successfully. Estimated patient count: {patient_count}")
    except Exception as e:
        findings = f"Error executing SQL query: {e}. Defaulting to a count of 0."
        print(f"Error during query execution: {e}")
    return AgentOutput(agent_name="Patient Cohort Analyst", findings=findings)
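
The analyst above executes whatever SQL the model returns. Since the query only ever needs to count patients, a light guard that cleans the model output and rejects anything other than a single read-only statement may be worth adding before execution. The sketch below is a heuristic under stated assumptions: extract_sql and is_read_only_select are hypothetical helpers, and a keyword filter like this can be bypassed or can reject legitimate queries, so it complements rather than replaces database-level protection.

```python
import re

# Matches an opening ```sql fence or a bare ``` fence.
FENCE = re.compile(r"```(?:sql)?", re.IGNORECASE)
# Statements that modify data or the database itself.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|copy|pragma)\b",
    re.IGNORECASE,
)


def extract_sql(llm_output: str) -> str:
    """Strip Markdown code fences, surrounding whitespace, and a trailing semicolon."""
    return FENCE.sub("", llm_output).strip().rstrip(";").strip()


def is_read_only_select(sql: str) -> bool:
    """Accept a single statement that starts with SELECT or WITH and has no write keywords."""
    if ";" in sql:
        return False  # more than one statement
    if not re.match(r"^\s*(select|with)\b", sql, re.IGNORECASE):
        return False
    return FORBIDDEN.search(sql) is None


raw_output = "```sql\nSELECT COUNT(DISTINCT SUBJECT_ID) FROM patients;\n```"
sql = extract_sql(raw_output)
print(sql)                       # SELECT COUNT(DISTINCT SUBJECT_ID) FROM patients
print(is_read_only_select(sql))  # True
```

A stronger safeguard is to open the database in read-only mode for this agent, for example duckdb.connect(path, read_only=True), so that even a query that slips past the filter cannot change the data.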

Finally, criteria_synthesizer weaves the specialists' findings into a formal Inclusion/Exclusion Criteria document. The SOP's synthesizer_model field lets us switch its model dynamically:

def criteria_synthesizer(state: GuildState) -> GuildState:
    """Synthesizes all the structured findings from the specialist agents into the final criteria document."""
    print("--- EXECUTING CRITERIA SYNTHESIZER ---")
    
    sop = state['sop']
    drafter_llm = ChatOllama(model=sop.synthesizer_model, temperature=0.2)

    context = "\n\n---\n\n".join([f"**{out.agent_name} Findings:**\n{out.findings}" for out in state['agent_outputs']])
    
    prompt = f"{sop.synthesizer_prompt}\n\n**Context from Specialist Teams:**\n{context}"
    print(f"Synthesizer is using model '{sop.synthesizer_model}'.")
    
    response = drafter_llm.invoke(prompt)
    print("Final criteria generated.")
    
    return {**state, "final_criteria": response.content}

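Orchestrating with LangGraph

We wire the agent nodes above together with LangGraph: Planner → specialist execution → Synthesizer. The specialists are independent of one another and could run in parallel, although the dispatcher node below calls them one after another.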

Guild with langgraph (Created by Fareed Khan)

Define the dispatcher node, which assigns tasks according to the plan:

from langgraph.graph import StateGraph, END

def specialist_execution_node(state: GuildState) -> GuildState:
    """This node acts as a dispatcher, executing all specialist tasks defined in the plan."""
    plan_tasks = state['plan']['plan']
    outputs = []
    
    for task in plan_tasks:
        agent_name = task['agent']
        task_desc = task['task_description']
        
        if "Regulatory" in agent_name:
            output = retrieval_agent(task_desc, state, "fda_retriever", "Regulatory Specialist")
        elif "Medical" in agent_name:
            output = retrieval_agent(task_desc, state, "pubmed_retriever", "Medical Researcher")
        elif "Ethics" in agent_name and state['sop'].use_ethics_specialist:
            output = retrieval_agent(task_desc, state, "ethics_retriever", "Ethics Specialist")
        elif "Cohort" in agent_name:
            output = patient_cohort_analyst(task_desc, state)
        else:
            continue
        
        outputs.append(output)
    return {**state, "agent_outputs": outputs}

Build and compile the graph:

workflow = StateGraph(GuildState)

workflow.add_node("planner", planner_agent)
workflow.add_node("execute_specialists", specialist_execution_node)
workflow.add_node("synthesizer", criteria_synthesizer)

workflow.set_entry_point("planner")
workflow.add_edge("planner", "execute_specialists")
workflow.add_edge("execute_specialists", "synthesizer")
workflow.add_edge("synthesizer", END)

guild_graph = workflow.compile()
print("Graph compiled successfully.")

The optional graph visualization is omitted here. At this point the "Inner Loop", the multi-agent RAG pipeline, is fully assembled.
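The dispatcher node runs the specialist tasks in a plain `for` loop. As a rough sketch of how independent tasks could be dispatched concurrently instead, the snippet below uses a thread pool. Everything in it is a stand-in: `run_specialists_concurrently` and the lambda handlers are hypothetical, and whether the real retrievers and LLM clients are safe to call from several threads is an assumption the article does not address.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_specialists_concurrently(
    tasks: List[Dict[str, str]],
    handlers: Dict[str, Callable[[str], str]],
    max_workers: int = 4,
) -> List[str]:
    """Submit every task whose agent name contains a handler keyword; keep plan order."""
    jobs = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for task in tasks:
            for keyword, handler in handlers.items():
                if keyword in task["agent"]:
                    jobs.append(pool.submit(handler, task["task_description"]))
                    break  # first matching keyword wins, like an if/elif chain
        # Collect results in submission order, not completion order.
        return [job.result() for job in jobs]


# Stand-in handlers; the real pipeline would call its retrieval and SQL agents here.
demo_handlers = {
    "Regulatory": lambda desc: f"[Regulatory Specialist] {desc}",
    "Cohort": lambda desc: f"[Patient Cohort Analyst] {desc}",
}
demo_plan = [
    {"agent": "Regulatory Specialist", "task_description": "find FDA guidance"},
    {"agent": "Unknown Agent", "task_description": "skipped"},
    {"agent": "Patient Cohort Analyst", "task_description": "count patients"},
]
demo_outputs = run_specialists_concurrently(demo_plan, demo_handlers)
print(demo_outputs)
```

Because the results are gathered in submission order, the synthesizer would still see the findings in the order the planner listed them.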

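Running the Full Guild Workflow Graph

We run an end-to-end test with the baseline SOP and a realistic trial concept. This verifies that the agents, data stores, and orchestration logic work together, and it produces our first "baseline" output for the evaluation and evolution loops that follow.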

Run Workflow (Created by Fareed Khan)

test_request = "Draft inclusion/exclusion criteria for a Phase II trial of 'Sotagliflozin', a novel SGLT2 inhibitor, for adults with uncontrolled Type 2 Diabetes (HbA1c > 8.0%) and moderate chronic kidney disease (CKD Stage 3)."

print("Running the full Guild graph with baseline SOP v1.0...")
graph_input = {
    "initial_request": test_request,
    "sop": baseline_sop
}
final_result = guild_graph.invoke(graph_input)
print("\nFinal Guild Output:")
print("---------------------")
print(final_result['final_criteria'])

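The output log traces each agent's execution and ends with a well-structured inclusion/exclusion criteria document. We have now built and tested a multi-agent RAG pipeline grounded in real data sources.

Multi-Dimensional Evaluation System

A self-improving system must be able to measure its own performance. A single score such as accuracy is not enough; we need a multi-dimensional quality assessment. We will build an evaluation suite that scores the Guild's output on the "five pillars" identified at the outset. This gives the outer evolution loop a rich, actionable feedback signal.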

Multi-dimension Eval (Created by Fareed Khan)

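Plan for this section:

  • LLM-as-a-Judge: use `llama3:70b` to build three "expert judges" that score Scientific Rigor, Regulatory Compliance, and Ethical Soundness.
  • Programmatic evaluation: use two fast, reliable, objective functions to score Recruitment Feasibility and Operational Simplicity.
  • Aggregate evaluator: wrap the five individual evaluators in one function that takes the Guild's output and returns a 5D performance vector for the AI Director to act on.

Building a Custom Evaluator for Each Parameter

First, define a unified output structure for the LLM judges: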

from langchain_core.pydantic_v1 import BaseModel, Field

class GradedScore(BaseModel):
    """A Pydantic model to structure the output of our LLM-as-a-Judge evaluators."""
    score: float = Field(description="A score from 0.0 to 1.0")
    reasoning: str = Field(description="A brief justification for the score.")

  1. Scientific Rigor:

from langchain_core.prompts import ChatPromptTemplate

def scientific_rigor_evaluator(generated_criteria: str, pubmed_context: str) -> GradedScore:
    evaluator_llm = llm_config['director'].with_structured_output(GradedScore)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are an expert clinical scientist. ..."),
        ("human", "Evaluate the following criteria:\n\n**Generated Criteria:**\n{criteria}\n\n**Supporting Scientific Context:**\n{context}")
    ])
    chain = prompt | evaluator_llm
    return chain.invoke({"criteria": generated_criteria, "context": pubmed_context})

  2. Regulatory Compliance:

def regulatory_compliance_evaluator(generated_criteria: str, fda_context: str) -> GradedScore:
    evaluator_llm = llm_config['director'].with_structured_output(GradedScore)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are an expert regulatory affairs specialist. ..."),
        ("human", "Evaluate the following criteria:\n\n**Generated Criteria:**\n{criteria}\n\n**Applicable FDA Guidelines:**\n{context}")
    ])
    chain = prompt | evaluator_llm
    return chain.invoke({"criteria": generated_criteria, "context": fda_context})

  3. Ethical Soundness:

def ethical_soundness_evaluator(generated_criteria: str, ethics_context: str) -> GradedScore:
    evaluator_llm = llm_config['director'].with_structured_output(GradedScore)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are an expert on clinical trial ethics. ..."),
        ("human", "Evaluate the following criteria:\n\n**Generated Criteria:**\n{criteria}\n\n**Ethical Principles:**\n{context}")
    ])
    chain = prompt | evaluator_llm
    return chain.invoke({"criteria": generated_criteria, "context": ethics_context})

  4. Recruitment Feasibility (programmatic):

def feasibility_evaluator(cohort_analyst_output: AgentOutput) -> GradedScore:
    findings_text = cohort_analyst_output.findings
    try:
        count_str = findings_text.split("database: ")[1].replace('.', '')
        patient_count = int(count_str)
    except (IndexError, ValueError):
        return GradedScore(score=0.0, reasoning="Could not parse patient count from analyst output.")
    
    IDEAL_COUNT = 150.0
    score = min(1.0, patient_count / IDEAL_COUNT)
    reasoning = f"Estimated {patient_count} eligible patients. Score is normalized against an ideal target of {int(IDEAL_COUNT)}."
    return GradedScore(score=score, reasoning=reasoning)

  5. Operational Simplicity (programmatic):

def simplicity_evaluator(generated_criteria: str) -> GradedScore:
    EXPENSIVE_TESTS = ["mri", "genetic sequencing", "pet scan", "biopsy", "echocardiogram", "endoscopy"]
    test_count = sum(1 for test in EXPENSIVE_TESTS if test in generated_criteria.lower())
    score = max(0.0, 1.0 - (test_count * 0.5))
    reasoning = f"Found {test_count} expensive/complex screening procedures mentioned."
    return GradedScore(score=score, reasoning=reasoning)
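
The two programmatic evaluators reduce to simple formulas: feasibility is min(1, n / 150) for an estimated patient count n, and simplicity is max(0, 1 − 0.5·k) for k expensive procedures mentioned. The sketch below restates just that arithmetic outside the pipeline; the function names `feasibility_score` and `simplicity_score` are illustrative, not from the original code.

```python
def feasibility_score(patient_count: int, ideal_count: float = 150.0) -> float:
    """Linear in the patient count, capped at 1.0 once the ideal target is reached."""
    return min(1.0, patient_count / ideal_count)


def simplicity_score(expensive_test_count: int, penalty: float = 0.5) -> float:
    """Each expensive procedure costs `penalty`; the score never goes below 0.0."""
    return max(0.0, 1.0 - expensive_test_count * penalty)


print(feasibility_score(75))   # 0.5
print(feasibility_score(300))  # 1.0
print(simplicity_score(1))     # 0.5
print(simplicity_score(3))     # 0.0
```

With a penalty of 0.5, two expensive procedures are already enough to drive the simplicity score to zero, so this dimension is effectively a three-level scale.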

Creating an Aggregate LangSmith Evaluator

Define the overall result model and the aggregation function:

class EvaluationResult(BaseModel):
    rigor: GradedScore
    compliance: GradedScore
    ethics: GradedScore
    feasibility: GradedScore
    simplicity: GradedScore

def run_full_evaluation(guild_final_state: GuildState) -> EvaluationResult:
    """Orchestrates the entire evaluation process, calling each of the five specialist evaluators."""
    print("--- RUNNING FULL EVALUATION GAUNTLET ---")
    
    final_criteria = guild_final_state['final_criteria']
    agent_outputs = guild_final_state['agent_outputs']
    
    pubmed_context = next((o.findings for o in agent_outputs if o.agent_name == "Medical Researcher"), "")
    fda_context = next((o.findings for o in agent_outputs if o.agent_name == "Regulatory Specialist"), "")
    ethics_context = next((o.findings for o in agent_outputs if o.agent_name == "Ethics Specialist"), "")
    analyst_output = next((o for o in agent_outputs if o.agent_name == "Patient Cohort Analyst"), None)
    
    print("Evaluating: Scientific Rigor...")
    rigor = scientific_rigor_evaluator(final_criteria, pubmed_context)
    print("Evaluating: Regulatory Compliance...")
    compliance = regulatory_compliance_evaluator(final_criteria, fda_context)
    print("Evaluating: Ethical Soundness...")
    ethics = ethical_soundness_evaluator(final_criteria, ethics_context)
    print("Evaluating: Recruitment Feasibility...")
    feasibility = feasibility_evaluator(analyst_output) if analyst_output else GradedScore(score=0.0, reasoning="Analyst did not run.")
    print("Evaluating: Operational Simplicity...")
    simplicity = simplicity_evaluator(final_criteria)
    
    print("--- EVALUATION GAUNTLET COMPLETE ---")
    return EvaluationResult(rigor=rigor, compliance=compliance, ethics=ethics, feasibility=feasibility, simplicity=simplicity)

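Running the evaluation on the baseline output gives, in the example run, a clearly low score on the Feasibility dimension (0.39), which hands the outer-loop AI Director an unambiguous direction for improvement.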
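In the pipeline, spotting the weak dimension is the job of an LLM agent. As a small deterministic cross-check, the 5D result can be flattened into a plain mapping and the minimum taken directly. This is only a sketch: `weakest_dimension` is a hypothetical helper, and apart from the 0.39 feasibility figure quoted above, the numbers are made up for illustration.

```python
from typing import Dict, Tuple


def weakest_dimension(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the (name, score) pair with the lowest score."""
    name = min(scores, key=scores.get)
    return name, scores[name]


# Illustrative 5D vector; only the feasibility value comes from the article.
example_scores = {
    "rigor": 0.90,
    "compliance": 0.90,
    "ethics": 0.90,
    "feasibility": 0.39,
    "simplicity": 1.00,
}
print(weakest_dimension(example_scores))  # ('feasibility', 0.39)
```

Comparing this answer with the LLM's diagnosis is a cheap way to catch a diagnostician that misreads the report.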

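The Outer Loop of the Evolution Engine

Now we build the system's "brain", the "AI Research Director" (the outer evolution loop). Its job is not to design trials but to improve the process that designs them: it analyzes the 5D scores, diagnoses root causes, and rewrites the GuildSOP accordingly. This is the core of the system's learning and adaptation.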

Outer Loop (Created by Fareed Khan)

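Plan for this section:

  • Create a "gene pool": track evolved SOP versions and their scores, forming a traceable lineage.
  • Design the Director-level agents: the `Performance Diagnostician` identifies weaknesses; the `SOP Architect` proposes improvements.
  • Structure the evolution loop: define one full generation as Diagnose → Evolve → Evaluate.
  • Run one full cycle: show how the system autonomously finds the feasibility weakness and produces new SOP variants to fix it.

Managing Configurations

Define `SOPGenePool`, which stores each SOP together with its evaluation and its parent version: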

class SOPGenePool:
    def __init__(self):
        self.pool: List[Dict[str, Any]] = []
        self.version_counter = 0

    def add(self, sop: GuildSOP, eval_result: EvaluationResult, parent_version: Optional[int] = None):
        self.version_counter += 1
        entry = {
            "version": self.version_counter,
            "sop": sop,
            "evaluation": eval_result,
            "parent": parent_version
        }
        self.pool.append(entry)
        print(f"Added SOP v{self.version_counter} to the gene pool.")
        
    def get_latest_entry(self) -> Optional[Dict[str, Any]]:
        return self.pool[-1] if self.pool else None

Building the Director-Level Agents

First comes the `Performance Diagnostician`, which analyzes the 5D vector and returns a structured diagnosis:

class Diagnosis(BaseModel):
    primary_weakness: Literal['rigor', 'compliance', 'ethics', 'feasibility', 'simplicity']
    root_cause_analysis: str = Field(...)
    recommendation: str = Field(...)

def performance_diagnostician(eval_result: EvaluationResult) -> Diagnosis:
    print("--- EXECUTING PERFORMANCE DIAGNOSTICIAN ---")
    diagnostician_llm = llm_config['director'].with_structured_output(Diagnosis)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a world-class management consultant ..."),
        ("human", "Please analyze the following performance evaluation report:\n\n{report}")
    ])
    chain = prompt | diagnostician_llm
    return chain.invoke({"report": eval_result.json()})

Next is the `SOP Architect`, which takes the diagnosis and the current SOP and generates several SOP "mutations" as candidates:

class EvolvedSOPs(BaseModel):
    mutations: List[GuildSOP]

def sop_architect(diagnosis: Diagnosis, current_sop: GuildSOP) -> EvolvedSOPs:
    print("--- EXECUTING SOP ARCHITECT ---")
    architect_llm = llm_config['director'].with_structured_output(EvolvedSOPs)
    prompt = ChatPromptTemplate.from_messages([
        ("system", f"You are an AI process architect. ... schema: {GuildSOP.schema_json()} ..."),
        ("human", "Here is the current SOP:\n{current_sop}\n\nHere is the performance diagnosis:\n{diagnosis}\n\nBased on the diagnosis, please generate 2-3 new, improved SOPs.")
    ])
    chain = prompt | architect_llm
    return chain.invoke({"current_sop": current_sop.json(), "diagnosis": diagnosis.json()})

Running the Full Evolution Loop

Wrap one complete evolution cycle in a function:

def run_evolution_cycle(gene_pool: SOPGenePool, trial_request: str):
    print("\n" + "="*25 + " STARTING NEW EVOLUTION CYCLE " + "="*25)
    
    current_best_entry = gene_pool.get_latest_entry()
    parent_sop = current_best_entry['sop']
    parent_eval = current_best_entry['evaluation']
    parent_version = current_best_entry['version']
    print(f"Improving upon SOP v{parent_version}...")
    
    diagnosis = performance_diagnostician(parent_eval)
    print(f"Diagnosis complete. Primary Weakness: '{diagnosis.primary_weakness}'. Recommendation: {diagnosis.recommendation}")

    new_sop_candidates = sop_architect(diagnosis, parent_sop)
    print(f"Generated {len(new_sop_candidates.mutations)} new SOP candidates.")
    for i, candidate_sop in enumerate(new_sop_candidates.mutations):
        print(f"\n--- Testing SOP candidate {i+1}/{len(new_sop_candidates.mutations)} ---")
        guild_input = {"initial_request": trial_request, "sop": candidate_sop}
        final_state = guild_graph.invoke(guild_input)
        
        eval_result = run_full_evaluation(final_state)
        gene_pool.add(sop=candidate_sop, eval_result=eval_result, parent_version=parent_version)
    print("\n" + "="*25 + " EVOLUTION CYCLE COMPLETE " + "="*26)

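We initialize the gene pool, add the baseline, and run one round of evolution. In the example output, the diagnosis identifies Feasibility as the primary weakness and the Architect generates two candidate SOPs. After testing, one candidate (v2) raises Feasibility substantially (to 0.81, for example), buying a large gain in practical feasibility for only a slight cost in Rigor, while the other (v3) brings no improvement.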
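One detail matters if the loop is run for more than one generation: the cycle starts from the most recently added entry, which in the example above would be the unimproved v3 rather than v2. A possible alternative, sketched below under the assumption that the mean of the five scores is an acceptable single ranking, is to pick the parent by score. The entry layout (`version`, `scores`) and all numbers are simplified stand-ins, not the article's data structures or results.

```python
from typing import Any, Dict, List


def select_parent(pool: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the entry with the highest mean score; ties go to the earliest entry."""
    def mean_score(entry: Dict[str, Any]) -> float:
        scores = entry["scores"]
        return sum(scores.values()) / len(scores)

    return max(pool, key=mean_score)


# Made-up scores on two of the five dimensions, for illustration only.
demo_pool = [
    {"version": 1, "scores": {"rigor": 0.90, "feasibility": 0.39}},
    {"version": 2, "scores": {"rigor": 0.85, "feasibility": 0.81}},
    {"version": 3, "scores": {"rigor": 0.90, "feasibility": 0.35}},
]
print(select_parent(demo_pool)["version"])  # 2
```

A mean hides trade-offs, so sampling the parent from the Pareto front would be another reasonable choice.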

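Five-Dimensional Pareto Analysis

The evolution loop has completed one generation, and the results now need a multi-objective analysis. Multi-objective problems rarely have a single "best" solution. Instead there is a Pareto frontier: the set of solutions that cannot be improved on one objective without giving ground on another. The goal is to identify this frontier and present it to a human decision-maker.

Plan for this section:

  • Analyze the gene pool: print a summary of every SOP and its 5D scores to see the direct effect of each mutation.
  • Identify the Pareto front: write a function that programmatically finds the non-dominated solutions in the gene pool.
  • Visualize the front: use a parallel coordinates plot to show the trade-offs across all five dimensions at a glance.

Printing the summary is omitted here. Next, identify the Pareto front: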

import numpy as np

def identify_pareto_front(gene_pool: SOPGenePool) -> List[Dict[str, Any]]:
    pareto_front = []
    pool_entries = gene_pool.pool
    
    for i, candidate in enumerate(pool_entries):
        is_dominated = False
        cand_scores = np.array([s['score'] for s in candidate['evaluation'].dict().values()])
        
        for j, other in enumerate(pool_entries):
            if i == j: continue
            other_scores = np.array([s['score'] for s in other['evaluation'].dict().values()])
            if np.all(other_scores >= cand_scores) and np.any(other_scores > cand_scores):
                is_dominated = True
                break
        if not is_dominated:
            pareto_front.append(candidate)
    return pareto_front

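A run typically leaves v1 and v2 on the Pareto front: v1 is the "maximize Rigor" strategy and v2 is the "high Feasibility" strategy. Which one to choose in practice depends on business priorities.

Identifying the Pareto Front

Visualize the front with a 2D scatter plot (Rigor vs. Feasibility) and a 5D parallel coordinates plot: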
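To make the dominance rule concrete, here is a self-contained version that works on a plain score matrix instead of the gene pool. An entry is dominated when some other entry is at least as good on every dimension and strictly better on at least one. The three rows are hypothetical 5D vectors chosen to mirror the v1/v2/v3 story; they are not the article's measured results.

```python
import numpy as np


def pareto_front_indices(score_matrix: np.ndarray) -> list:
    """Return the row indices that no other row dominates (higher is better)."""
    front = []
    for i, cand in enumerate(score_matrix):
        dominated = any(
            np.all(other >= cand) and np.any(other > cand)
            for j, other in enumerate(score_matrix)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Columns: rigor, compliance, ethics, feasibility, simplicity (made-up values).
demo_scores = np.array([
    [0.90, 0.90, 0.90, 0.39, 1.00],  # v1: highest rigor
    [0.85, 0.90, 0.90, 0.81, 1.00],  # v2: highest feasibility
    [0.85, 0.90, 0.90, 0.35, 1.00],  # v3: no better than v1 anywhere
])
print(pareto_front_indices(demo_scores))  # [0, 1]
```

Here v3 drops out because v1 matches or beats it on every column, while v1 and v2 each win on a dimension the other loses.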

import matplotlib.pyplot as plt
import pandas as pd

def visualize_frontier(pareto_sops):
    if not pareto_sops:
        print("No SOPs on the Pareto front to visualize.")
        return
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7))
    
    labels = [f"v{s['version']}" for s in pareto_sops]
    rigor_scores = [s['evaluation'].rigor.score for s in pareto_sops]
    feasibility_scores = [s['evaluation'].feasibility.score for s in pareto_sops]
    
    ax1.scatter(rigor_scores, feasibility_scores, s=200, alpha=0.7, c='blue')
    for i, txt in enumerate(labels):
        ax1.annotate(txt, (rigor_scores[i], feasibility_scores[i]), xytext=(10,-10), textcoords='offset points', fontsize=14)
    ax1.set_title('Pareto Frontier: Rigor vs. Feasibility', fontsize=16)
    ax1.set_xlabel('Scientific Rigor Score', fontsize=14)
    ax1.set_ylabel('Recruitment Feasibility Score', fontsize=14)
    ax1.grid(True, linestyle='--', alpha=0.6)
    ax1.set_xlim(min(rigor_scores)-0.05, max(rigor_scores)+0.05)
    ax1.set_ylim(min(feasibility_scores)-0.1, max(feasibility_scores)+0.1)

    data = []
    for s in pareto_sops:
        eval_dict = s['evaluation'].dict()
        scores = {k.capitalize(): v['score'] for k, v in eval_dict.items()}
        scores['SOP Version'] = f"v{s['version']}"
        data.append(scores)
    
    df = pd.DataFrame(data)
    pd.plotting.parallel_coordinates(df, 'SOP Version', colormap=plt.get_cmap("viridis"), ax=ax2, axvlines_kwds={"linewidth": 1, "color": "grey"})
    ax2.set_title('5D Performance Trade-offs on Pareto Front', fontsize=16)
    ax2.grid(True, which='major', axis='y', linestyle='--', alpha=0.6)
    ax2.set_ylabel('Normalized Score', fontsize=14)
    ax2.legend(loc='lower center', bbox_to_anchor=(0.5, -0.15), ncol=len(labels))
    plt.tight_layout()
    plt.show()

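The rendered plots show how v1 and v2 differ on each dimension: the two are nearly identical on Compliance, Ethics, and Simplicity, and trade off clearly only on Rigor and Feasibility (the typical "crossing lines" pattern).

Visualizing the Frontier and Making a Decision

We have seen at the macro level (evolution, the Pareto front) how the system improves itself. Now we look at the micro level, inside a single high-performing run: how do the agents collaborate, where are the bottlenecks, and how do the multi-dimensional scores translate into a visual profile?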
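Once the front is on screen, the final choice belongs to a human. One simple way to make that choice explicit, sketched here as an assumption rather than anything the original pipeline does, is to state the priorities as weights and rank the front members by weighted sum. `pick_by_priorities` is a hypothetical helper and the scores are illustrative.

```python
from typing import Dict, List


def pick_by_priorities(front: List[Dict[str, float]], weights: Dict[str, float]) -> int:
    """Return the index of the front member with the highest weighted score."""
    def weighted(scores: Dict[str, float]) -> float:
        # Dimensions without a weight contribute nothing.
        return sum(weights.get(dim, 0.0) * value for dim, value in scores.items())

    return max(range(len(front)), key=lambda i: weighted(front[i]))


# Made-up scores for two front members (index 0 = v1, index 1 = v2).
demo_front = [
    {"rigor": 0.90, "feasibility": 0.39},
    {"rigor": 0.85, "feasibility": 0.81},
]
print(pick_by_priorities(demo_front, {"rigor": 1.0}))                      # 0
print(pick_by_priorities(demo_front, {"rigor": 0.5, "feasibility": 0.5}))  # 1
```

A team that only cares about rigor lands on v1, while one that weighs recruitment equally lands on v2, which is exactly the trade-off the plots expose.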

Understand the Workflow (Created by Fareed Khan)

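Plan:

  • Instrument the workflow: record the start time, end time, and duration of each agent.
  • Visualize the execution timeline: draw the workflow as a Gantt chart that shows which stages are serial and which could run in parallel.
  • Compare the 5D performance profiles of the baseline and evolved SOPs with a radar chart.

Understanding the Cognitive Workflow

Use the graph's `.stream()` method to receive events node by node and record timestamps: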

import time


def invoke_with_timing(graph, sop, request):
    """Invokes the Guild graph while capturing start and end times for each node."""
    print(f"--- Instrumenting Graph Run for SOP: {sop.dict()} ---")
    
    timing_data = []
    graph_input = {"initial_request": request, "sop": sop}
    
    # In "updates" mode each event is {node_name: state_update}, emitted when that node finishes.
    # The nodes run one after another, so a node starts when the previous event arrived.
    last_time = time.time()
    final_state = None
    for event in graph.stream(graph_input, stream_mode="updates"):
        end_time = time.time()
        for node_name, update in event.items():
            timing_data.append({
                "node": node_name,
                "start_time": last_time,
                "end_time": end_time,
                "duration": end_time - last_time
            })
            # Every node returns the full state ({**state, ...}), so the last update is the final state.
            final_state = update
        last_time = end_time
    
    # Shift all timestamps so the run starts at t = 0.
    overall_start_time = min(d['start_time'] for d in timing_data)
    for data in timing_data:
        data['start_time'] -= overall_start_time
        data['end_time'] -= overall_start_time
        
    return final_state, timing_data

Run v2 through it to capture timing data (the example output shows `execute_specialists` as the dominant stage, as expected).

Plot the Gantt chart:

import matplotlib.pyplot as plt

def plot_gantt_chart(timing_data: List[Dict[str, Any]], title: str):
    """Plots a Gantt chart of the agentic workflow from timing data."""
    fig, ax = plt.subplots(figsize=(12, 4))
    
    labels = [d['node'] for d in timing_data]
    ax.barh(labels, [d['duration'] for d in timing_data], left=[d['start_time'] for d in timing_data], color='skyblue')
    
    ax.set_xlabel('Time (seconds)')
    ax.set_title(title, fontsize=16)
    ax.grid(True, which='major', axis='x', linestyle='--', alpha=0.6)
    ax.invert_yaxis()
    plt.show()

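The Gantt chart makes the serial top-level flow visible and points to the parallelization opportunity inside it: performance work should focus on the `execute_specialists` stage.

Profiling the Output with a Radar Chart

Compare the 5D profiles of baseline v1 and evolved v2 on a radar chart: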
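The same timing records can also be summarized numerically, which is handy when comparing many runs without opening a plot. The sketch below assumes records shaped like the ones used for the Gantt chart (a `node` name and a `duration` in seconds); `time_share_by_node` is a hypothetical helper and the durations are invented for the example.

```python
from typing import Any, Dict, List


def time_share_by_node(timing_data: List[Dict[str, Any]]) -> Dict[str, float]:
    """Return each node's fraction of the total recorded duration."""
    total = sum(d["duration"] for d in timing_data)
    if total == 0:
        return {}
    shares: Dict[str, float] = {}
    for d in timing_data:
        # Accumulate in case a node appears more than once in the trace.
        shares[d["node"]] = shares.get(d["node"], 0.0) + d["duration"] / total
    return shares


# Invented durations, for illustration only.
demo_timing = [
    {"node": "planner", "duration": 2.0},
    {"node": "execute_specialists", "duration": 6.0},
    {"node": "synthesizer", "duration": 2.0},
]
demo_shares = time_share_by_node(demo_timing)
print(max(demo_shares, key=demo_shares.get))  # execute_specialists
```

The node with the largest share is the first place to look for an optimization target.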

import numpy as np
import matplotlib.pyplot as plt


def plot_radar_chart(eval_results: List[EvaluationResult], labels: List[str]):
    """Creates a radar chart to compare the 5D performance of multiple SOPs."""
    
    categories = ['Rigor', 'Compliance', 'Ethics', 'Feasibility', 'Simplicity']
    num_vars = len(categories)
    angles = np.linspace(0, 2 * np.pi, num_vars, endpoint=False).tolist()
    angles += angles[:1]
    fig, ax = plt.subplots(figsize=(8, 8), subplot_kw=dict(polar=True))
    for i, result in enumerate(eval_results):
        values = [res['score'] for res in result.dict().values()]
        values += values[:1]
        ax.plot(angles, values, linewidth=2, linestyle='solid', label=labels[i])
        ax.fill(angles, values, alpha=0.25)

    ax.set_yticklabels([])
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(categories, fontsize=12)
    ax.set_title('5D Performance Profile Comparison', size=20, color='blue', y=1.1)
    plt.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))
    plt.show()

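The chart shows both SOPs scoring strongly on Compliance, Ethics, and Simplicity; v1 is slightly ahead on Rigor while v2 is far ahead on Feasibility, which makes the trade-off plain.

Toward an Autonomous Strategy

We have designed, built, and demonstrated a self-improving agentic system. It is more than a single solution; it is an extensible foundation built on layered agent design, dynamic SOPs, multi-dimensional evaluation, and automated evolution. These principles open up several directions for future work:

  1. Run the evolution loop continuously: we completed one generation; running hundreds of generations could uncover a richer, more diverse Pareto frontier of battle-tested SOPs.
  2. Distill the Director's reasoning into a smaller policy model: train on the history of successful mutations and replace the 70B Director with a faster, cheaper specialized model to make evolution more efficient.
  3. Let the AI Director change the Guild's structure dynamically: learn to add or remove specialists (a new "Biostatistician", for example) according to the needs of the trial concept, so that the team itself evolves.
  4. Replace static MIMIC-III with a live API: connect the `Patient Cohort Analyst` to a secure, real-time EHR system so that feasibility estimates reflect current patient data.
  5. Strengthen the `SOP Architect`'s evolutionary operators: introduce mechanisms such as "crossover" that combine the strengths of different successful SOPs to speed up the discovery of new strategies.
  6. Incorporate human expert feedback: feed clinical scientists' ratings into the evaluation loop and use expert judgment as the final "reward signal", steering the system toward solutions that are both technically optimal and excellent in practice.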
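As an illustration of the "crossover" idea in item 5, the sketch below implements uniform crossover: each field of the child SOP is copied from one of two parents chosen at random. `ToySOP` is a hypothetical stand-in that uses only the three SOP fields visible in the code above (`synthesizer_model`, `synthesizer_prompt`, `use_ethics_specialist`); the real `GuildSOP` presumably has more, and the model names and prompts here are placeholders.

```python
import random
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ToySOP:
    """Simplified stand-in for GuildSOP, for illustration only."""
    synthesizer_model: str
    synthesizer_prompt: str
    use_ethics_specialist: bool


def uniform_crossover(parent_a: ToySOP, parent_b: ToySOP, rng: random.Random) -> ToySOP:
    """Build a child by taking each field from a randomly chosen parent."""
    child = {
        f.name: getattr(rng.choice((parent_a, parent_b)), f.name)
        for f in fields(ToySOP)
    }
    return ToySOP(**child)


parent_a = ToySOP("model-a", "Prompt favouring rigor.", True)
parent_b = ToySOP("model-b", "Prompt favouring feasibility.", False)
child = uniform_crossover(parent_a, parent_b, random.Random(0))
print(child)
```

Passing in a seeded `random.Random` keeps the operator reproducible, which matters when an evolutionary run has to be audited later.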

Original article: https://medium.com/gitconnected/building-a-self-improving-agentic-rag-system-f55003af44c4

Reposted from PyTorch研習社; author: AI研究生

Last modified 2025-11-24 00:11:22