DeepSeek Custom Training: Fine-Tuning and Inference in Practice

2025-04-30 09:19:32

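This document describes how to fine-tune DeepSeek-R1-Distill-Qwen-1.5B (Qwen architecture) efficiently on a Mac laptop: transformers handles data processing, LoRA performs the fine-tuning, WandB monitors the training run, and ModelScope downloads the model.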

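1. Introduction

This article covers:

Model loading and preprocessing: how to load the pretrained model and tokenizer and prepare the input dataset.
LoRA configuration: how to configure LoRA on the model for efficient fine-tuning that saves compute.
Training: how to set the training arguments, train with SFTTrainer, and log the run to WandB.
Saving and evaluating the model: how to save the fine-tuned model and validate it on a suitable evaluation set.
Model merging: how to merge the weights of several models by weighted averaging into a single model.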

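1.1 Project Background

The target is DeepSeek-R1-Distill-Qwen-1.5B (Qwen architecture), fine-tuned on a Mac laptop with transformers for data processing, LoRA for fine-tuning, WandB for monitoring, and ModelScope for model download. The training dataset holds roughly 20,000 samples.

Because the Mac laptop has no discrete GPU for local training, the larger DeepSeek-R1-Distill-Qwen-7B (Qwen) was ruled out.

The tools installed are as follows:

Tool	Version	Purpose
Unsloth		Data processing and model fine-tuning.
Transformers		Hugging Face model library, used to load and fine-tune DeepSeek-R1.
WandB		Real-time monitoring and visualization of training.
LoRA		Low-rank adaptation technique used for fine-tuning.
ModelScope		Downloads the DeepSeek-R1-8b model.
python3.11	Python 3.11	Runs the Python scripts and training jobs.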

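1.2 Introduction to LoRA and QLoRA

The table below compares LoRA and QLoRA:

Feature	LoRA (Low-Rank Adaptation)	QLoRA (Quantized LoRA)
Core idea	Freezes the base weights and trains small low-rank matrices, cutting the number of trainable parameters	Builds on LoRA and quantizes the frozen base weights, further reducing storage and memory needs
Main advantage	Far fewer trainable parameters, so fine-tuning is cheaper	On top of the low-rank update, quantization shrinks the memory footprint, which suits resource-limited environments
Memory footprint	Low, but higher than QLoRA	Markedly lower; suitable for memory-constrained devices
Compute efficiency	Improves training efficiency and reduces compute cost	The gain is mainly in memory; quantized weights are dequantized during computation, so throughput is not necessarily higher than LoRA
Use case	Limited compute where extreme compression is not needed	Very limited memory and compute, such as edge devices
Hardware	Most hardware, especially high-performance computing environments	Memory-limited hardware such as edge devices and low-memory servers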
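
To make the "fewer trainable parameters" row concrete, here is a minimal sketch that counts trainable parameters for one weight matrix under full fine-tuning and under a rank-r LoRA update. The layer size 4096 x 4096 is a hypothetical value chosen only for illustration; the rank 8 matches the value used later in this article.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters for a rank-r update B @ A, with B: d_out x r and A: r x d_in."""
    return r * (d_out + d_in)


# Hypothetical layer size, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (r={rank}):   {lora:,} trainable parameters")
print(f"ratio: {lora / full:.4%}")
```

The adapter cost grows with r * (d_out + d_in) rather than d_out * d_in, which is why small ranks stay cheap even for large layers.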

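1.3 LLaMA and Qwen Architectures

Feature	LLaMA architecture	Qwen architecture
Developer	Meta (Facebook)	Alibaba Cloud (Tongyi Qianwen team)
Design goal	Efficient, lightweight	Chinese optimization, multilingual support
Parameter sizes	7B, 13B, 33B, 65B, etc.	7B, 14B, etc.
Open-source status	Open	Partially open (weights are published for many sizes, not all variants)
Typical use	Resource-constrained environments	Chinese and multilingual tasks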

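LLaMA architecture

Full name: Large Language Model Meta AI (LLaMA)
Developer: Meta (formerly Facebook).
Characteristics:

a. Efficiency: LLaMA aims for high performance with comparatively few parameters and focuses on compute efficiency.

b. Lightweight: the models are relatively small (7B, 13B, 33B, 65B), yet with high-quality data and training methods they approach or exceed the performance of larger models.

c. Open release: Meta published LLaMA's weights and code for the research community.

Use cases:

a. Resource-constrained environments, such as local deployment or mobile devices.

b. General NLP tasks, in particular generation, question answering, and text classification, where it offers good performance and efficiency.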

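Qwen architecture

Developer: Alibaba Cloud's Tongyi Qianwen (Qwen) team. The DeepSeek-R1-Distill-Qwen models used in this article are DeepSeek's distillations onto Qwen base models.
Characteristics:

a. Tailored design: Qwen is tuned for Chinese alongside general tasks.

b. Multilingual support: Qwen models usually handle Chinese well and also perform solidly on English and other languages.

c. Flexible sizes: the Qwen family spans several scales (such as 7B and 14B) to fit different scenarios.

Use cases:

Qwen language models suit text generation, automated content creation, and dialogue systems.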

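2. Environment Setup

2.1 Installing Unsloth (GPU version, not used for now)

Unsloth is a tool for data processing and model fine-tuning. It can be installed with the commands below.
Not applicable on a Mac: it requires a GPU.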

## Official repo: https://github.com/unslothai/unsloth

#01 Create the project and set up a Python 3.11 virtual environment

#02 Install unsloth (CPU attempt)
brew install llvm    # Homebrew clang version 19.1.7 at the time of writing
echo 'export PATH="/opt/homebrew/opt/llvm/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

pip install torch
pip install numpy
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"



#03 Check the version
python -c "import torch; print(torch.__version__)"
2.6.0

#04 Import
from unsloth import FastLanguageModel

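Once installed, Unsloth can be used to preprocess data and to load and fine-tune models.

Not used for now

#01 On Linux servers, Docker is recommended


#02 Pull the image
docker pull modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-py310-torch2.3.1-1.22.2

#03 Start the container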

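2.2 Creating the Python Project

#01 The environment is Python 3.11

#02 Project layout
Unsloth-DeepSeek-R1-8b/
├── data/                    # training data, validation data, etc.
│   ├── raw/                 # raw data
│   └── processed/           # preprocessed data
│
├── models/                  # model files
│   ├── checkpoints/         # checkpoints written during training
│   └── final_model/         # the final fine-tuned model
│
├── scripts/                 # training and data-processing scripts
│   ├── train.py             # training script
│   ├── data_preprocessing.py# data preprocessing script
│   └── evaluate.py          # evaluation script
│
├── logs/                    # training log files
│   └── training_logs.txt    # log of the training run
│
├── wandb/                   # wandb configuration and records
│   └── wandb_config.py      # wandb configuration file
│
├── environment/             # environment configuration
│   ├── requirements.txt     # the project's Python dependencies
│   └── environment.yml      # environment file, if Conda is used
│
├── main.py                  # main entry point; starts training or other tasks
└── README.md                # project description with usage and run instructions


#03 Create the layout
# Create the subdirectories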
mkdir -p data/raw
mkdir -p data/processed
mkdir -p models/checkpoints
mkdir -p models/final_model
mkdir -p scripts
mkdir -p logs
mkdir -p wandb
mkdir -p environment

# Create the files
touch scripts/train.py
touch scripts/data_preprocessing.py
touch scripts/evaluate.py
touch logs/training_logs.txt
touch wandb/wandb_config.py
touch environment/requirements.txt
touch environment/environment.yml
touch main.py
touch README.md
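
The same layout can also be created from Python, which is convenient on systems without a POSIX shell. The sketch below mirrors the mkdir and touch commands above; the function name scaffold is a hypothetical helper, not part of the original project.

```python
from pathlib import Path

DIRS = [
    "data/raw", "data/processed",
    "models/checkpoints", "models/final_model",
    "scripts", "logs", "wandb", "environment",
]
FILES = [
    "scripts/train.py", "scripts/data_preprocessing.py", "scripts/evaluate.py",
    "logs/training_logs.txt", "wandb/wandb_config.py",
    "environment/requirements.txt", "environment/environment.yml",
    "main.py", "README.md",
]


def scaffold(root):
    """Create the project layout under root; existing files are left untouched."""
    root = Path(root)
    for d in DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    for f in FILES:
        (root / f).touch(exist_ok=True)
    return root
```

Calling scaffold(".") from the project root should produce the same tree as the shell commands, and running it twice is harmless because both mkdir and touch are told to accept existing paths.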

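2.3 Python Dependencies

#01 Install the packages
pip install torch==2.6.0 transformers datasets trl    # trl provides SFTTrainer, used in section 3.3

#02 Update the certificates (pip verifies them later when it talks to https sites)
/Applications/Python\ 3.11/Install\ Certificates.command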

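2.4 Installing LoRA via PEFT

Installing LoRA and PEFT:

LoRA and PEFT are techniques for efficient fine-tuning. To use them on a Mac to fine-tune a DeepSeek model, the corresponding dependency has to be installed.
PEFT ships an implementation of LoRA. It fine-tunes efficiently by training only a small part of the model's parameters, so the full set of weights does not need to be updated.

#01 Install peft
pip install peft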
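
Before moving on, it can help to confirm that the packages the later scripts import are actually present in the virtual environment. The following is a minimal sketch using only the standard library; the package list is taken from the imports used in this article, and check_packages is a hypothetical helper name.

```python
from importlib import metadata
from importlib.util import find_spec

# Packages the later scripts import; extend the tuple as needed.
REQUIRED = ("torch", "transformers", "datasets", "peft", "trl", "wandb")


def check_packages(names=REQUIRED):
    """Return {name: version or None}, where None means the package is not installed."""
    report = {}
    for name in names:
        if find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable but without distribution metadata under this name.
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in check_packages().items():
        status = version if version else "MISSING"
        print(f"{name:<14}{status}")
```

Any package reported as MISSING can be installed with pip before running the training script.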

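2.5 WandB Setup

WandB is a tool for real-time monitoring and visualization of training runs. Set it up as follows:

Register and log in on the WandB website.
Get your API key and configure it:

#01 API key (the author's is tied to a Google account)


#02 Commands
pip install wandb
wandb login

#03 Test script
import wandb  # experiment tracking and visualization
import random  # random numbers for the simulated metrics

# Start a new wandb run to track this script
wandb.init(
    # The wandb project that all data from this run is logged to
    project="my-awesome-project",  # project name as shown in the wandb dashboard
    
    # Hyperparameters and run metadata to track
    config={
        "learning_rate": 0.02,  # learning rate
        "architecture": "CNN",  # model architecture (a convolutional network here)
        "dataset": "CIFAR-100",  # dataset (CIFAR-100 here)
        "epochs": 10,  # number of training epochs
    }
)

# Simulate a training run
epochs = 10  # total number of epochs
offset = random.random() / 5  # small random offset that mimics run-to-run noise

# Training loop: epochs 2 through 9 (range excludes the upper bound)
for epoch in range(2, epochs):
    # Simulated accuracy, rising as the epoch number grows
    acc = 1 - 2 ** -epoch - random.random() / epoch - offset
    
    # Simulated loss, falling as the epoch number grows
    loss = 2 ** -epoch + random.random() / epoch + offset

    # Log this epoch's accuracy (acc) and loss to wandb
    wandb.log({"acc": acc, "loss": loss})

# [Optional] Finish the wandb run so the data is uploaded and the record is closed
wandb.finish()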

2.6 Pulling Models with ModelScope

#01 Install modelscope
pip install modelscope

#02 Download the model files
mkdir -p ./models/DeepSeek-R1-Distill-Llama-8B
mkdir -p ./models/DeepSeek-R1-Distill-Qwen-1.5B
mkdir -p ./models/DeepSeek-R1-Distill-Qwen-7B

modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B --local_dir ./models/DeepSeek-R1-Distill-Llama-8B

modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --local_dir ./models/DeepSeek-R1-Distill-Qwen-1.5B

modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --local_dir ./models/DeepSeek-R1-Distill-Qwen-7B



modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B --local_dir ./DeepSeek-R1-Distill-Llama-8B
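
After a download finishes, a quick sanity check of the target directory can catch interrupted transfers before the model is loaded. This is a sketch under the assumption that the checkpoint follows the usual transformers layout (a config.json plus *.safetensors or *.bin weight files); check_model_dir is a hypothetical helper and the exact file set may differ per model.

```python
from pathlib import Path


def check_model_dir(model_dir):
    """Return a list of problems found in a downloaded model directory (empty if none)."""
    path = Path(model_dir)
    if not path.is_dir():
        return [f"{path} is not a directory"]

    problems = []
    # Assumption: a transformers-style checkpoint has config.json plus weight files.
    if not (path / "config.json").is_file():
        problems.append("config.json is missing")
    has_weights = any(path.glob("*.safetensors")) or any(path.glob("*.bin"))
    if not has_weights:
        problems.append("no *.safetensors or *.bin weight files found")
    return problems


if __name__ == "__main__":
    for name in ("DeepSeek-R1-Distill-Qwen-1.5B", "DeepSeek-R1-Distill-Qwen-7B"):
        issues = check_model_dir(Path("models") / name)
        print(name, "OK" if not issues else issues)
```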

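2.7 Testing the Model

"""


Question asked before training:
  皮質醇增多癥患者在血漿ACTH明顯升高且大劑量地塞米松抑制試驗陽性的情況下，應考慮哪種疾病？
  (In a patient with hypercortisolism, markedly elevated plasma ACTH, and a positive high-dose dexamethasone suppression test, which disease should be considered?)

Asked again after training:


scripts/test_inference.py

"""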


import os
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Directory containing this script
current_dir = os.path.dirname(__file__)

# Build the path to the model and tokenizer
model_dir = os.path.join(current_dir, '..', 'models', 'DeepSeek-R1-Distill-Qwen-1.5B')

# Print the path for confirmation
print(f"Model path: {model_dir}")

# Make sure the model directory exists
if not os.path.exists(model_dir):
    raise ValueError(f"Model directory does not exist at {model_dir}")
else:
    print("Model directory exists, proceeding with loading.")

# Load the model and tokenizer
print("Loading model and tokenizer...")
model = AutoModelForCausalLM.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# Print the model and tokenizer configuration
print(f"Model config: {model.config}")
print(f"Tokenizer config: {tokenizer}")

# Chinese input text (the question quoted in the header above)
input_text = "皮質醇增多癥患者在血漿ACTH明顯升高且大劑量地塞米松抑制試驗陽性的情況下，應考慮哪種疾病？"
print(f"User input: {input_text}")

# Structured prompt, kept in Chinese: it asks for an appropriate reply and sets the role "You are a helpful assistant."
prompt_style_chat = f"""請寫出一個恰當的回答來完成當前對話任務。

### Instruction:
你是一名助人為樂的助手。

### Question:
{input_text}

### Response:
<think>"""

# Tokenize the prompt
inputs = tokenizer(prompt_style_chat, return_tensors="pt", padding=True, truncation=True, max_length=512)

# Print the tokenized input
print(f"Tokenized input: {inputs}")

# Print the input shape
print(f"Input shape: {inputs['input_ids'].shape}")

# Print the model's maximum context length
print(f"Model max length: {model.config.max_position_embeddings}")

# Move the model to the right device (GPU if CUDA is available, otherwise CPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Print the device
print(f"Using device: {device}")

# Pick the pad_token_id, falling back to the model config
pad_token_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else model.config.pad_token_id
print(f"Using pad_token_id: {pad_token_id}")

# Generate the model output
print("Generating response...")
# max_new_tokens bounds the generated length
with torch.no_grad():  # no gradients needed for inference; saves memory
    try:
        print("Calling model.generate()...")
        outputs = model.generate(
            inputs['input_ids'].to(device),
            attention_mask=inputs['attention_mask'].to(device),
            max_new_tokens=1200,  # maximum number of newly generated tokens
            do_sample=True,  # sampling must be on for temperature/top_p to take effect
            temperature=1.0,
            top_p=0.9,
            pad_token_id=pad_token_id
        )

        print("Model.generate() completed.")
    except Exception as e:
        print(f"Error generating response: {e}")
        raise

# Print the generated token IDs and their shape
print(f"Generated output IDs: {outputs}")
print(f"Shape of generated output: {outputs.shape}")

# Decode the generated tokens into text
try:
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Generated response: {response}")
except Exception as e:
    print(f"Error decoding output: {e}")

Output for the question

User input: 皮質醇增多癥患者在血漿ACTH明顯升高且大劑量地塞米松抑制試驗陽性的情況下，應考慮哪種疾病？
Tokenized input: {'input_ids':tensor([[151646,  14880, 112672,  46944, 112449, 111423,  36407,  60548,  67949,
         105051,  88802,   3407,  14374,  29051,    510,  56568, 110124,  99262,
         103247,  99350,   9370, 110498,   3407,  14374,  15846,    510,  99888,
          99178, 103032, 107284,  99769, 101924,  18493,  99389, 101498,   6823,
             39, 100687, 109061, 100136,  26288, 114786,  29490, 101202,  72261,
         100180, 106555, 102360, 112758, 104248,   3837,  50511, 101118, 113195,
         101160,  26850,  14374,   5949,    510, 151648]]), 'attention_mask':tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
Input shape: torch.Size([1, 60])
Model max length: 131072
Using device: cpu
Using pad_token_id: 151643
Generating response...
Calling model.generate()...
Model.generate() completed.

Generated response: 請寫出一個恰當的回答來完成當前對話任務。

### Instruction:
你是一名助人為樂的助手。

### Question:
皮質醇增多癥患者在血漿ACTH明顯升高且大劑量地塞米松抑制試驗陽性的情況下，應考慮哪種疾病？

### Response:
<think>
好的，我現在需要仔細分析這個問題并給出一個合適的回答。首先，問題描述的是皮質醇增多癥（PHT）患者在血漿ACTH明顯升高且大劑量地塞米松抑制試驗（SSDS）顯示陽性的情況下，應考慮哪種疾病。

首先，我記得皮質醇增多癥是由于皮質醇分泌異常導致，通常由代謝紊亂或神經退行性疾病引起，比如皮質醇過激釋放癥、皮質醇過激釋放性代謝綜合征等。通常，患者可能表現出皮質醇水平升高，血漿ACTH顯著升高，這符合題意的第一個條件。

接下來，第二個條件是SSDS試驗陽性。SSDS試驗主要用于檢測皮質醇釋放的細胞因子，比如PD-L1，這些因子在疾病早期有顯著的表觀變化。皮質醇增多癥患者的皮質醇釋放確實受阻，導致細胞因子釋放減少，這在SSDS中會被檢測出來，所以這種情況屬于皮質醇增多癥。

綜合這兩個條件，患者的血漿ACTH升高和SSDS陽性，符合皮質醇增多癥的特征。因此，這種情況下應考慮的是皮質醇增多癥。

我需要確保我沒有遺漏其他可能導致SSDS試驗陽性的情況。比如，是否有一些其他類型的疾病，比如胰島素素合成障礙或胰島素缺乏，也會影響皮質醇釋放？不過，這些更可能是胰島素素合成障礙，而不是直接由皮質醇釋放受阻引起的。皮質醇增多癥通常是由于皮質醇釋放異常，因此SSDS陽性更直接與皮質醇釋放受阻相關。

此外，ACTH升高可能與皮質醇增多癥不同，而更可能是由于激素分泌過量或其他激素調節問題。因此，ACTH升高的信號應該更多指向皮質醇增多癥。

綜上所述，這種情況下應該考慮的疾病是皮質醇增多癥。
</think>

應考慮皮質醇增多癥（PantoprazolidonePhenomenon）。

因為：

1.血漿ACTH顯著升高，符合皮質醇增多癥的特征。
2.SSDS試驗陽性，表明皮質醇釋放受阻，屬于皮質醇增多癥的表現。
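
The decoded text above repeats the prompt and then contains the reasoning inside <think> tags followed by the final answer. When comparing answers before and after fine-tuning, it is often handy to separate these parts. The sketch below is one possible way to do that; split_reasoning is a hypothetical helper, and it assumes the "### Response:" marker and <think> tags used by the prompt in this article.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(decoded: str, prompt_marker: str = "### Response:"):
    """Split decoded text into (reasoning, answer).

    Everything up to the last prompt_marker is treated as the echoed prompt
    and dropped. If no closed <think> block is found, reasoning is "".
    """
    # Keep only what follows the last "### Response:" marker, if present.
    _, _, tail = decoded.rpartition(prompt_marker)
    match = THINK_RE.search(tail)
    if match is None:
        return "", tail.strip()
    reasoning = match.group(1).strip()
    answer = tail[match.end():].strip()
    return reasoning, answer


# Shortened stand-in for the decoded text, not real model output.
sample = "### Question:\nQ?\n\n### Response:\n<think>\nstep one\n</think>\n\nfinal answer"
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

If generation stops at max_new_tokens before the closing </think> tag, the function returns an empty reasoning string and the whole tail as the answer, which makes truncated outputs easy to spot.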

3. Training Data

3.1 Preparing the Dataset

#01 We use medical-o1-reasoning-SFT, a medical-domain dataset in CoT (chain-of-thought) format
https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT

#02 Loading it in Python
from datasets import load_dataset
ds = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "zh")

The commands above use the Hugging Face dataset; datasets can also be downloaded from ModelScope:

#01 ModelScope dataset page
https://www.modelscope.cn/datasets/YIRONGCHEN/PsyDTCorpus/files

#02 Download the full dataset repo
modelscope download --dataset YIRONGCHEN/PsyDTCorpus --local_dir ./dir


#03 Download a single file into a local folder (here README.md into the "dir" directory under the current path)
modelscope download --dataset YIRONGCHEN/PsyDTCorpus README.md --local_dir ./dir

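3.2 Data Cleaning

#01 Reshape the medical-o1-reasoning-SFT dataset: concatenate the Complex_CoT and Response columns and append the end-of-text token:
def formatting_prompts_func(examples, EOS_TOKEN):
    """
    Format every example in a batch into the training prompt.

    Args:
        examples (dict): a batch of dataset columns (lists of equal length)
        EOS_TOKEN (str): end-of-sequence token

    Returns:
        dict: the formatted text under the "text" key
    """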
    train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
    Write a response that appropriately completes the request. 
    Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

    ### Instruction:
    You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
    Please answer the following medical question. 

    ### Question:
    {}

    ### Response:
    <think>
    {}
    </think>
    {}"""

    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for question, cot, output in zip(inputs, cots, outputs):
        # Fill the three {} slots in order, then append the end-of-sequence token
        text = train_prompt_style.format(question, cot, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}



"""

The question ({}) is placed under ### Question:, replacing the first {}.
The reasoning ({}) is placed inside the <think></think> tags, replacing the second {}.
The answer ({}) is placed at the end of the template, replacing the third {}.
In detail:
The first {} is replaced by each sample's question (examples["Question"]).
The second {} is replaced by each sample's reasoning (examples["Complex_CoT"]).
The third {} is replaced by each sample's answer (examples["Response"]).
For example, given the following input:

Question: "What is the cause of fever?"
Complex_CoT: "Fever is usually caused by an infection or inflammation. We need to identify the source."
Response: "The most common causes of fever are bacterial or viral infections."

"""
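
The substitution described above can be tried without loading a model or a dataset. The following is a minimal sketch that uses a shortened template (the preamble and instruction block are omitted for brevity) and assumes "<|endoftext|>" as the end-of-sequence token; in the real script the token comes from tokenizer.eos_token. The name format_batch is a hypothetical stand-in for formatting_prompts_func.

```python
# Shortened template for illustration; the article's template has a longer preamble.
TEMPLATE = """### Question:
{}

### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = "<|endoftext|>"  # assumed here; in practice read it from tokenizer.eos_token


def format_batch(examples, eos_token=EOS_TOKEN):
    """Fill the template for every row of a column-oriented batch."""
    rows = zip(examples["Question"], examples["Complex_CoT"], examples["Response"])
    return {"text": [TEMPLATE.format(q, cot, ans) + eos_token for q, cot, ans in rows]}


batch = {
    "Question": ["What is the cause of fever?"],
    "Complex_CoT": ["Fever is usually caused by an infection or inflammation. We need to identify the source."],
    "Response": ["The most common causes of fever are bacterial or viral infections."],
}
formatted = format_batch(batch)
print(formatted["text"][0])
```

The batch is column-oriented (one list per column) because that is the shape datasets passes to a function when map is called with batched=True.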

Original data format

{
    "Question": [
        "What is the cause of headache?",
        "How do you treat a cold?"
    ],
    "Complex_CoT": [
        "The causes of headaches are numerous, including tension, dehydration, or sinus issues.",
        "Treating a cold typically involves rest, fluids, and over-the-counter medications for symptoms."
    ],
    "Response": [
        "A headache can be caused by stress, lack of sleep, or a sinus infection.",
        "For a cold, hydration and rest are key. Medications like ibuprofen can help with symptoms."
    ]
}

Formatted data

{
    "text": [
        """Below is an instruction that describes a task, paired with an input that provides further context. 
        Write a response that appropriately completes the request. 
        Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

        ### Instruction:
        You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
        Please answer the following medical question. 

        ### Question:
        What is the cause of headache?

        ### Response:
        <think>
        The causes of headaches are numerous, including tension, dehydration, or sinus issues.
        </think>
        A headache can be caused by stress, lack of sleep, or a sinus infection. <|endoftext|>
        """,
        """Below is an instruction that describes a task, paired with an input that provides further context. 
        Write a response that appropriately completes the request. 
        Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

        ### Instruction:
        You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
        Please answer the following medical question. 

        ### Question:
        How do you treat a cold?

        ### Response:
        <think>
        Treating a cold typically involves rest, fluids, and over-the-counter medications for symptoms.
        </think>
        For a cold, hydration and rest are key. Medications like ibuprofen can help with symptoms. <|endoftext|>
        """
    ]
}
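
Note that the formatted samples above keep the indentation of the source code, because the template is a triple-quoted string written inside an indented function body. If that leading whitespace is not wanted in the training text, one option (an assumption on our part, not something the original script does) is to strip it with textwrap.dedent before filling the template:

```python
import textwrap

# Indented the way it would sit inside a function body.
raw_template = """\
    ### Question:
    {}

    ### Response:
    <think>
    {}
    </think>
    {}"""

# dedent strips the whitespace common to all lines, so prompts start at column 0.
clean_template = textwrap.dedent(raw_template)

text = clean_template.format("Q", "reasoning", "A")
print(text)
```

The dedent call has to happen before format: once multi-line content has been substituted in, the lines no longer share a common indent and nothing would be stripped.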

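3.3 Training

setup_wandb: configures and logs in to wandb for experiment tracking and logging.
set_paths: sets the root directory, model path, dataset path, and the save path for the fine-tuned model.
load_model_and_tokenizer: loads the pretrained model and tokenizer and gets the end-of-sequence token.
formatting_prompts_func: formats the questions and answers in the dataset for training.
setup_lora: configures LoRA (low-rank adapters) and applies it to the model.
load_dataset_func: loads the dataset and splits it into a training set and an evaluation set.
setup_training_args: sets the training arguments, including learning rate, batch size, and number of epochs.
train_model: trains the model with SFTTrainer.
save_model: saves the trained model to the given path.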

import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from datasets import load_dataset
from peft import get_peft_model, LoraConfig
from trl import SFTTrainer  # supervised fine-tuning trainer
import wandb
from config import setting  # project-local settings module (not shown here); must provide setting.root_dir

# Disable tokenizer parallelism via an environment variable
os.environ["TOKENIZERS_PARALLELISM"] = "false"


# Log in to wandb
def setup_wandb():
    """
    Log in to wandb so training logs and metrics can be recorded.
    """
    wandb.login()


# Set the paths
def set_paths():
    """
    Set the project root, model path, dataset path, and final model save path.

    Returns:
        model_dir (str): path to the model files
        dataset_path (str): path to the dataset
        final_model_dir (str): where the fine-tuned model is saved
    """
    root_dir = setting.root_dir  # project root
    model_dir = os.path.join(root_dir, 'models', 'DeepSeek-R1-Distill-Qwen-1.5B')  # model files
    dataset_path = os.path.join(root_dir, 'data', 'medical-o1-reasoning-SFT')  # dataset
    final_model_dir = os.path.join(root_dir, 'models', 'final_model')  # fine-tuned model output
    print(f'Model path: {model_dir} | Dataset path: {dataset_path}')
    return model_dir, dataset_path, final_model_dir


# Load the model and tokenizer
def load_model_and_tokenizer(model_dir):
    """
    Load the pretrained model and its tokenizer, and get the end-of-sequence token (EOS_TOKEN).

    Args:
        model_dir (str): path to the model files

    Returns:
        model (AutoModelForCausalLM): the loaded model
        tokenizer (AutoTokenizer): the loaded tokenizer
        EOS_TOKEN (str): the model's EOS token (a default is used if none is defined)
    """
    print("Loading model and tokenizer...")
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    tokenizer = AutoTokenizer.from_pretrained(model_dir)

    EOS_TOKEN = tokenizer.eos_token
    if EOS_TOKEN is None:
        EOS_TOKEN = "<|endoftext|>"

    print(f'EOS token: {EOS_TOKEN}')
    return model, tokenizer, EOS_TOKEN


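# Format the training data
def formatting_prompts_func(examples, EOS_TOKEN):
    """
    Format every example in a batch into the training prompt.

    Args:
        examples (dict): a batch of dataset columns (lists of equal length)
        EOS_TOKEN (str): end-of-sequence token

    Returns:
        dict: the formatted text under the "text" key
    """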
    train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
    Write a response that appropriately completes the request. 
    Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

    ### Instruction:
    You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
    Please answer the following medical question. 

    ### Question:
    {}

    ### Response:
    <think>
    {}
    </think>
    {}"""

    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for question, cot, output in zip(inputs, cots, outputs):
        # Fill the three {} slots in order, then append the end-of-sequence token
        text = train_prompt_style.format(question, cot, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}


# Configure LoRA
def setup_lora(model):
    """
    Build the LoRA (low-rank adapter) configuration and apply it to the model.

    Args:
        model (AutoModelForCausalLM): the loaded model

    Returns:
        model: the model wrapped with LoRA adapters
    """
    print("Setting up LoRA configuration...")
    lora_config = LoraConfig(
        r=8,  # rank of the adapter
        lora_alpha=32,  # scaling factor
        lora_dropout=0.1,  # dropout on the LoRA layers
        bias="none",  # do not train bias terms
        task_type="CAUSAL_LM",  # tells PEFT this is a causal language model
    )
    return get_peft_model(model, lora_config)


# Load the dataset
def load_dataset_func(dataset_path, train_size=100):
    """
    Load the dataset from the given path. The training set has train_size samples;
    the evaluation set is 10% of that, but at least 1 sample.
    """
    print(f"Loading dataset from {dataset_path}...")
    # Load the dataset
    dataset = load_dataset(dataset_path, "en", split="train", trust_remote_code=True)

    # Size of the evaluation set
    eval_size = max(1, int(train_size * 0.1))  # 10% of the training set, at least 1

    # Split the dataset
    train_dataset = dataset.select(range(train_size))  # first train_size samples for training
    eval_dataset = dataset.select(range(train_size, train_size + eval_size))  # the next eval_size samples for evaluation

    print(f"Training set size: {len(train_dataset)}, evaluation set size: {len(eval_dataset)}")
    return train_dataset, eval_dataset


# Configure the training arguments
def setup_training_args(final_model_dir, enable_evaluation=True):
    """
    Set the training arguments (output directory, learning rate, batch size, and so on)
    and switch evaluation on or off.

    Args:
        final_model_dir (str): where the fine-tuned model is saved
        enable_evaluation (bool): whether to evaluate. Defaults to True; False disables evaluation.

    Returns:
        training_args (TrainingArguments): the training arguments
    """
    # Choose evaluation_strategy depending on whether evaluation is enabled
    evaluation_strategy = "epoch" if enable_evaluation else "no"

    training_args = TrainingArguments(
        output_dir=final_model_dir,
        evaluation_strategy=evaluation_strategy,  # newer transformers releases rename this argument to eval_strategy
        learning_rate=5e-5,
        per_device_train_batch_size=2,  # kept small to fit the M3 Pro's memory
        gradient_accumulation_steps=4,  # gradient accumulation to emulate a larger batch
        num_train_epochs=3,  # train for 3 epochs
        report_to="wandb",  # log the training run to wandb
        weight_decay=0.01,
        logging_dir=os.path.join(setting.root_dir, 'logs'),
        logging_steps=50,  # log every 50 steps
        save_steps=500,  # checkpoint every 500 steps to avoid frequent saves
        save_total_limit=2,  # keep at most 2 checkpoints
        dataloader_num_workers=4,  # data loader worker processes (adjust as needed)
    )
    return training_args



# Train the model
def train_model(model, training_args, dataset, eval_dataset, tokenizer, enable_evaluation=True):
    """
    Train the model with SFTTrainer.

    Args:
        model (AutoModelForCausalLM): the model to train
        training_args (TrainingArguments): the training arguments
        dataset (Dataset): training dataset
        eval_dataset (Dataset): evaluation dataset
        tokenizer (AutoTokenizer): the tokenizer
        enable_evaluation (bool): whether to evaluate

    Returns:
        trainer (SFTTrainer): the trainer instance
    """
    # Pass the evaluation set only when evaluation is enabled
    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=dataset,
        eval_dataset=eval_dataset if enable_evaluation else None,
        tokenizer=tokenizer,  # newer trl releases rename this argument to processing_class
        data_collator=None,  # None uses the trainer's default collator
    )
    trainer.train()
    return trainer


# Save the model
def save_model(trainer, final_model_dir):
    """
    Save the trained model to the given directory.

    Args:
        trainer (SFTTrainer): the trainer instance
        final_model_dir (str): where to save the model
    """
    print("Saving model...")
    trainer.save_model(final_model_dir)



def merge_models(models, weights, device="cpu"):
    """
    Merge the weights of several models by weighted averaging.

    Args:
        models (list): models sharing the same architecture
        weights (list): one averaging weight per model
        device (str): "cuda" or "cpu"

    Returns:
        merged_model (nn.Module): the merged model
    """
    # The number of models must match the number of weights
    assert len(models) == len(weights), "number of models and number of weights differ"

    # Move every model to the same device
    for i in range(len(models)):
        models[i] = models[i].to(device)

    # Start from the first model's state dict
    merged_state_dict = models[0].state_dict()

    # Weighted average of every tensor
    for key in merged_state_dict.keys():
        merged_state_dict[key] = torch.zeros_like(merged_state_dict[key])
        for model, weight in zip(models, weights):
            merged_state_dict[key] += model.state_dict()[key] * weight

    # Build a fresh model from the config (from_pretrained expects a path, not a config object)
    # and load the merged weights into it
    merged_model = models[0].__class__(models[0].config)
    merged_model.load_state_dict(merged_state_dict)
    return merged_model


# Main entry point
def main():
    """
    Run the whole training flow: set the paths, load the model, train, and save.

    Settings:
            enable_evaluation = False  # set to False to disable evaluation, e.g. when training is slow

    Loading the dataset:
        train_size=10 sets the number of training samples; the evaluation set is 10% of that (at least 1)
        train_dataset, eval_dataset = load_dataset_func(dataset_path, train_size=10)


    """
    setup_wandb()  # log in to wandb
    model_dir, dataset_path, final_model_dir = set_paths()  # set the paths

    model, tokenizer, EOS_TOKEN = load_model_and_tokenizer(model_dir)  # load the model and tokenizer

    train_dataset, eval_dataset = load_dataset_func(dataset_path, train_size=5)  # load the dataset
    train_dataset = train_dataset.map(lambda examples: formatting_prompts_func(examples, EOS_TOKEN), batched=True)  # format the training set
    eval_dataset = eval_dataset.map(lambda examples: formatting_prompts_func(examples, EOS_TOKEN), batched=True)  # format the evaluation set
    print(train_dataset["text"][0])  # print one formatted sample

    model = setup_lora(model)  # apply LoRA
    # Whether to run evaluation
    enable_evaluation = True  # set to False to disable evaluation
    training_args = setup_training_args(final_model_dir, enable_evaluation)  # build the training arguments
    trainer = train_model(model, training_args, train_dataset, eval_dataset, tokenizer, enable_evaluation)  # start training

    save_model(trainer, final_model_dir)  # save the model
    wandb.finish()  # close the wandb run




# Run the main function
if __name__ == "__main__":
    main()
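
The arguments in the script determine how many optimizer steps a run takes, which is useful to know before reading the progress bar. The sketch below approximates that arithmetic for the values used above (train_size=5, batch size 2, gradient accumulation 4, 3 epochs). It is an approximation: the exact rounding inside the trainer can differ between library versions, and training_schedule is a hypothetical helper.

```python
import math


def training_schedule(train_size, per_device_batch, grad_accum, epochs):
    """Approximate the step counts implied by the training arguments."""
    batches_per_epoch = math.ceil(train_size / per_device_batch)
    # One optimizer update per grad_accum batches, and at least one per epoch.
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return {
        "effective_batch_size": per_device_batch * grad_accum,
        "batches_per_epoch": batches_per_epoch,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": updates_per_epoch * epochs,
    }


# Values from main() and setup_training_args() above.
schedule = training_schedule(train_size=5, per_device_batch=2, grad_accum=4, epochs=3)
eval_size = max(1, int(5 * 0.1))  # same rule as load_dataset_func

print(schedule)
print("eval_size:", eval_size)
```

With these inputs the sketch gives three optimizer updates in total and a single evaluation sample, which is consistent with the 0/3 training bar and the 1/1 evaluation bar in the log in section 3.6.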

3.4 Training and Saving the Model

"""
The model is saved locally under models/final_model

"""

def save_model(trainer, final_model_dir):
    """
    Save the trained model to the given directory.

    Args:
        trainer (SFTTrainer): the trainer instance
        final_model_dir (str): where to save the model
    """
    print("Saving model...")
    trainer.save_model(final_model_dir)

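3.5 Merging the Model Files

#01 Run the following
new_model_local = "DeepSeek-R1-Medical-COT-Tiny"
model.save_pretrained(new_model_local)  # for a PEFT model this saves the LoRA adapter weights
tokenizer.save_pretrained(new_model_local)
# save_pretrained_merged is provided by Unsloth models; with the plain PEFT model trained above,
# model.merge_and_unload().save_pretrained(new_model_local) is the usual equivalent
model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)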
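
Merging a LoRA adapter means folding the low-rank update back into the base weight, so the saved model no longer needs the adapter at inference time. The standard LoRA update is W' = W + (lora_alpha / r) * B A. The sketch below works through that formula on a toy 2 x 2 matrix in plain Python; all numbers are made up, and merge_lora is a hypothetical helper rather than a library function.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_b, lora_a, r, lora_alpha):
    """Return W + (lora_alpha / r) * (B @ A), the weight with the adapter folded in."""
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy 2x2 weight with a rank-1 adapter; all numbers are made up for illustration.
W = [[1.0, 0.0],
     [0.0, 1.0]]
B = [[1.0],
     [2.0]]          # shape 2 x r
A = [[0.5, -0.5]]    # shape r x 2

merged = merge_lora(W, B, A, r=1, lora_alpha=2)
print(merged)
```

With the configuration used in section 3.3 (r=8, lora_alpha=32) the scale factor would be 32 / 8 = 4.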

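3.6 Evaluating and Monitoring Training

Evaluation (eval/) metrics:

eval/runtime 18.3908: the evaluation pass took 18.39 seconds in total.
eval/samples_per_second 0.054: 0.054 samples were processed per second, so evaluation is slow.
eval/steps_per_second 0.054: 0.054 evaluation steps ran per second, so each evaluation step is expensive.

Training (train/) metrics:

train/epoch 0: the logged training epoch is 0.
train/global_step 0: the logged global step is 0, meaning no training step had been recorded yet.
train_loss 14435.36663: the reported training loss is 14435.37, an unusually large value for a language-model loss. Together with global_step 0 it suggests the setup is worth checking before concluding that the model only needs more training.
train/runtime 251.7582: total training time was 251.76 seconds.
train/samples_per_second 0.06: 0.06 training samples were processed per second, so training is slow.
train/steps_per_second 0.012: 0.012 training steps ran per second, so each training step takes a long time.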
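
The per-second figures are simply counts divided by wall-clock time, so they can be cross-checked by hand. The sketch below reproduces them from the runtimes in the log. The counts are inferred from the script rather than stated in the log, so they are assumptions: 1 evaluation sample, 5 training samples over 3 epochs, and 3 optimizer steps.

```python
def per_second(count, runtime_seconds):
    """Throughput as reported in the summary: count divided by wall-clock seconds."""
    return count / runtime_seconds


# Runtimes come from the log; the counts are inferred (1 eval sample,
# 5 training samples x 3 epochs, 3 optimizer steps) and are assumptions.
eval_sps = per_second(1, 18.3908)
train_sps = per_second(5 * 3, 251.7582)
train_steps_ps = per_second(3, 251.7582)

print(f"eval samples/s:  {eval_sps:.3f}")
print(f"train samples/s: {train_sps:.3f}")
print(f"train steps/s:   {train_steps_ps:.3f}")
print(f"seconds per eval sample: {1 / eval_sps:.1f}")
```

Read the other way round, a rate of 0.054 samples per second means a single evaluation sample takes roughly 18 seconds on this CPU-only setup.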

#02 Detailed log
wandb: View project at https://wandb.ai/z15119911990-beijing/huggingface
wandb: View run at https://wandb.ai/z15119911990-beijing/huggingface/runs/mgrko2jv
  0%|          | 0/3 [00:00<?, ?it/s]
{'eval_runtime': 14.8693, 'eval_samples_per_second': 0.067, 'eval_steps_per_second': 0.067, 'epoch': 0}
                                     
  0%|          | 0/3 [00:30<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 1461.94it/s]
                                               
                                     
{'eval_runtime': 21.2073, 'eval_samples_per_second': 0.047, 'eval_steps_per_second': 0.047, 'epoch': 0}
  0%|          | 0/3 [02:11<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 33.69it/s]
                                             
                                     
  0%|          | 0/3 [04:02<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 334.66it/s]
                                              {'eval_runtime': 18.3908, 'eval_samples_per_second': 0.054, 'eval_steps_per_second': 0.054, 'epoch': 0}
{'train_runtime': 251.7582, 'train_samples_per_second': 0.06, 'train_steps_per_second': 0.012, 'train_loss': 14435.3666305542, 'epoch': 0}
  0%|          | 0/3 [04:10<?, ?it/s]
wandb:                                                                                
wandb: 
wandb: Run history:
wandb:            eval/runtime ▁█▅
wandb: eval/samples_per_second █▁▃
wandb:   eval/steps_per_second █▁▃
wandb:             train/epoch ▁▁▁▁
wandb:       train/global_step ▁▁▁▁
wandb: 
wandb: Run summary:
wandb:             eval/runtime 18.3908
wandb:  eval/samples_per_second 0.054
wandb:    eval/steps_per_second 0.054
wandb:               total_flos 43804457687040.0
wandb:              train/epoch 0
wandb:        train/global_step 0
wandb:               train_loss 14435.36663
wandb:            train_runtime 251.7582
wandb: train_samples_per_second 0.06
wandb:   train_steps_per_second 0.012
wandb: 
wandb: View run /Users/ningcaichen/Documents/02-python相關文檔/01-AI系列/LoRA-DeepSeek-R1/models/final_model at: https://wandb.ai/z15119911990-beijing/huggingface/runs/mgrko2jv
wandb: View project at: https://wandb.ai/z15119911990-beijing/huggingface
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20250212_133457-mgrko2jv/logs

Editor: 武曉燕 | Source: 海邊的拾遺者
