DeepSeek Custom Training: Fine-Tuning and Inference in Practice

2025-04-30 09:19:32

This document describes how to fine-tune DeepSeek-R1-Distill-Qwen-1.5B (Qwen architecture) efficiently on a Mac laptop, using transformers for data processing, LoRA for the fine-tuning itself, WandB to monitor training, and ModelScope to download the model.

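1. Introduction

This article covers:

Model loading and preprocessing: how to load the pretrained model and tokenizer and prepare the input dataset.
LoRA configuration: how to configure the model with LoRA so that fine-tuning stays cheap in compute.
Training: how to set the training arguments, train with SFTTrainer, and log the run to WandB.
Saving and evaluation: how to save the fine-tuned model and validate it on a held-out evaluation set.
Model merging: how to combine the weights of several models by weighted averaging.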

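1.1 Project background

The fine-tuning described here runs on a Mac laptop against DeepSeek-R1-Distill-Qwen-1.5B (Qwen architecture). transformers handles data processing, LoRA handles the fine-tuning, WandB monitors the run, and ModelScope supplies the model files. The training set holds roughly 20,000 examples.

The larger DeepSeek-R1-Distill-Qwen-7B (Qwen) was dropped because the Mac laptop has no discrete GPU for local training.

The installed tools are:

Tool	Version	Purpose
Unsloth		Data processing and model fine-tuning.
Transformers		Hugging Face model library, used to load and fine-tune DeepSeek-R1.
WandB		Real-time monitoring and visualization of training.
LoRA		Low-rank adaptation technique used for fine-tuning.
ModelScope		Used to download the DeepSeek-R1 distilled model weights.
python3.11	Python 3.11	Runs the Python scripts and training jobs.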

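1.2 LoRA and QLoRA in brief

The table below compares LoRA and QLoRA:

Feature	LoRA (Low-Rank Adaptation)	QLoRA (Quantized LoRA)
Core idea	Trains small low-rank matrices instead of the full weight matrices	Adds quantization of the frozen base weights on top of LoRA to cut storage further
Main benefit	Fewer trainable parameters, so fine-tuning is cheaper	Low-rank adapters plus quantization, so memory use drops enough for constrained environments
Storage needs	Low, but higher than QLoRA	Markedly lower; suited to memory-limited devices
Compute efficiency	Less compute than full fine-tuning	Lower memory traffic from quantized weights, although dequantization can add some overhead
Typical use	Limited compute, but no need for extreme compression	Very tight memory and compute budgets, such as edge devices
Hardware	Most hardware, especially high-performance setups	Memory-limited hardware such as edge devices and low-memory servers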
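
To make the "fewer trainable parameters" row concrete, here is a minimal sketch of the arithmetic. It assumes the usual LoRA formulation, in which a frozen d x k weight matrix W is adapted as W + (alpha / r) * B * A, with B of shape d x r and A of shape r x k. The 1536 x 1536 layer size is an illustrative assumption, not a figure from this article; r = 8 and alpha = 32 match the LoraConfig used in section 3.3.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one d x k weight matrix."""
    full = d * k        # full fine-tuning updates every entry of W
    lora = r * (d + k)  # LoRA trains B (d x r) and A (r x k); W stays frozen
    return full, lora


# Illustrative layer size (assumption); r and alpha follow section 3.3.
d, k, r, alpha = 1536, 1536, 8, 32
full, lora = lora_param_counts(d, k, r)
scaling = alpha / r  # factor applied to the low-rank update B @ A

print(f"full fine-tuning: {full} parameters")
print(f"LoRA adapter:     {lora} parameters ({lora / full:.2%} of full)")
print(f"update scaling:   {scaling}")
```

For this layer the adapter trains about 1% of the parameters that full fine-tuning would touch, which is where the memory and compute savings come from.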

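1.3 The LLaMA and Qwen architectures

Feature	LLaMA architecture	Qwen architecture
Developer	Meta (Facebook)	Alibaba Cloud (Qwen team)
Design goal	Efficient, lightweight	Chinese optimization, multilingual support
Parameter sizes	7B, 13B, 33B, 65B, etc.	7B, 14B, etc.
Open source	Open (weights and code released)	Open weights for most model sizes
Use cases	Resource-limited environments	Chinese and multilingual tasks

LLaMA architecture

Full name: Large Language Model Meta AI (LLaMA)
Developer: Meta (formerly Facebook).
Characteristics:

a. Efficiency: LLaMA aims for strong results with comparatively few parameters and focuses on compute efficiency.

b. Lightweight: the models are relatively small (7B, 13B, 33B, 65B), yet with high-quality data and training methods they approach or exceed the performance of larger models.

c. Open: Meta released the LLaMA weights and code for the research community.

Use cases:

a. Resource-limited environments, such as local deployment or mobile devices.

b. General NLP tasks, in particular generation, question answering, and text classification, where it offers good performance and efficiency.

Qwen architecture

Developer: the Qwen (Tongyi Qianwen) team at Alibaba Cloud. The DeepSeek-R1-Distill-Qwen models used in this article are DeepSeek's distillations of R1 onto Qwen base models.
Characteristics:

a. Tailored design: Qwen is a decoder-only Transformer whose tokenizer and training data are tuned for Chinese and multilingual text.

b. Multilingual support: Qwen models handle Chinese well and also perform solidly on English and other languages.

c. Flexible sizes: the family spans several scales (7B, 14B, and others) to fit different scenarios.

Use cases:

Qwen suits text generation, automated content creation, and dialogue systems.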

2. Environment Setup

2.1 Installing Unsloth (GPU build, not used for now)

Unsloth is a tool for data processing and model fine-tuning. It can be installed with the commands below.
Note: this does not apply on a Mac, because Unsloth needs a GPU.

## Project page: https://github.com/unslothai/unsloth

#01 Create the project and set up a Python 3.11 virtual environment

#02 Install unsloth (CPU attempt)
# Homebrew reported clang version 19.1.7 at the time of writing
brew install llvm
echo 'export PATH="/opt/homebrew/opt/llvm/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

pip install torch
pip install numpy
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"



#03 Check the version
python -c "import torch; print(torch.__version__)"
2.6.0

#04 Import
from unsloth import FastLanguageModel

Once installed, Unsloth can be used to preprocess data and to load and fine-tune models.

Not used for now:

#01 On a Linux server, Docker is the recommended route


#02 Pull the image
docker pull modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-py310-torch2.3.1-1.22.2

#03 Start the container

2.2 Creating the Python project

#01 The environment is Python 3.11

#02 Project layout
Unsloth-DeepSeek-R1-8b/
├── data/                    # training data, validation data, etc.
│   ├── raw/                 # raw data
│   └── processed/           # preprocessed data
│
├── models/                  # model files
│   ├── checkpoints/         # checkpoints written during training
│   └── final_model/         # final fine-tuned model
│
├── scripts/                 # training and data-processing scripts
│   ├── train.py             # training script
│   ├── data_preprocessing.py# data preprocessing script
│   └── evaluate.py          # evaluation script
│
├── logs/                    # training log files
│   └── training_logs.txt    # log written during training
│
├── wandb/                   # wandb configuration and run records
│   └── wandb_config.py      # wandb configuration file
│
├── environment/             # environment configuration
│   ├── requirements.txt     # Python dependencies of the project
│   └── environment.yml      # optional Conda environment file
│
├── main.py                  # entry point that starts training or other tasks
└── README.md                # project description with usage instructions


#03 Create the directories
# Create the subdirectories
mkdir -p data/raw
mkdir -p data/processed
mkdir -p models/checkpoints
mkdir -p models/final_model
mkdir -p scripts
mkdir -p logs
mkdir -p wandb
mkdir -p environment

# Create the files
touch scripts/train.py
touch scripts/data_preprocessing.py
touch scripts/evaluate.py
touch logs/training_logs.txt
touch wandb/wandb_config.py
touch environment/requirements.txt
touch environment/environment.yml
touch main.py
touch README.md

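2.3 Python dependencies

#01 Install the packages (trl provides the SFTTrainer imported in section 3.3)
pip install torch==2.6.0 transformers datasets trl

#02 Update the certificates (pip verifies them later whenever it reaches an HTTPS site)
/Applications/Python\ 3.11/Install\ Certificates.command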

2.4 Installing LoRA via peft

LoRA is a parameter-efficient fine-tuning technique, and the PEFT library provides an implementation of it. To use LoRA on a Mac for fine-tuning a DeepSeek model, install the PEFT dependency.
With PEFT you fine-tune a small set of added parameters instead of updating all of the model's weights.

#01 Install peft
pip install peft

2.5 Setting up WandB

WandB is a tool for real-time monitoring and visualization of training runs. Set it up as follows:

Sign up and log in on the WandB website.
Get your API key and configure it:

#01 API key (tied to the author's Google account)


#02 Commands
pip install wandb
wandb login

#03 Test script
import wandb  # tracks and visualizes experiments
import random  # generates the random numbers used below

# Start a new wandb run to track this script
wandb.init(
    # All data from this run is logged to this project
    project="my-awesome-project",  # project name shown in the wandb dashboard

    # Hyperparameters and run metadata to track
    config={
        "learning_rate": 0.02,  # learning rate
        "architecture": "CNN",  # model architecture (a convolutional network here)
        "dataset": "CIFAR-100",  # dataset name
        "epochs": 10,  # number of epochs
    }
)

# Simulate a training run
epochs = 10  # total number of epochs
offset = random.random() / 5  # small random offset that mimics run-to-run noise

# Simulated training loop over epochs 2 through 9
for epoch in range(2, epochs):
    # Simulated accuracy, rising as the epoch number grows
    acc = 1 - 2 ** -epoch - random.random() / epoch - offset

    # Simulated loss, falling as the epoch number grows
    loss = 2 ** -epoch + random.random() / epoch + offset

    # Log accuracy and loss for this epoch to wandb
    wandb.log({"acc": acc, "loss": loss})

# [Optional] Finish the run so that all data is uploaded and recorded
wandb.finish()

2.6 Pulling models with modelscope

#01 Install modelscope
pip install modelscope

#02 Download the model files
mkdir -p ./models/DeepSeek-R1-Distill-Llama-8B
mkdir -p ./models/DeepSeek-R1-Distill-Qwen-1.5B
mkdir -p ./models/DeepSeek-R1-Distill-Qwen-7B

modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B --local_dir ./models/DeepSeek-R1-Distill-Llama-8B

modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --local_dir ./models/DeepSeek-R1-Distill-Qwen-1.5B

modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --local_dir ./models/DeepSeek-R1-Distill-Qwen-7B



modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B --local_dir ./DeepSeek-R1-Distill-Llama-8B

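2.7 Testing the model

"""


Question asked before training:
  In a patient with hypercortisolism, when plasma ACTH is markedly elevated and the high-dose dexamethasone suppression test is positive, which disease should be considered?
  (The script sends the original Chinese wording of this question.)

Asked again after training:


scripts/test_inference.py

"""
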
import os
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Directory that contains this script
current_dir = os.path.dirname(__file__)

# Build the path to the model and tokenizer
model_dir = os.path.join(current_dir, '..', 'models', 'DeepSeek-R1-Distill-Qwen-1.5B')

# Print the path as a sanity check
print(f"Model path: {model_dir}")

# Make sure the model directory exists
if not os.path.exists(model_dir):
    raise ValueError(f"Model directory does not exist at {model_dir}")
else:
    print("Model directory exists, proceeding with loading.")

# Load the model and tokenizer
print("Loading model and tokenizer...")
model = AutoModelForCausalLM.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# Print the model and tokenizer configuration
print(f"Model config: {model.config}")
print(f"Tokenizer config: {tokenizer}")

# Chinese input text (English: "In a patient with hypercortisolism, when plasma ACTH is markedly
# elevated and the high-dose dexamethasone suppression test is positive, which disease should be considered?")
input_text = "皮质醇增多症患者在血浆ACTH明显升高且大剂量地塞米松抑制试验阳性的情况下，应考虑哪种疾病？"
print(f"User input: {input_text}")

# Structured prompt. The Chinese lines read "Please write an appropriate response to complete
# the current dialogue task." and "You are a helpful assistant."
prompt_style_chat = f"""请写出一个恰当的回答来完成当前对话任务。

### Instruction:
你是一名助人为乐的助手。

### Question:
{input_text}

### Response:
<think>"""

# Tokenize the prompt
inputs = tokenizer(prompt_style_chat, return_tensors="pt", padding=True, truncation=True, max_length=512)

# Print the tokenized input
print(f"Tokenized input: {inputs}")

# Print the input shape
print(f"Input shape: {inputs['input_ids'].shape}")

# Print the model's maximum context length
print(f"Model max length: {model.config.max_position_embeddings}")

# Move the model to the right device (GPU if available)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Print the device in use
print(f"Using device: {device}")

# Pick the pad_token_id, falling back to the model config
pad_token_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else model.config.pad_token_id
print(f"Using pad_token_id: {pad_token_id}")

# Generate the model output
print("Generating response...")
# max_new_tokens bounds the length of the generation
with torch.no_grad():  # disable gradient tracking to save memory
    try:
        print("Calling model.generate()...")
        outputs = model.generate(
            inputs['input_ids'].to(device),
            attention_mask=inputs['attention_mask'].to(device),
            max_new_tokens=1200,  # maximum number of generated tokens
            do_sample=True,  # make sampling explicit so temperature/top_p are applied
            temperature=1.0,
            top_p=0.9,
            pad_token_id=pad_token_id
        )

        print("Model.generate() completed.")
    except Exception as e:
        print(f"Error generating response: {e}")
        raise

# Print the generated token IDs and their shape
print(f"Generated output IDs: {outputs}")
print(f"Shape of generated output: {outputs.shape}")

# Decode the generated tokens back to text
try:
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Generated response: {response}")
except Exception as e:
    print(f"Error decoding output: {e}")
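
The script above only chooses between CUDA and the CPU, so on a Mac it always lands on the CPU (as the log below confirms). A minimal sketch of a three-way choice follows, assuming a PyTorch build that exposes `torch.backends.mps` (the Apple Metal backend). `pick_device` is a hypothetical helper, not part of the original script, and this article did not measure whether MPS is faster or supports every operation this model needs.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


try:
    import torch

    # Both availability checks are cheap and safe to call on any platform.
    device = pick_device(torch.cuda.is_available(),
                         torch.backends.mps.is_available())
except ImportError:
    # torch is not installed in this environment; fall back to the CPU label.
    device = pick_device(False, False)

print(f"Using device: {device}")
```

The returned string can be passed to `model.to(device)` and to the `.to(device)` calls on the input tensors in place of the two-way expression.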

Question and answer

User input: 皮质醇增多症患者在血浆ACTH明显升高且大剂量地塞米松抑制试验阳性的情况下，应考虑哪种疾病？
Tokenized input: {'input_ids':tensor([[151646,  14880, 112672,  46944, 112449, 111423,  36407,  60548,  67949,
         105051,  88802,   3407,  14374,  29051,    510,  56568, 110124,  99262,
         103247,  99350,   9370, 110498,   3407,  14374,  15846,    510,  99888,
          99178, 103032, 107284,  99769, 101924,  18493,  99389, 101498,   6823,
             39, 100687, 109061, 100136,  26288, 114786,  29490, 101202,  72261,
         100180, 106555, 102360, 112758, 104248,   3837,  50511, 101118, 113195,
         101160,  26850,  14374,   5949,    510, 151648]]), 'attention_mask':tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
Input shape: torch.Size([1, 60])
Model max length: 131072
Using device: cpu
Using pad_token_id: 151643
Generating response...
Calling model.generate()...
Model.generate() completed.

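Generated response (the model replied in Chinese; rendered in English here): Please write an appropriate response to complete the current dialogue task.

### Instruction:
You are a helpful assistant.

### Question:
In a patient with hypercortisolism, when plasma ACTH is markedly elevated and the high-dose dexamethasone suppression test is positive, which disease should be considered?

### Response:
<think>
OK, I need to analyze this question carefully and give a suitable answer. First, the question describes a patient with hypercortisolism (PHT) whose plasma ACTH is markedly elevated and whose high-dose dexamethasone suppression test (SSDS) is positive, and asks which disease should be considered.

First, I recall that hypercortisolism results from abnormal cortisol secretion, usually caused by metabolic disorders or neurodegenerative diseases, such as cortisol hyper-release disorder or cortisol hyper-release metabolic syndrome. Patients typically show raised cortisol levels and markedly raised plasma ACTH, which fits the first condition.

Next, the second condition is a positive SSDS test. The SSDS test is mainly used to detect cytokines released with cortisol, such as PD-L1, which show marked epigenetic changes early in disease. In patients with hypercortisolism, cortisol release is indeed blocked, which reduces cytokine release; the SSDS detects this, so this situation is hypercortisolism.

Combining the two conditions, elevated plasma ACTH and a positive SSDS match the features of hypercortisolism. So hypercortisolism should be considered here.

I need to make sure I have not missed other conditions that could cause a positive SSDS test. For example, could other diseases, such as impaired insulin synthesis or insulin deficiency, also affect cortisol release? These are more likely insulin synthesis disorders, though, rather than being caused directly by blocked cortisol release. Hypercortisolism is usually due to abnormal cortisol release, so a positive SSDS relates more directly to blocked cortisol release.

In addition, elevated ACTH may differ from hypercortisolism and is more likely due to excess hormone secretion or other hormone regulation problems. So the elevated ACTH signal should point more toward hypercortisolism.

In summary, the disease to consider here is hypercortisolism.
</think>

Hypercortisolism (PantoprazolidonePhenomenon) should be considered.

Because:

1. Plasma ACTH is markedly elevated, which matches the features of hypercortisolism.
2. The SSDS test is positive, indicating blocked cortisol release, a manifestation of hypercortisolism.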

3. Training Data

3.1 Preparing the dataset

#01 We use the medical-o1-reasoning-SFT dataset, a medical dataset in chain-of-thought (CoT) format
https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT

#02 Loading it from Python
from datasets import load_dataset
ds = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "zh")

The dataset above comes from Hugging Face. Datasets can also be pulled from ModelScope:

#01 ModelScope dataset page
https://www.modelscope.cn/datasets/YIRONGCHEN/PsyDTCorpus/files

#02 Download the complete dataset repo
modelscope download --dataset YIRONGCHEN/PsyDTCorpus --local_dir ./dir


#03 Download a single file into a local folder (here README.md into the "dir" directory under the current path)
modelscope download --dataset YIRONGCHEN/PsyDTCorpus README.md --local_dir ./dir

3.2 Data cleaning

#01 Reshape the medical-o1-reasoning-SFT dataset: join the Complex_CoT and Response columns and append the end-of-text token:
def formatting_prompts_func(examples, EOS_TOKEN):
    """
    Format each example in the dataset so that it matches the training layout.

    Args:
        examples (dict): a batch of input examples from the dataset
        EOS_TOKEN (str): end-of-sequence token

    Returns:
        dict: the formatted text data
    """
    train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
    Write a response that appropriately completes the request. 
    Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

    ### Instruction:
    You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
    Please answer the following medical question. 

    ### Question:
    {}

    ### Response:
    <think>
    {}
    </think>
    {}"""

    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for question, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(question, cot, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}



"""

The question fills the first {} under "### Question:".
The reasoning fills the second {} inside the <think></think> tags.
The answer fills the third {} at the end of the template.
In detail:
The first {} is replaced by each sample's question (examples["Question"]).
The second {} is replaced by each sample's reasoning (examples["Complex_CoT"]).
The third {} is replaced by each sample's answer (examples["Response"]).
For example, given the following input:

Question: "What is the cause of fever?"
Complex_CoT: "Fever is usually caused by an infection or inflammation. We need to identify the source."
Response: "The most common causes of fever are bacterial or viral infections."

"""

Original data format

{
    "Question": [
        "What is the cause of headache?",
        "How do you treat a cold?"
    ],
    "Complex_CoT": [
        "The causes of headaches are numerous, including tension, dehydration, or sinus issues.",
        "Treating a cold typically involves rest, fluids, and over-the-counter medications for symptoms."
    ],
    "Response": [
        "A headache can be caused by stress, lack of sleep, or a sinus infection.",
        "For a cold, hydration and rest are key. Medications like ibuprofen can help with symptoms."
    ]
}

Formatted data

{
    "text": [
        """Below is an instruction that describes a task, paired with an input that provides further context. 
        Write a response that appropriately completes the request. 
        Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

        ### Instruction:
        You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
        Please answer the following medical question. 

        ### Question:
        What is the cause of headache?

        ### Response:
        <think>
        The causes of headaches are numerous, including tension, dehydration, or sinus issues.
        </think>
        A headache can be caused by stress, lack of sleep, or a sinus infection. <|endoftext|>
        """,
        """Below is an instruction that describes a task, paired with an input that provides further context. 
        Write a response that appropriately completes the request. 
        Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

        ### Instruction:
        You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
        Please answer the following medical question. 

        ### Question:
        How do you treat a cold?

        ### Response:
        <think>
        Treating a cold typically involves rest, fluids, and over-the-counter medications for symptoms.
        </think>
        For a cold, hydration and rest are key. Medications like ibuprofen can help with symptoms. <|endoftext|>
        """
    ]
}
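
Because train_prompt_style is defined inside an indented function body, every template line after the first carries leading spaces that end up in the training text, as the formatted sample above shows. If that indentation is unwanted, one option is `textwrap.dedent`. The sketch below uses a shortened template to stay brief; `format_example` and the `"<EOS>"` placeholder are illustrative assumptions, and the real end-of-sequence token should come from the tokenizer.

```python
import textwrap

# The backslash after the opening quotes keeps the first line indented like
# the rest, so dedent can find and strip the common four-space prefix.
TEMPLATE = textwrap.dedent("""\
    ### Question:
    {}

    ### Response:
    <think>
    {}
    </think>
    {}""")


def format_example(question: str, cot: str, answer: str, eos_token: str) -> str:
    """Fill the three placeholders in order and append the end-of-sequence token."""
    return TEMPLATE.format(question, cot, answer) + eos_token


text = format_example(
    "What is the cause of fever?",
    "Fever is usually caused by an infection or inflammation.",
    "The most common causes of fever are bacterial or viral infections.",
    "<EOS>",  # placeholder; use tokenizer.eos_token in the real pipeline
)
print(text)
```

Whether to strip the indentation is a judgment call: the prompt used at inference time should match whatever layout the model saw during training.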

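3.3 Training on the data

setup_wandb: log in to wandb for experiment tracking and logging.
set_paths: set the root directory, model path, dataset path, and the output path for the fine-tuned model.
load_model_and_tokenizer: load the pretrained model and tokenizer and fetch the end-of-sequence token.
formatting_prompts_func: format the questions and answers in the dataset for training.
setup_lora: configure LoRA (low-rank adapters) and apply it to the model.
load_dataset_func: load the dataset and split it into a training set and an evaluation set.
setup_training_args: set the training arguments, including learning rate, batch size, and number of epochs.
train_model: train the model with SFTTrainer.
save_model: save the trained model to the given path.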
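
The split that load_dataset_func performs can be sketched in isolation. `split_indices` is a hypothetical helper that mirrors its arithmetic: the first train_size rows go to training and the next max(1, 10% of train_size) rows go to evaluation. The bounds check is an addition for illustration, and total_rows=20000 stands in for the "roughly 20,000 examples" mentioned in section 1.1.

```python
def split_indices(total_rows: int, train_size: int, eval_fraction: float = 0.1):
    """Return (train, eval) index ranges taken from the head of the dataset."""
    eval_size = max(1, int(train_size * eval_fraction))  # at least one eval row
    if train_size + eval_size > total_rows:
        # Added guard (not in the original script): fail early on short datasets.
        raise ValueError("dataset is too small for the requested split")
    train_idx = range(train_size)
    eval_idx = range(train_size, train_size + eval_size)
    return train_idx, eval_idx


train_idx, eval_idx = split_indices(total_rows=20000, train_size=5)
print(list(train_idx), list(eval_idx))
```

With train_size=5, as in main(), the evaluation set holds a single row, so the eval metrics from such a run are only a smoke test.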

import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from datasets import load_dataset
from peft import get_peft_model, LoraConfig
from trl import SFTTrainer  # supervised fine-tuning trainer
import wandb
from config import setting  # local project module that provides setting.root_dir

# Disable tokenizer parallelism via an environment variable
os.environ["TOKENIZERS_PARALLELISM"] = "false"


# Log in to wandb
def setup_wandb():
    """
    Log in to wandb so that training logs and metrics can be recorded.
    """
    wandb.login()


# Set up paths
def set_paths():
    """
    Set the project root, model path, dataset path, and final model output path.

    Returns:
        model_dir (str): path to the model files
        dataset_path (str): path to the dataset
        final_model_dir (str): where the fine-tuned model is saved
    """
    root_dir = setting.root_dir  # project root
    model_dir = os.path.join(root_dir, 'models', 'DeepSeek-R1-Distill-Qwen-1.5B')  # model files
    dataset_path = os.path.join(root_dir, 'data', 'medical-o1-reasoning-SFT')  # dataset
    final_model_dir = os.path.join(root_dir, 'models', 'final_model')  # output of the fine-tuning run
    print(f'Model path: {model_dir} | dataset location: {dataset_path}')
    return model_dir, dataset_path, final_model_dir


# Load the model and tokenizer
def load_model_and_tokenizer(model_dir):
    """
    Load the pretrained model and its tokenizer, and fetch the end-of-sequence token (EOS_TOKEN).

    Args:
        model_dir (str): path to the model files

    Returns:
        model (AutoModelForCausalLM): the loaded model
        tokenizer (AutoTokenizer): the loaded tokenizer
        EOS_TOKEN (str): the model's end-of-sequence token (a default is used if none is set)
    """
    print("Loading model and tokenizer...")
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    tokenizer = AutoTokenizer.from_pretrained(model_dir)

    EOS_TOKEN = tokenizer.eos_token
    if EOS_TOKEN is None:
        EOS_TOKEN = "<|endoftext|>"

    print(f'EOS token: {EOS_TOKEN}')
    return model, tokenizer, EOS_TOKEN


# Format the training data
def formatting_prompts_func(examples, EOS_TOKEN):
    """
    Format each example in the dataset so that it matches the training layout.

    Args:
        examples (dict): a batch of input examples from the dataset
        EOS_TOKEN (str): end-of-sequence token

    Returns:
        dict: the formatted text data
    """
    train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
    Write a response that appropriately completes the request. 
    Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

    ### Instruction:
    You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
    Please answer the following medical question. 

    ### Question:
    {}

    ### Response:
    <think>
    {}
    </think>
    {}"""

    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for question, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(question, cot, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}


# Configure LoRA
def setup_lora(model):
    """
    Build the LoRA (low-rank adapter) configuration and apply it to the model.

    Args:
        model (AutoModelForCausalLM): the loaded model

    Returns:
        model (AutoModelForCausalLM): the model with LoRA applied
    """
    print("Setting up LoRA configuration...")
    lora_config = LoraConfig(
        r=8,  # rank of the adapter
        lora_alpha=32,  # scaling factor
        lora_dropout=0.1,  # dropout on the LoRA layers
        bias="none",  # do not train bias terms
        task_type="CAUSAL_LM",  # tells PEFT this is a causal language model
    )
    return get_peft_model(model, lora_config)


# Load the dataset
def load_dataset_func(dataset_path, train_size=100):
    """
    Load the dataset from the given path. The training set has train_size rows and the
    evaluation set is 10% of that size, with a minimum of 1 row.
    """
    print(f"Loading dataset from {dataset_path}...")
    # Load the dataset
    dataset = load_dataset(dataset_path, "en", split="train", trust_remote_code=True)

    # Work out the evaluation set size
    eval_size = max(1, int(train_size * 0.1))  # 10% of the training set, but at least 1

    # Split the dataset
    train_dataset = dataset.select(range(train_size))  # first train_size rows for training
    eval_dataset = dataset.select(range(train_size, train_size + eval_size))  # the next eval_size rows for evaluation

    print(f"Training set size: {len(train_dataset)}, evaluation set size: {len(eval_dataset)}")
    return train_dataset, eval_dataset


# Configure the training arguments
def setup_training_args(final_model_dir, enable_evaluation=True):
    """
    Set the training arguments (output directory, learning rate, batch size, and so on)
    and switch evaluation on or off.

    Args:
        final_model_dir (str): where the fine-tuned model is saved
        enable_evaluation (bool): whether to evaluate. Defaults to True; False disables evaluation.

    Returns:
        training_args (TrainingArguments): the training arguments
    """
    # Choose the evaluation strategy from the flag
    evaluation_strategy = "epoch" if enable_evaluation else "no"

    training_args = TrainingArguments(
        output_dir=final_model_dir,
        evaluation_strategy=evaluation_strategy,  # newer transformers releases name this eval_strategy
        learning_rate=5e-5,
        per_device_train_batch_size=2,  # kept small to fit the M3 Pro's memory
        gradient_accumulation_steps=4,  # gradient accumulation simulates a larger batch
        num_train_epochs=3,  # train for 3 epochs
        report_to="wandb",  # log the run to wandb
        weight_decay=0.01,
        logging_dir=os.path.join(setting.root_dir, 'logs'),
        logging_steps=50,  # log every 50 steps
        save_steps=500,  # checkpoint every 500 steps to avoid frequent saves
        save_total_limit=2,  # keep at most 2 checkpoints
        dataloader_num_workers=4,  # data loader worker processes (tune as needed)
    )
    return training_args



# Train the model
def train_model(model, training_args, dataset, eval_dataset, tokenizer, enable_evaluation=True):
    """
    Train the model with SFTTrainer.

    Args:
        model (AutoModelForCausalLM): the model to train
        training_args (TrainingArguments): the training arguments
        dataset (Dataset): the training dataset
        eval_dataset (Dataset): the evaluation dataset
        tokenizer (AutoTokenizer): the tokenizer
        enable_evaluation (bool): whether to evaluate

    Returns:
        trainer (SFTTrainer): the trainer instance
    """
    # Pass the evaluation set only when evaluation is enabled
    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=dataset,
        eval_dataset=eval_dataset if enable_evaluation else None,
        tokenizer=tokenizer,  # some newer trl releases expect processing_class instead
        data_collator=None,  # None keeps the trainer's default collator
    )
    trainer.train()
    return trainer


# Save the model
def save_model(trainer, final_model_dir):
    """
    Save the trained model to the given directory.

    Args:
        trainer (SFTTrainer): the trainer instance
        final_model_dir (str): where to save the model
    """
    print("Saving model...")
    trainer.save_model(final_model_dir)



def merge_models(models, weights, device="cpu"):
    """
    Merge the weights of several models by weighted averaging.

    Args:
        models (list): list of models with identical architectures
        weights (list): one weight per model
        device (str): "cuda" or "cpu"

    Returns:
        merged_model (nn.Module): the merged model
    """
    # The number of models must match the number of weights
    assert len(models) == len(weights), "number of models and number of weights differ"

    # Move every model to the same device
    for i in range(len(models)):
        models[i] = models[i].to(device)

    # Start from the first model's state dict
    merged_state_dict = models[0].state_dict()

    # Take the weighted average of every tensor
    for key in merged_state_dict.keys():
        merged_state_dict[key] = torch.zeros_like(merged_state_dict[key])
        for model, weight in zip(models, weights):
            merged_state_dict[key] += model.state_dict()[key] * weight

    # Build a new model from the first model's config (from_pretrained expects a
    # path or model id, not a config object) and load the merged weights
    merged_model = models[0].__class__(models[0].config)
    merged_model.load_state_dict(merged_state_dict)
    return merged_model


# Main function
def main():
    """
    Run the whole training flow: set paths, load the model, train, and save.

    Settings:
            enable_evaluation = False  # disables evaluation; useful when the run is slow

    Loading the dataset:
        train_size=10 sets the training set size; the evaluation set is 10% of it (at least 1 row)
        train_dataset, eval_dataset = load_dataset_func(dataset_path, train_size=10)


    """
    setup_wandb()  # log in to wandb
    model_dir, dataset_path, final_model_dir = set_paths()  # set paths

    model, tokenizer, EOS_TOKEN = load_model_and_tokenizer(model_dir)  # load model and tokenizer

    train_dataset, eval_dataset = load_dataset_func(dataset_path, train_size=5)  # load the dataset
    train_dataset = train_dataset.map(lambda examples: formatting_prompts_func(examples, EOS_TOKEN), batched=True)  # format the training set
    eval_dataset = eval_dataset.map(lambda examples: formatting_prompts_func(examples, EOS_TOKEN), batched=True)  # format the evaluation set
    print(train_dataset["text"][0])  # print one formatted example

    model = setup_lora(model)  # apply LoRA
    # Toggle evaluation
    enable_evaluation = True  # set to False to disable evaluation
    training_args = setup_training_args(final_model_dir, enable_evaluation)  # build the training arguments
    trainer = train_model(model, training_args, train_dataset, eval_dataset, tokenizer, enable_evaluation)  # start training

    save_model(trainer, final_model_dir)  # save the model
    wandb.finish()  # close the wandb run




# Run the main function
if __name__ == "__main__":
    main()
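
The weighted averaging inside merge_models can be checked on toy data without loading any model. In the sketch below plain dicts of float lists stand in for state dicts; `weighted_average` is a hypothetical helper, and the requirement that the weights sum to 1 is an assumption added here (the original function does not enforce it).

```python
def weighted_average(state_dicts, weights):
    """Average same-shaped parameter lists key by key using the given weights."""
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per state dict")
    if abs(sum(weights) - 1.0) > 1e-9:
        # Assumption: weights that sum to 1 keep the parameter scale unchanged.
        raise ValueError("weights should sum to 1")

    merged = {}
    for key, reference in state_dicts[0].items():
        merged[key] = [
            sum(w * sd[key][i] for sd, w in zip(state_dicts, weights))
            for i in range(len(reference))
        ]
    return merged


model_a = {"layer.weight": [1.0, 2.0]}
model_b = {"layer.weight": [3.0, 6.0]}
merged = weighted_average([model_a, model_b], [0.25, 0.75])
print(merged)  # {'layer.weight': [2.5, 5.0]}
```

Averaging only makes sense for models that share an architecture and parameter names; whether the merged model is actually better than its inputs has to be checked on an evaluation set.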

3.4 Training the model and saving it

"""
The model is saved locally under models/final_model

"""

def save_model(trainer, final_model_dir):
    """
    Save the trained model to the given directory.

    Args:
        trainer (SFTTrainer): the trainer instance
        final_model_dir (str): where to save the model
    """
    print("Saving model...")
    trainer.save_model(final_model_dir)

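3.5 Merging the model files

#01 Save the adapter and tokenizer, then export the merged weights
new_model_local = "DeepSeek-R1-Medical-COT-Tiny"
model.save_pretrained(new_model_local)
tokenizer.save_pretrained(new_model_local)
# save_pretrained_merged is provided by Unsloth models (section 2.1), not by a plain PEFT model;
# with PEFT alone, model.merge_and_unload() followed by save_pretrained is the usual route
model.save_pretrained_merged(new_model_local, tokenizer, save_method="merged_16bit")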

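3.6 Evaluating and monitoring the training run

Evaluation (eval/) metrics:

eval/runtime 18.3908: the evaluation pass took 18.39 seconds in total.
eval/samples_per_second 0.054: 0.054 samples were processed per second, so evaluation is slow.
eval/steps_per_second 0.054: 0.054 evaluation steps ran per second, so each step is expensive.

Training (train/) metrics:

train/epoch 0: the logged epoch is 0.
train/global_step 0: the global step is 0, meaning no training step was recorded.
train_loss 14435.36663: the reported training loss is 14435.37, so the model is far from converged. A value this large is well outside the usual range for a language-modeling loss, and together with global_step 0 it suggests checking the run (numerical precision, learning rate, logging) before simply training longer.
train/runtime 251.7582: training took 251.76 seconds in total.
train/samples_per_second 0.06: 0.06 training samples were processed per second, so training is slow.
train/steps_per_second 0.012: 0.012 training steps ran per second, so each step takes a long time.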
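
The throughput figures above can be reproduced from the settings in main() and setup_training_args. The sketch assumes the trainer divides sample and step counts by wall-clock runtime and rounds to three decimals, and that each epoch counts as one optimizer step when it has fewer batches than gradient_accumulation_steps. Both are assumptions about the trainer's accounting, although they agree with the 0/3 progress bar and the numbers in the log.

```python
import math

# Settings taken from main() and setup_training_args()
train_size, epochs = 5, 3
batch_size, grad_accum = 2, 4
eval_size = max(1, int(train_size * 0.1))  # 1 row, as in load_dataset_func

# Runtimes copied from the wandb run summary
train_runtime, eval_runtime = 251.7582, 18.3908

batches_per_epoch = math.ceil(train_size / batch_size)     # 3 batches
steps_per_epoch = max(1, batches_per_epoch // grad_accum)  # assumed: at least 1 step
total_steps = steps_per_epoch * epochs                     # 3 optimizer steps

train_samples_per_second = round(train_size * epochs / train_runtime, 3)
train_steps_per_second = round(total_steps / train_runtime, 3)
eval_samples_per_second = round(eval_size / eval_runtime, 3)

print(total_steps, train_samples_per_second,
      train_steps_per_second, eval_samples_per_second)
```

Seen this way, the low throughput mostly reflects a 5-example smoke test running on the CPU, not the behaviour of a full run over the roughly 20,000-example dataset.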

#02 Detailed log
wandb: View project at https://wandb.ai/z15119911990-beijing/huggingface
wandb: View run at https://wandb.ai/z15119911990-beijing/huggingface/runs/mgrko2jv
  0%|          | 0/3 [00:00<?, ?it/s]
{'eval_runtime': 14.8693, 'eval_samples_per_second': 0.067, 'eval_steps_per_second': 0.067, 'epoch': 0}
                                     
  0%|          | 0/3 [00:30<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 1461.94it/s]
                                               
                                     
{'eval_runtime': 21.2073, 'eval_samples_per_second': 0.047, 'eval_steps_per_second': 0.047, 'epoch': 0}
  0%|          | 0/3 [02:11<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 33.69it/s]
                                             
                                     
  0%|          | 0/3 [04:02<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 334.66it/s]
                                              {'eval_runtime': 18.3908, 'eval_samples_per_second': 0.054, 'eval_steps_per_second': 0.054, 'epoch': 0}
{'train_runtime': 251.7582, 'train_samples_per_second': 0.06, 'train_steps_per_second': 0.012, 'train_loss': 14435.3666305542, 'epoch': 0}
  0%|          | 0/3 [04:10<?, ?it/s]
wandb:                                                                                
wandb: 
wandb: Run history:
wandb:            eval/runtime ▁█▅
wandb: eval/samples_per_second █▁▃
wandb:   eval/steps_per_second █▁▃
wandb:             train/epoch ▁▁▁▁
wandb:       train/global_step ▁▁▁▁
wandb: 
wandb: Run summary:
wandb:             eval/runtime 18.3908
wandb:  eval/samples_per_second 0.054
wandb:    eval/steps_per_second 0.054
wandb:               total_flos 43804457687040.0
wandb:              train/epoch 0
wandb:        train/global_step 0
wandb:               train_loss 14435.36663
wandb:            train_runtime 251.7582
wandb: train_samples_per_second 0.06
wandb:   train_steps_per_second 0.012
wandb: 
wandb: View run /Users/ningcaichen/Documents/02-python相关文档/01-AI系列/LoRA-DeepSeek-R1/models/final_model at: https://wandb.ai/z15119911990-beijing/huggingface/runs/mgrko2jv
wandb: View project at: https://wandb.ai/z15119911990-beijing/huggingface
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20250212_133457-mgrko2jv/logs

Editor: Wu Xiaoyan (武曉燕). Source: 海邊的拾遺者
