
A Hands-On Tutorial for Fine-Tuning Qwen3 with Unsloth Is Here!

2025-05-14 01:00:00

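Fine-tuning Qwen3 with Unsloth brings significant advantages: 2x faster training, 70% less VRAM, and support for 8x longer contexts. Qwen3-30B-A3B runs in just 17.5GB of VRAM. Unsloth's Dynamic 2.0 quantization preserves high accuracy, and native 128K context length is supported. Qwen3 models have a thinking mode and a non-thinking mode for tasks of different complexity. The fine-tuned models can be used for legal document analysis, building custom knowledge bases and similar applications. In these settings they handle domain-specific queries while maintaining context, which gives them an edge over pure retrieval systems. Unsloth supports 4-bit/16-bit QLoRA/LoRA fine-tuning on NVIDIA GPUs from 2018 onward, making it an efficient way to customize models in resource-constrained environments.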


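Main Scenarios for Fine-Tuning Qwen3

Unsloth supports fine-tuning Qwen3 models for scenarios such as:

Legal document assistance: fine-tune on legal texts (contracts, case law, regulations) for contract analysis, case-law research or compliance support
Custom knowledge bases: embed specialized domain knowledge directly into the model so it can handle domain-specific queries and summarize documents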

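Qwen3 itself has two working modes, which make the fine-tuned model more flexible:

Thinking Mode: the model reasons step by step before giving its final answer, which suits complex problems that need deep thought
Non-Thinking Mode: the model gives fast, near-instant answers, which suits simple questions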
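
In thinking mode, Qwen3 places its reasoning between `<think>` and `</think>` tags, and the final answer follows. The sketch below separates the two parts of a decoded response. It assumes that tag convention, and `split_thinking` is a hypothetical helper, not an Unsloth or Transformers API:

```python
def split_thinking(text, open_tag="<think>", close_tag="</think>"):
    # Return (reasoning, answer); reasoning is empty if no closing tag is present.
    end = text.find(close_tag)
    if end == -1:
        return "", text.strip()
    start = text.find(open_tag)
    begin = start + len(open_tag) if 0 <= start < end else 0
    return text[begin:end].strip(), text[end + len(close_tag):].strip()


output = "<think>(x + 2)^2 = 0 means x + 2 = 0, so x = -2.</think>The solution is x = -2."
reasoning, answer = split_thinking(output)
print(reasoning)  # (x + 2)^2 = 0 means x + 2 = 0, so x = -2.
print(answer)     # The solution is x = -2.
```

Generation may stop before `</think>`, for example when `max_new_tokens` is too small. In that case the sketch returns an empty reasoning string and the whole text as the answer.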

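Main Advantages of Fine-Tuning Qwen3 with Unsloth

Unsloth makes Qwen3 (8B) fine-tuning 2x faster with 70% less VRAM. It also supports context lengths 8x longer than any environment using Flash Attention 2. With Unsloth, the Qwen3-30B-A3B model runs comfortably in just 17.5GB of VRAM.

Unsloth provides Dynamic 2.0 quantization for Qwen3. According to Unsloth, it gives the best results on the 5-shot MMLU and KL-divergence benchmarks. This means you can run and fine-tune quantized Qwen3 LLMs with minimal loss of accuracy. Unsloth has also uploaded Qwen3 versions that support the native 128K context length.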
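
The KL-divergence benchmark compares two next-token distributions: P from the original model and Q from the quantized model. The formula is KL(P || Q) = sum of p * log(p / q). It is 0 when the two agree and grows as quantization distorts the distribution. Below is a minimal sketch on made-up toy distributions; it is not Unsloth's benchmark code or data:

```python
import math


def kl_divergence(p, q):
    # KL(P || Q) in nats; terms with p_i == 0 contribute nothing.
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                return math.inf  # Q gives zero probability where P does not
            total += pi * math.log(pi / qi)
    return total


full_precision = [0.70, 0.20, 0.10]  # toy next-token probabilities (assumed)
quantized_good = [0.68, 0.21, 0.11]
quantized_bad = [0.40, 0.35, 0.25]
print(kl_divergence(full_precision, full_precision))  # 0.0
print(kl_divergence(full_precision, quantized_good) < kl_divergence(full_precision, quantized_bad))  # True
```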

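Unsloth supports several fine-tuning techniques, including 4-bit and 16-bit QLoRA/LoRA fine-tuning. It speeds up training without any hardware changes by manually deriving all compute-heavy math steps and handwriting the GPU kernels.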
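
For readers new to LoRA: it leaves the full `d_out x d_in` weight matrix W frozen. Instead it trains two small factors, B (`d_out x r`) and A (`r x d_in`), and uses W + (alpha / r) * BA, so only r * (d_in + d_out) parameters are trained. QLoRA applies the same idea on top of a 4-bit quantized W. The pure-Python sketch below shows both the parameter count and the merged weight. The 5120 x 5120 layer size and the tiny matrices are illustrative assumptions:

```python
def lora_param_count(d_in, d_out, r):
    # Trainable parameters in B (d_out x r) plus A (r x d_in).
    return r * (d_in + d_out)


def matmul(a, b):
    # Plain nested-list matrix product a @ b.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w, b, a, alpha, r):
    # Merged dense weight W + (alpha / r) * (B @ A).
    scale = alpha / r
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


d = 5120  # illustrative square projection size (assumption)
for r in (8, 16, 32, 64):
    share = lora_param_count(d, d, r) / (d * d)
    print(f"r={r}: {lora_param_count(d, d, r):,} trainable of {d * d:,} ({share:.2%})")

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight
B = [[0.5], [0.25]]           # d_out x r with r = 1
A = [[2.0, -2.0]]             # r x d_in
print(merge_lora(W, B, A, alpha=1, r=1))  # [[2.0, -1.0], [0.5, 0.5]]
```

With the tutorial's settings `r = 32, lora_alpha = 32`, the scale alpha / r is 1. At r = 32, a square 5120-wide layer trains about 1.25% of its parameters.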

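Technical Features and Support

Unsloth offers several settings for optimizing the fine-tuning process:

max_seq_length = 2048: controls the context length. Qwen3 supports 40960, but 2048 is recommended for testing. Unsloth enables fine-tuning with 8x longer contexts
load_in_4bit = True: enables 4-bit quantization, which cuts the memory needed for the model weights to roughly 1/4 of 16-bit and makes a 16GB GPU sufficient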
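
Here is a back-of-the-envelope check of the 1/4 figure. It counts only the model weights; activations, LoRA parameters, optimizer state and per-block quantization scales all come on top. The 14B parameter count is a round approximation for the `unsloth/Qwen3-14B` model loaded later:

```python
def weight_memory_gb(num_params, bits_per_param):
    # Bytes needed for the weights alone, expressed in GB (1 GB = 1e9 bytes).
    return num_params * bits_per_param / 8 / 1e9


params = 14e9  # approximate parameter count of a 14B model
fp16 = weight_memory_gb(params, 16)
int4 = weight_memory_gb(params, 4)
print(f"16-bit: {fp16:.1f} GB, 4-bit: {int4:.1f} GB, ratio: {int4 / fp16:.2f}")
# 16-bit: 28.0 GB, 4-bit: 7.0 GB, ratio: 0.25
```

A 14B model needs roughly 28 GB for its 16-bit weights. In 4-bit that shrinks to roughly 7 GB, which leaves headroom on a 16GB GPU.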

Unsloth has uploaded all versions of Qwen3 to Hugging Face, in formats including Dynamic 2.0 GGUF and dynamic 4-bit. Unsloth also supports the Qwen3 MoE models, including 30B-A3B and 235B-A22B.
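
GGUF files store weights in block-quantized formats. The sketch below illustrates the basic idea with simplified Q8_0-style quantization, the scheme llama.cpp uses for `q8_0`. Weights are split into blocks of 32. Each block stores a single scale d = max|x| / 127 plus 32 signed 8-bit integers round(x / d). This is not Unsloth's Dynamic 2.0 method, and it differs from llama.cpp in two ways: it keeps the scale as a Python float rather than float16, and Python's `round()` rounds halves to even.

```python
BLOCK_SIZE = 32  # Q8_0-style block size


def quantize_block(xs):
    # One scale per block; each value becomes an integer in [-127, 127].
    amax = max(abs(x) for x in xs)
    d = amax / 127.0 if amax > 0 else 0.0
    qs = [round(x / d) if d else 0 for x in xs]
    return d, qs


def dequantize_block(d, qs):
    # Reconstruct approximate weights from the scale and the integers.
    return [d * q for q in qs]


def quantize(values):
    # Split the weights into fixed-size blocks and quantize each one independently.
    return [quantize_block(values[i:i + BLOCK_SIZE]) for i in range(0, len(values), BLOCK_SIZE)]


weights = [((i * 37) % 101 - 50) / 100.0 for i in range(64)]  # toy weights in [-0.5, 0.5]
blocks = quantize(weights)
restored = [w for d, qs in blocks for w in dequantize_block(d, qs)]
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(len(blocks), max_err)
```

In the real format each block costs 32 one-byte integers plus a 2-byte float16 scale. That is 34 bytes per 32 weights, or 8.5 bits per weight.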

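Unsloth's technical support includes:

Support for NVIDIA GPUs from 2018 onward, with a minimum CUDA compute capability of 7.0
Support for all kinds of Transformer-style models, including Phi-4 reasoning, Mixtral, MoE, Cohere and more
Support for any training algorithm, such as GRPO with VLMs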
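
To check whether a GPU clears the compute-capability 7.0 bar, compare its (major, minor) capability tuple against (7, 0). In the minimal sketch below, `torch.cuda.get_device_capability()` and `torch.cuda.get_device_name()` are real PyTorch calls. The helper `meets_min_capability` is hypothetical, and the PyTorch part is optional:

```python
MIN_CAPABILITY = (7, 0)  # minimum CUDA compute capability stated above


def meets_min_capability(capability, minimum=MIN_CAPABILITY):
    # Tuples compare lexicographically: (7, 5) >= (7, 0) but (6, 1) < (7, 0).
    return tuple(capability) >= tuple(minimum)


try:
    import torch  # optional: only used to query the local GPU

    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print(torch.cuda.get_device_name(0), cap, meets_min_capability(cap))
    else:
        print("No CUDA device visible to PyTorch")
except ImportError:
    print("PyTorch is not installed; pass a (major, minor) tuple manually")

print(meets_min_capability((6, 1)), meets_min_capability((8, 6)))  # False True
```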

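Practical Advantages

Compared with pure retrieval systems, fine-tuning offers several significant advantages:

Fine-tuning can do almost everything retrieval-augmented generation (RAG) can, but not the other way around
During fine-tuning, external knowledge is embedded directly into the model, so it can handle domain-specific queries and maintain context without relying on an external retrieval system
Even in hybrid setups that combine fine-tuning and RAG, the fine-tuned model provides a reliable fallback

Fine-tuning also helps in specialized domains such as visual question answering (VQA) in healthcare. There it helps the model grasp domain-specific nuances, which improves its ability to give accurate, context-relevant responses. Fine-tuned models score markedly higher than zero-shot predictions on precision and recall.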
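
As a reminder of what those two metrics measure:

- Precision is the share of the model's positive predictions that are correct: TP / (TP + FP).
- Recall is the share of actual positives the model finds: TP / (TP + FN).

A minimal sketch with made-up labels (not data from any VQA evaluation):

```python
def precision_recall(predicted, gold, positive="yes"):
    # Count true positives, false positives and false negatives for one label.
    tp = sum(1 for p, g in zip(predicted, gold) if p == positive and g == positive)
    fp = sum(1 for p, g in zip(predicted, gold) if p == positive and g != positive)
    fn = sum(1 for p, g in zip(predicted, gold) if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


gold      = ["yes", "yes", "no", "yes", "no", "no"]
zero_shot = ["yes", "no", "yes", "no", "no", "yes"]   # hypothetical predictions
finetuned = ["yes", "yes", "no", "yes", "yes", "no"]  # hypothetical predictions
print(precision_recall(zero_shot, gold))   # (0.3333333333333333, 0.3333333333333333)
print(precision_recall(finetuned, gold))   # (0.75, 1.0)
```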

For best results, curate a well-structured dataset, ideally in question-answer pairs. This improves learning, comprehension and response accuracy.
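
The sketch below shows what well-structured QA pairs can look like in practice. It validates question-answer records and turns them into the same user/assistant message format the training code uses. The `question`/`answer` field names and the sample record are hypothetical:

```python
def qa_to_conversations(records):
    # Validate question/answer records and convert them to user/assistant turns.
    conversations = []
    for i, rec in enumerate(records):
        question = str(rec.get("question", "")).strip()
        answer = str(rec.get("answer", "")).strip()
        if not question or not answer:
            raise ValueError(f"record {i} is missing a question or an answer")
        conversations.append([
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ])
    return conversations


records = [
    {"question": "What does LoRA stand for?", "answer": "Low-Rank Adaptation."},
]
print(qa_to_conversations(records))
```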

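Fine-tuning Qwen3 with Unsloth gives faster training, lower memory requirements and longer context support while maintaining high accuracy. This makes it possible to adapt the powerful Qwen3 models to domain-specific applications even in resource-constrained environments.

Complete Fine-Tuning Code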

**Capabilities gained by the fine-tuned model:**
1. Dual-mode operation:
   - Normal chat mode: for everyday conversation
   - Thinking Mode: for problems that require reasoning
2. Mathematical reasoning: solves math problems and shows the detailed reasoning process, as in the example "Solve (x + 2)^2 = 0"
3. Retained conversational ability: keeps natural, fluent multi-turn dialogue

Fine-tuning turns the model into a "dual-personality" assistant. It can chat casually, and it can switch to a more rigorous thinking mode when it needs to solve complex problems, math problems in particular.

### Installation

# Commented out IPython magic to ensure Python compatibility.
# %%capture
# import os
# if "COLAB_" not in "".join(os.environ.keys()):
#     # Outside Google Colab, simply install the unsloth library
#     !pip install unsloth
# else:
#     # Special installation flow when running inside Google Colab
#     # First install the dependencies without resolving their own dependencies (--no-deps)
#     !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
#     # Install common NLP and model-hosting tools
#     !pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
#     # Finally install unsloth itself, again without dependencies (to avoid version conflicts)
#     !pip install --no-deps unsloth

"""### Unsloth"""

from unsloth import FastLanguageModel
import torch

# Pre-quantized models available from Unsloth (reference list only; not used below)
fourbit_models = [
    "unsloth/Qwen3-1.7B-unsloth-bnb-4bit",
    "unsloth/Qwen3-4B-unsloth-bnb-4bit",
    "unsloth/Qwen3-8B-unsloth-bnb-4bit",
    "unsloth/Qwen3-14B-unsloth-bnb-4bit",
    "unsloth/Qwen3-32B-unsloth-bnb-4bit",    # 4bit dynamic quants for superior accuracy and low memory use
    "unsloth/gemma-3-12b-it-unsloth-bnb-4bit",
    "unsloth/Phi-4",
    "unsloth/Llama-3.1-8B",
    "unsloth/Llama-3.2-3B",
    "unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit", # [NEW] We support TTS models!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-14B",
    max_seq_length = 2048,   # Context length - can be longer, but uses more memory
    load_in_4bit = True,     # 4bit uses much less memory
    load_in_8bit = False,    # A bit more accurate, uses 2x memory
    full_finetuning = False, # We have full finetuning now!
    token = "",      # use one if using gated models
)

"""We now add LoRA adapters so we only need to update 1 to 10% of all parameters!"""

# Add LoRA adapters
# With LoRA, only 1-10% of the parameters need to be updated for effective fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,           # LoRA rank; suggested values: 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,  # LoRA alpha; suggested: equal to rank or rank * 2
    lora_dropout = 0, # LoRA dropout; 0 is optimized
    bias = "none",    # Bias setting; "none" is optimized
    # [NEW] "unsloth" mode uses 30% less VRAM and fits 2x larger batch sizes
    use_gradient_checkpointing = "unsloth", # Gradient checkpointing, for long contexts
    random_state = 3407,  # Random seed
    use_rslora = False,   # Whether to use rank-stabilized LoRA
    loftq_config = None,  # LoftQ configuration
)

"""<a name="Data"></a>
### Data Prep
Qwen3 has both a reasoning and a non-reasoning mode, so we use 2 datasets:

1. We use the [Open Math Reasoning]() dataset, which was used to win the [AIMO](<https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/leaderboard>) (AI Mathematical Olympiad - Progress Prize 2) challenge! We sample 10% of the verifiable reasoning traces that used DeepSeek R1 and got > 95% accuracy.
2. We also leverage [Maxime Labonne's FineTome-100k](<https://huggingface.co/datasets/mlabonne/FineTome-100k>) dataset in ShareGPT style. We need to convert it to Hugging Face's normal multi-turn format as well.
"""

# Data preparation
# Qwen3 has both reasoning and non-reasoning modes, so we use two datasets:
# 1. OpenMathReasoning - for mathematical reasoning ability
# 2. FineTome-100k - for general conversational ability
from datasets import load_dataset

# Load the math reasoning dataset
reasoning_dataset = load_dataset("unsloth/OpenMathReasoning-mini", split = "cot", token="")
# Load the conversational dataset
non_reasoning_dataset = load_dataset("mlabonne/FineTome-100k", split = "train", token="")

"""Let's see the structure of both datasets:"""

# Inspect the structure of the reasoning dataset
reasoning_dataset

# Inspect the structure of the non-reasoning dataset
non_reasoning_dataset

"""We now convert the reasoning dataset into conversational format:"""

# Convert each math problem and its solution into a user/assistant conversation
# Args:
#     examples: a batch of samples containing problems and solutions
# Returns:
#     a dict holding the conversations
def generate_conversation(examples):
    problems  = examples["problem"]
    solutions = examples["generated_solution"]
    conversations = []
    for problem, solution in zip(problems, solutions):
        conversations.append([
            {"role" : "user",      "content" : problem},
            {"role" : "assistant", "content" : solution},
        ])
    return { "conversations": conversations, }

# Apply the chat template to the converted reasoning dataset
reasoning_conversations = tokenizer.apply_chat_template(
    reasoning_dataset.map(generate_conversation, batched = True)["conversations"],
    tokenize = False, # Only apply the template; do not tokenize
)

"""Let's see the first transformed row:"""

# Inspect the first converted sample
reasoning_conversations[0]

"""Next we take the non-reasoning dataset and convert it to conversational format as well. We have to use Unsloth's `standardize_sharegpt` function to fix up the format of the dataset first.
"""

# Process the non-reasoning dataset: convert it to the standard conversation format
from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(non_reasoning_dataset)

# Apply the chat template to the standardized non-reasoning dataset
non_reasoning_conversations = tokenizer.apply_chat_template(
    dataset["conversations"],
    tokenize = False,
)

"""Let's see the first row"""

# Inspect the first converted non-reasoning sample
non_reasoning_conversations[0]

"""Now let's see how long both datasets are:"""

# Print the size of both datasets
print(len(reasoning_conversations))
print(len(non_reasoning_conversations))

"""The non-reasoning dataset is much longer. Let's assume we want the model to retain some reasoning capabilities, but we specifically want a chat model.

Let's define a ratio of chat-only data. The goal is to define some mixture of both sets of data.

Let's select 25% reasoning and 75% chat:
"""

# Set the proportion of chat data: aim for 25% reasoning data and 75% chat data
chat_percentage = 0.75

"""Let's sample the non-reasoning dataset so that chat data makes up `chat_percentage` of the final mix, while keeping every reasoning example."""

# Sample from the non-reasoning dataset. With R reasoning rows, we need
# R * chat_percentage / (1 - chat_percentage) chat rows (3x R for 0.75).
import pandas as pd
non_reasoning_subset = pd.Series(non_reasoning_conversations)
non_reasoning_subset = non_reasoning_subset.sample(
    int(len(reasoning_conversations) * chat_percentage / (1.0 - chat_percentage)),  # Sample size
    random_state = 2407,
)

"""Finally combine both datasets:"""

# Merge the two datasets
data = pd.concat([
    pd.Series(reasoning_conversations),    # Reasoning conversations
    pd.Series(non_reasoning_subset)        # Sampled non-reasoning conversations
])
data.name = "text"  # Name the data column "text"

# Convert the merged data into a Hugging Face Dataset
from datasets import Dataset
combined_dataset = Dataset.from_pandas(pd.DataFrame(data))
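
"""A quick check of the mixture arithmetic above. This is a sketch with hypothetical helper names; `chat_rows_needed` and `mixture` are not part of the original notebook. Suppose we keep all R reasoning rows and target a chat fraction f. Then f = C / (R + C), so we need C = R * f / (1 - f) chat rows.

```python
def chat_rows_needed(n_reasoning, chat_fraction):
    # Solve chat_fraction = C / (n_reasoning + C) for C, the number of chat rows.
    if not 0.0 <= chat_fraction < 1.0:
        raise ValueError("chat_fraction must be in [0, 1)")
    return int(n_reasoning * chat_fraction / (1.0 - chat_fraction))


def mixture(n_reasoning, n_chat):
    # Return the (reasoning, chat) fractions of the combined dataset.
    total = n_reasoning + n_chat
    return n_reasoning / total, n_chat / total


n_chat = chat_rows_needed(1000, 0.75)
print(n_chat, mixture(1000, n_chat))    # 3000 (0.25, 0.75)
print(mixture(1000, int(1000 * 0.25)))  # (0.8, 0.2)
```

To reach the 75% chat share targeted above, the chat sample must be three times the size of the reasoning set. Sampling only a quarter as many chat rows as reasoning rows would instead produce an 80/20 reasoning-heavy mix.
"""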
# Shuffle the dataset
combined_dataset = combined_dataset.shuffle(seed = 3407)

# Show basic information about the dataset
print(combined_dataset)

# Show the first 10 records as a pandas DataFrame for easier reading
df = pd.DataFrame(combined_dataset[:10])
display(df)  # display() is available in Jupyter/Colab

"""<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](<https://huggingface.co/docs/trl/sft_trainer>). We only run 30 steps (`max_steps = 30`) to speed things up. For a full run, set `num_train_epochs = 1` and remove the `max_steps` line (its default of -1 means no step limit).
"""

# Train with Hugging Face TRL's SFTTrainer
from trl import SFTTrainer, SFTConfig
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = combined_dataset,
    eval_dataset = None,  # An evaluation dataset can be set here
    args = SFTConfig(
        dataset_text_field = "text",  # Text field in the dataset
        per_device_train_batch_size = 2,  # Training batch size per device
        gradient_accumulation_steps = 4,  # Gradient accumulation to simulate a larger batch
        warmup_steps = 5,  # Warmup steps
        # num_train_epochs = 1,  # Set to 1 for a full training run
        max_steps = 30,
        learning_rate = 2e-4,   # Learning rate (reduce to 2e-5 for long training runs)
        logging_steps = 1,  # Logging interval
        optim = "adamw_8bit",  # Optimizer
        weight_decay = 0.01,  # Weight decay
        lr_scheduler_type = "linear",  # Learning-rate schedule
        seed = 3407,  # Random seed
        report_to = "none",   # Set to "wandb" etc. for experiment tracking
    ),
)

# Show current memory statistics
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

"""Let's train the model! To resume a training run, use `trainer.train(resume_from_checkpoint = True)`."""

# Start training
# To resume training, pass resume_from_checkpoint = True
trainer_stats = trainer.train()

# Show final memory and time statistics
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

"""<a name="Inference"></a>
### Inference
Let's run the model via Unsloth native inference! The Qwen3 team recommends these settings:
- Reasoning inference: `temperature = 0.6, top_p = 0.95, top_k = 20`
- Normal chat-based inference: `temperature = 0.7, top_p = 0.8, top_k = 20`
"""

# Model inference
# Test the model with Unsloth's native inference.
# Settings recommended by the Qwen3 team:
# - Reasoning (thinking) mode: temperature=0.6, top_p=0.95, top_k=20
# - Normal chat mode: temperature=0.7, top_p=0.8, top_k=20

# Test a normal conversation with thinking mode disabled
messages = [
    {"role" : "user", "content" : "Solve (x + 2)^2 = 0."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True, # Required: append the generation prompt
    enable_thinking = False,  # Disable thinking mode
)

# Generate with the normal-chat settings
from transformers import TextStreamer
_ = model.generate(
    **tokenizer(text, return_tensors = "pt").to("cuda"),
    max_new_tokens = 256, # Increase for longer outputs
    temperature = 0.7, top_p = 0.8, top_k = 20, # Normal-chat settings
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

# Test a reasoning conversation with thinking mode enabled
messages = [
    {"role" : "user", "content" : "Solve (x + 2)^2 = 0."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True,  # Required: append the generation prompt
    enable_thinking = True, # Enable thinking mode
)

# Generate with the reasoning-mode settings
_ = model.generate(
    **tokenizer(text, return_tensors = "pt").to("cuda"),
    max_new_tokens = 1024,  # Increase for longer outputs
    temperature = 0.6, top_p = 0.95, top_k = 20, # Reasoning-mode settings
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

"""<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save. **[NOTE]** This ONLY saves the LoRA adapters, not the full model. To save to 16bit or GGUF, scroll down!
"""

# Saving the model
# Several ways to save the model follow.

# Save the LoRA adapters only (not the full model, so the files are small)
model.save_pretrained("lora_model")  # Local saving
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("leo009/Qwen3-lora_model", token = "") # Upload to the Hugging Face Hub
# tokenizer.push_to_hub("leo009/Qwen3-lora_model", token = "") # Upload to the Hugging Face Hub

"""Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:"""

# Load the LoRA adapters just saved (for inference)
if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model",  # Directory of the LoRA adapters saved above
        max_seq_length = 2048,
        load_in_4bit = True,
    )

"""### Saving to float16 for vLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to <https://huggingface.co/settings/tokens> for your personal tokens.
"""

# Save in float16 format (for vLLM)
# Supported save methods: merged_16bit (float16), merged_4bit (int4) or lora (adapters only)

# Merge to 16bit
if False:
    model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: # Upload to the Hugging Face Hub
    model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if True:
    model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit_forced",) # Changed to the _forced variant
if True: # Upload to the Hugging Face Hub (use your own repo name and token)
    model.push_to_hub_merged("leo009/Qwen3-vLLM", tokenizer, save_method = "merged_4bit_forced", token = "") # Also the _forced variant

# Save the LoRA adapters only
if False:
    model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: # Upload to the Hugging Face Hub
    model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

"""### GGUF / llama.cpp Conversion
To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and save to `q8_0` by default. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](<https://github.com/unslothai/unsloth/wiki#gguf-quantization-options>)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](<https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb>)
"""

# GGUF / llama.cpp format conversion
# Several quantization methods are supported, e.g. q8_0, q4_k_m, q5_k_m.
#
# F16 (Float16) format
# - Precision: half-precision (16-bit) floating point
# - Memory: about 50% less storage than the original FP32 (32-bit floating point)
# - Accuracy: keeps relatively high numerical precision with little loss
# - Inference speed: faster than FP32, but slower than lower-bit quantized formats
# - Use when: you need a balance between memory use and model accuracy
#
# Q4_K_M format
# - Precision: mixed 4-bit quantization (one of the GGUF quantization schemes)
# - Memory: about 75% less storage than F16 and about 87.5% less than the original FP32
# - Strategy: different quantization for different weights:
#   Q6_K for half of the attention WV and feed-forward W2 matrices,
#   Q4_K for all remaining weights
# - Accuracy vs. speed: trades some accuracy for a smaller file and faster inference
# - Use when: running on resource-constrained devices such as personal computers or mobile devices

# # Save to 8bit Q8_0
# if False:
#     model.save_pretrained_gguf("model", tokenizer,)
# # Remember to go to https://huggingface.co/settings/tokens for a token!
# # And change hf to your username!
# if False:
#     model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# # Save to 16bit GGUF
# if False:
#     model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
# if False: # Upload to the Hugging Face Hub
#     model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if True:
    model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if True: # Upload to the Hugging Face Hub (use your own repo name and token)
    model.push_to_hub_gguf("leo009/Qwen3-GGUF", tokenizer, quantization_method = "q4_k_m", token = "")

# # Save several GGUF formats at once (batch export is more efficient)
# if False:
#     model.push_to_hub_gguf(
#         "hf/model", # Change hf to your username!
#         tokenizer,
#         quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
#         token = "", # Get a token at https://huggingface.co/settings/tokens
#     )

# Save to Google Drive with q4_k_m quantization (Colab only)
from google.colab import drive
drive.mount('/content/gdrive')
if True:
    model.save_pretrained_gguf("/content/gdrive/MyDrive/MyModel/model",
                              tokenizer,
                              quantization_method = "q4_k_m")

"""Now, use the `model.gguf` file or `model-Q4_K_M.gguf` file in llama.cpp or a UI-based system like Jan or Open WebUI. You can install Jan [here](<https://github.com/janhq/jan>) and Open WebUI [here](<https://github.com/open-webui/open-webui>).

And we're done! If you have any questions on Unsloth, we have a [Discord](<https://discord.gg/unsloth>) channel! Feel free to join it if you find bugs, want to keep up with the latest LLM developments, need help, or want to join projects.

Some other links:
1. Train your own reasoning model - Llama GRPO notebook [Free Colab](<https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb>)
2. Saving finetunes to Ollama. [Free notebook](<https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb>)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](<https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb>)
4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](<https://docs.unsloth.ai/get-started/unsloth-notebooks>)!

Join Discord if you need help, and star us on [GitHub](<https://github.com/unslothai/unsloth>)!
"""

Editor: Wu Xiaoyan | Source: 數據STUDIO
