The Three Major Architectures of Large Models and a Hands-On T5 Trial (Original)

一起AI技術

Published 2024-12-5 10:09


Preface

In this article we take a first look at the main architectures of large models and deploy a T5 model to try it out.

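The Three Major Architectures of Large Models

Large models (such as large language models) come in several architectural types. The three main ones are described below.

Encoder-Decoder Architecture

Architecture: made up of two main parts, an encoder (Encoder) and a decoder (Decoder), i.e. the full Transformer architecture. It first understands the input (the Encoder part), then generates new, related content based on that understanding (the Decoder part).

Characteristics:

This architecture is like a translator: they listen to what you say (for example in English), understand it, and then render it in another language (for example Chinese).
It is good at tasks that require understanding an input and then generating a related output, such as translation or question answering.

Representative companies and products:

Google: Transformer, T5 (Text-to-Text Transfer Transformer)
Facebook: BART (Bidirectional and Auto-Regressive Transformers)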

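Encoder-Only Architecture

Architecture: contains only the encoder, i.e. it uses just the Encoder of the Transformer. It focuses on understanding and analyzing the input rather than creating new content.

Characteristics:

This architecture is like a professional book reviewer: they read and understand a book (the input) and then tell you what it is about, for example whether its theme is romance, adventure, or mystery.
It is good at understanding and classifying information, such as judging the sentiment of a text (positive or negative) or assigning it a topic.

Representative companies and products:

Google: BERT (Bidirectional Encoder Representations from Transformers)
Facebook: RoBERTa
Hugging Face: DistilBERT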

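Decoder-Only Architecture

Architecture: contains only the decoder, i.e. it uses just the Decoder of the Transformer. It takes some text (an opening) and generates what comes next (the story).

Characteristics:

This architecture is like a storyteller: you give them an opening, such as "Once upon a time, a kitten got lost", and they continue the story until it ends.
It is good at creative writing, such as writing fiction or generating articles automatically. Its focus is on extending existing text (the opening) into new content.

Representative companies and products:

OpenAI: GPT-3, GPT-4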
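One concrete way to see the difference between the three architectures is which positions each token is allowed to attend to. The sketch below is a minimal illustration in plain Python and is not taken from any of the models named above; the function names are made up for this example. A 1 means "may attend", a 0 means "blocked".

```python
def bidirectional_mask(n):
    """Encoder self-attention: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Decoder self-attention: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def cross_mask(target_len, source_len):
    """Decoder cross-attention: each target position sees the whole encoded input."""
    return [[1] * source_len for _ in range(target_len)]


def show(title, mask):
    """Print a mask as rows of 0/1 for quick visual comparison."""
    print(title)
    for row in mask:
        print("  " + " ".join(str(v) for v in row))


# Encoder-Only (BERT-style): bidirectional self-attention only.
show("Encoder-Only, self-attention", bidirectional_mask(3))

# Decoder-Only (GPT-style): causal self-attention only.
show("Decoder-Only, self-attention", causal_mask(3))

# Encoder-Decoder (T5-style): bidirectional encoder, causal decoder,
# plus cross-attention from 2 output positions to 3 input positions.
show("Encoder-Decoder, cross-attention", cross_mask(2, 3))
```

The causal mask is what lets a decoder generate text left to right, while the all-ones encoder mask is what lets an encoder read the whole input at once.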

Evolution of the Three Architectures

[Figure: evolution diagram of the three architectures]

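Trying Out a T5 Model

To get a first feel for large models, we pull the code, deploy a T5 model ourselves, and try it out.

Environment Setup

There are two ways to set up an environment for trying large models: a local environment or a remote environment. This section briefly describes how to set up the remote environment.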

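Remote Environment

Step 1: Visit the ModelScope platform and register an account.

Step 2: Start a PAI-DSW instance on the ModelScope (魔搭) platform.

Step 3: Log in to your Alibaba Cloud account on the newly opened page.

Step 4: Open a terminal in the PAI-DSW instance.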

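Choosing a Model

On the ModelScope platform, search for the ChatLM model, open the 0.2B Chinese dialogue model (ChatLM-mini-Chinese), select Model Files, and click Download Model.

Pulling the Code

In the terminal, run the following command to pull the model code: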

git clone https://www.modelscope.cn/charent/ChatLM-mini-Chinese.git

Installing Dependencies

pip install transformers torch
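Before running the model, it can be worth checking that the packages the script imports are actually visible to the Python interpreter in the instance. The following is a small sketch using only the standard library; `missing_packages` is a hypothetical helper name, and the package list simply mirrors the imports used in this article.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The script in this article imports these two packages.
missing = missing_packages(["transformers", "torch"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported as missing, install it with pip in the same environment and run the check again.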

Using the Model

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# The model has already been downloaded, so model_id points to the local path
model_id = 'ChatLM-mini-Chinese'

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model; trust_remote_code=True allows the custom model class shipped with the repo to run
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True).to(device)

# Prompt: "What do you think of Apple as a company?"
txt = '如何評價Apple這家公司？'

# Encode the input
encode_ids = tokenizer([txt])
input_ids = torch.LongTensor(encode_ids['input_ids'])
attention_mask = torch.LongTensor(encode_ids['attention_mask'])

# Generate a reply with beam search; my_generate is a custom method provided by this model's code
outs = model.my_generate(
    input_ids=input_ids.to(device),
    attention_mask=attention_mask.to(device),
    max_seq_len=256,
    search_type='beam',
)

# Decode the output
outs_txt = tokenizer.batch_decode(outs.cpu().numpy(), skip_special_tokens=True, clean_up_tokenization_spaces=True)

# Print the first result
print(outs_txt[0])

Output:

[Figure: screenshot of the model's output]
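The generation call passes search_type='beam'. How my_generate implements this is not shown in this article, so the following is only a toy sketch of beam search in general. The next-token table TOY_MODEL and the beam_search helper are made up for illustration and have nothing to do with the real model's vocabulary or code.

```python
import math

# Hypothetical toy "model": probability of the next token given the previous one.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.4, "</s>": 0.1},
    "a": {"cat": 0.9, "dog": 0.05, "</s>": 0.05},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def beam_search(model, beam_width=2, max_len=5):
    """Keep the beam_width best partial sequences by cumulative log-probability."""
    beams = [(0.0, ["<s>"])]
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "</s>":
                # Finished sequences are carried over unchanged.
                candidates.append((score, seq))
                continue
            for token, prob in model[seq[-1]].items():
                candidates.append((score + math.log(prob), seq + [token]))
        # Keep only the highest-scoring candidates.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == "</s>" for _, seq in beams):
            break
    return beams


best_score, best_seq = beam_search(TOY_MODEL, beam_width=2)[0]
print(" ".join(best_seq), round(math.exp(best_score), 2))

greedy_score, greedy_seq = beam_search(TOY_MODEL, beam_width=1)[0]
print(" ".join(greedy_seq), round(math.exp(greedy_score), 2))
```

With a beam width of 1 the search is greedy and commits to "the" because it is the most likely first token. With a width of 2 it also keeps "a" alive and ends up with a sequence whose overall probability is higher, which is the motivation for using a beam.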

Supplementary Notes

tokenizer

Inspecting tokenizer in a Jupyter Notebook shows that it contains the common special tokens.

PreTrainedTokenizerFast(name_or_path='ChatLM-mini-Chinese', vocab_size=29298, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token':'[EOS]','unk_token':'[UNK]','pad_token':'[PAD]'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
0:AddedToken("[PAD]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1:AddedToken("[EOS]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2:AddedToken("[SEP]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
3:AddedToken("[BOS]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
4:AddedToken("[CLS]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
5:AddedToken("[MASK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
6:AddedToken("[UNK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
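To make the role of these special tokens concrete, here is a toy sketch of batch encoding in plain Python. The special-token ids are copied from the printout above, and right-side padding follows padding_side='right'. Everything else is an assumption for illustration: the three ordinary words and their ids are invented, splitting on whitespace stands in for the real subword tokenizer, and appending [EOS] follows the usual T5 convention rather than anything confirmed for this tokenizer.

```python
# Special-token ids as shown in the tokenizer printout.
SPECIAL = {"[PAD]": 0, "[EOS]": 1, "[SEP]": 2, "[BOS]": 3,
           "[CLS]": 4, "[MASK]": 5, "[UNK]": 6}

# Hypothetical ordinary tokens; the real vocabulary has 29298 entries.
TOY_VOCAB = {**SPECIAL, "hello": 7, "world": 8, "t5": 9}


def encode_batch(texts, vocab=TOY_VOCAB):
    """Map words to ids, append [EOS], then right-pad to a common length."""
    rows = []
    for text in texts:
        ids = [vocab.get(word, vocab["[UNK]"]) for word in text.lower().split()]
        ids.append(vocab["[EOS]"])  # assumption: T5-style end-of-sequence marker
        rows.append(ids)
    width = max(len(row) for row in rows)
    input_ids = [row + [vocab["[PAD]"]] * (width - len(row)) for row in rows]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(row) + [0] * (width - len(row)) for row in rows]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = encode_batch(["hello world", "hello unknown T5 model"])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Words missing from the vocabulary fall back to [UNK], shorter rows are filled with [PAD], and the attention mask records which positions are padding. This is the same pair of fields (input_ids and attention_mask) that the script above converts to tensors and passes to the model.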

model

Inspecting model in a Jupyter Notebook shows the structure of the T5 model.

TextToTextModel(
  (shared): Embedding(29298, 768)
  (encoder): T5Stack(
    (embed_tokens): Embedding(29298, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): FusedRMSNorm(torch.Size([768]), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseActDense(
              (wi): Linear(in_features=768, out_features=3072, bias=False)
              (wo): Linear(in_features=3072, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
              (act): ReLU()
            )
            (layer_norm): FusedRMSNorm(torch.Size([768]), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
      (1-9): 9 x T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
            )
            (layer_norm): FusedRMSNorm(torch.Size([768]), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseActDense(
              (wi): Linear(in_features=768, out_features=3072, bias=False)
              (wo): Linear(in_features=3072, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
              (act): ReLU()
            )
            (layer_norm): FusedRMSNorm(torch.Size([768]), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
    (final_layer_norm): FusedRMSNorm(torch.Size([768]), eps=1e-06, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (decoder): T5Stack(
    (embed_tokens): Embedding(29298, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): FusedRMSNorm(torch.Size([768]), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerCrossAttention(
            (EncDecAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
            )
            (layer_norm): FusedRMSNorm(torch.Size([768]), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (2): T5LayerFF(
            (DenseReluDense): T5DenseActDense(
              (wi): Linear(in_features=768, out_features=3072, bias=False)
              (wo): Linear(in_features=3072, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
              (act): ReLU()
            )
            (layer_norm): FusedRMSNorm(torch.Size([768]), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
      (1-9): 9 x T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
            )
            (layer_norm): FusedRMSNorm(torch.Size([768]), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerCrossAttention(
            (EncDecAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
            )
            (layer_norm): FusedRMSNorm(torch.Size([768]), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (2): T5LayerFF(
            (DenseReluDense): T5DenseActDense(
              (wi): Linear(in_features=768, out_features=3072, bias=False)
              (wo): Linear(in_features=3072, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
              (act): ReLU()
            )
            (layer_norm): FusedRMSNorm(torch.Size([768]), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
    (final_layer_norm): FusedRMSNorm(torch.Size([768]), eps=1e-06, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (lm_head): Linear(in_features=768, out_features=29298, bias=False)
)

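The structure is that of a typical Transformer model.
(encoder): T5Stack is the encoder, made up of 10 T5Block modules; (decoder): T5Stack is the decoder, also made up of 10 T5Block modules.
T5LayerSelfAttention is the self-attention module, T5LayerCrossAttention is the cross-attention module (where the decoder attends to the encoder's output), and T5LayerFF is the feed-forward module.
(lm_head): Linear is the Transformer's output layer, projecting each 768-dimensional hidden state to scores over the 29298-token vocabulary.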
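The shapes in the printout are enough to estimate the model's parameter count by hand. The sketch below does that arithmetic in plain Python under a few stated assumptions: each FusedRMSNorm holds one weight vector of size 768, the encoder and decoder embed_tokens reuse the shared embedding, and since the printout does not say whether lm_head shares weights with the embedding, both cases are computed.

```python
# Sizes read off the printed model structure.
D_MODEL, D_FF, VOCAB = 768, 3072, 29298
N_LAYERS = 10                      # block (0) plus blocks (1-9)
REL_BIAS = 32 * 12                 # relative_attention_bias, first block only

attention = 4 * D_MODEL * D_MODEL  # q, k, v, o (bias=False)
feed_forward = 2 * D_MODEL * D_FF  # wi and wo (bias=False)
norm = D_MODEL                     # assumption: RMSNorm has a weight and no bias

# Encoder block: self-attention + feed-forward, each with its own layer norm.
encoder_block = (attention + norm) + (feed_forward + norm)
# Decoder block: self-attention + cross-attention + feed-forward.
decoder_block = 2 * (attention + norm) + (feed_forward + norm)

# Each stack adds one relative-position table and one final layer norm.
encoder = N_LAYERS * encoder_block + REL_BIAS + norm
decoder = N_LAYERS * decoder_block + REL_BIAS + norm

embedding = VOCAB * D_MODEL        # (shared), counted once
lm_head = VOCAB * D_MODEL

total_untied = encoder + decoder + embedding + lm_head
total_tied = encoder + decoder + embedding

print(f"encoder: {encoder:,}")
print(f"decoder: {decoder:,}")
print(f"total (separate lm_head): {total_untied:,}")
print(f"total (lm_head tied to embedding): {total_tied:,}")
```

Under these assumptions the total comes to roughly 0.21 billion parameters, or about 0.19 billion if lm_head shares its weights with the embedding. Either figure is in line with the "0.2B" label on the model. The decoder is larger than the encoder because every decoder block carries an extra cross-attention layer.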

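Summary

Large models have three major architectures: Encoder-Decoder, Encoder-Only, and Decoder-Only.
The Encoder-Decoder architecture is like a translator; a representative model is T5.
The Encoder-Only architecture is like a book reviewer; a representative model is BERT.
The Decoder-Only architecture is like a storyteller; a representative model is GPT-4.
Large-model training consists of three stages: pre-training (PT), supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF).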

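This article is reproduced from the WeChat public account 一起AI技術. Author: Dongming

Original link: https://mp.weixin.qq.com/s/-cBc9QpDsjn5b4kcD6wn8Q

© Copyright belongs to the author. If you reproduce this article, please credit the source; otherwise legal action will be pursued.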
