How to use Protobuf for data interchange

Author: Marty Kalin 2019-11-22 08:40:19

Protobuf encoding increases efficiency when exchanging data between applications written in different languages and running on different platforms.


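Protocol Buffers (Protobufs), like XML and JSON, allow applications that may be written in different languages and run on different platforms to exchange data. For example, a sending application written in Go could encode a Go-specific sales order in Protobuf, which a receiver written in Java could then decode to get a Java-specific representation of the received order. Here is a sketch of the architecture over a network connection:

Go sales order --> Pbuf-encode --> network --> Pbuf-decode --> Java sales order

Protobuf encoding, in contrast to its XML and JSON counterparts, is binary rather than text, which can complicate debugging. However, as the code examples in this article confirm, the Protobuf encoding is significantly more compact than either the XML or the JSON encoding.

Protobuf is efficient in another way as well. At the implementation level, Protobuf and other encoding systems serialize and deserialize structured data. Serialization transforms a language-specific data structure into a bytestream, and deserialization is the inverse operation that transforms a bytestream back into a language-specific data structure. Serialization and deserialization can become the bottleneck in data interchange because these operations are CPU-intensive. Efficient serialization and deserialization is another Protobuf design goal.

Recent encoding technologies, such as Protobuf and FlatBuffers, derive from the DCE/RPC (Distributed Computing Environment/Remote Procedure Call) initiative of the early 1990s. Like DCE/RPC, Protobuf contributes to both the IDL (interface definition language) and the encoding layer in data interchange.

This article looks at these two layers, then provides code examples in Go and Java to flesh out the Protobuf details and show that Protobuf is easy to use.

Protobuf as an IDL and encoding layer

DCE/RPC, like Protobuf, is designed to be language- and platform-neutral. The appropriate libraries and utilities allow any language and platform to play in the DCE/RPC arena. Furthermore, the DCE/RPC architecture is elegant. An IDL document is the contract between the remote procedure on the one side and callers on the other side. Protobuf, too, centers on an IDL document.

An IDL document is text and, in DCE/RPC, uses basic C syntax along with syntactic extensions for metadata (square brackets) and a few new keywords such as interface. Here is an example: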

[uuid (2d6ead46-05e3-11ca-7dd1-426909beabcd), version(1.0)]
interface echo {
   const long int ECHO_SIZE = 512;
   void echo(
      [in]          handle_t h,
      [in, string]  idl_char from_client[ ],
      [out, string] idl_char from_service[ECHO_SIZE]
   );
}

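This IDL document declares a procedure named echo, which takes three parameters: the [in] parameters of type handle_t (implementation pointer) and idl_char (array of ASCII characters) are passed to the remote procedure, whereas the [out] parameter (also a string) is passed back from the procedure. In this example, the echo procedure does not explicitly return a value (the void to the left of echo) but could do so. A return value, together with one or more [out] parameters, allows the remote procedure to return arbitrarily many values. The next section introduces a Protobuf IDL, which differs in syntax but likewise serves as a contract in data interchange.

The IDL document, in both DCE/RPC and Protobuf, is the input to utilities that create the infrastructure code for exchanging data:

IDL document --> DCE/RPC or Protobuf utilities --> support code for data interchange

As relatively straightforward text, the IDL is likewise human-readable documentation about the specifics of the data interchange, in particular, the number of data items exchanged and the data type of each item.

Protobuf can be used in a modern RPC system such as gRPC; but Protobuf on its own provides only the IDL layer and the encoding layer for messages passed from a sender to a receiver. Protobuf encoding, like the DCE/RPC original, is binary but more efficient.

At present, XML and JSON encodings still dominate in data interchange through technologies such as web services, which make use of in-place infrastructure such as web servers, transport protocols (for example, TCP and HTTP), and standard libraries and utilities for processing XML and JSON documents. Furthermore, database systems of various flavors can store XML and JSON documents, and even legacy relational systems readily generate XML encodings of query results. Every general-purpose programming language now has libraries that support XML and JSON. What, then, recommends a return to a binary encoding system such as Protobuf?

Consider the negative decimal value -128. In the two's complement binary representation, which dominates across systems and languages, this value can be stored in a single 8-bit byte: 10000000. The text encoding of this integer value in XML or JSON requires multiple bytes. For example, UTF-8 encoding requires four bytes for the string -128, one byte per character (in hex, the values are 0x2d, 0x31, 0x32, and 0x38). XML and JSON also add markup characters, such as angle brackets and braces, to the mix. Details about Protobuf encoding are forthcoming, but the point of interest now is a general one: text encodings tend to be significantly less compact than binary ones.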
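The byte counts for -128 can be checked with a few lines of Go. This is only an illustrative sketch: the helper names binaryBytes and textBytes are made up for this example and do not come from the article's ZIP file.

```go
package main

import (
	"fmt"
	"strconv"
)

// binaryBytes (hypothetical helper) returns the one-byte
// two's-complement encoding of an 8-bit integer.
func binaryBytes(n int8) []byte {
	return []byte{byte(n)}
}

// textBytes (hypothetical helper) returns the UTF-8 bytes of the
// decimal text form of n, which is what XML or JSON would carry.
func textBytes(n int8) []byte {
	return []byte(strconv.Itoa(int(n)))
}

func main() {
	n := int8(-128)
	// prints: binary: 1 byte(s)  10000000
	fmt.Printf("binary: %d byte(s)  %08b\n", len(binaryBytes(n)), binaryBytes(n)[0])
	// prints: text:   4 byte(s)  2d 31 32 38
	fmt.Printf("text:   %d byte(s)  % x\n", len(textBytes(n)), textBytes(n))
}
```

The text form is four times the size of the binary form before any markup is added around it.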

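A code example in Go using Protobuf

My code examples focus on Protobuf rather than RPC. Here is an overview of the first example:

The IDL file named dataitem.proto defines a Protobuf message with eight fields of different types: integer values with different ranges, floating-point values of a fixed size, and strings of two different lengths.
The Protobuf compiler uses the IDL file to generate a Go-specific version (and, later, a Java-specific version) of the Protobuf message together with supporting functions.
A Go application populates the native Go data structure with randomly generated values and then serializes the result to a local file. For comparison, XML and JSON encodings are also serialized to local files.
As a test, the Go application reconstructs an instance of its native data structure by deserializing the contents of the Protobuf file.
As a language-neutrality test, the Java application also deserializes the contents of the Protobuf file to get an instance of a native data structure.

This IDL file, together with two Go source files and one Java source file, is available as a ZIP file on my website.

The all-important Protobuf IDL document is shown below. The document is stored in the file dataitem.proto, with the customary .proto extension.

Example 1. Protobuf IDL document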

syntax = "proto3";
 
package main;
 
message DataItem {
  int64  oddA  = 1;
  int64  evenA = 2;
  int32  oddB  = 3;
  int32  evenB = 4;
  float  small = 5;
  float  big   = 6;
  string short = 7;
  string long  = 8;
}

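The IDL uses the current proto3 rather than the earlier proto2 syntax. The package name (in this case, main) is optional but customary; it is used to avoid name conflicts. The structured message contains eight fields, each of which has a Protobuf data type (e.g., int64, string), a name (e.g., oddA, short), and a numeric tag (aka key) after the equals sign =. The tags, which are 1 through 8 in this example, are unique integer identifiers that label each field in the encoded message; implementations typically also serialize the fields in tag order.

Protobuf messages can be nested to arbitrary levels, and one message can be the field type in another. Here is an example that uses the DataItem message as a field type: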

message DataItems {
  repeated DataItem item = 1;
}

A single DataItems message consists of repeated (none or more) DataItem messages.

Protobuf also supports enumerated types for clarity:

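enum PartnershipStatus {
  reserved "FREE", "CONSTRAINED", "OTHER";
  UNSPECIFIED = 0; // assumption, not in the original: an enum needs at least one value, and in proto3 the first must be 0
}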

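The reserved qualifier ensures that these three symbolic names cannot be reused for values later added to the enum. (Numeric values are reserved separately, by listing numbers rather than names.)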

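To generate a language-specific version of one or more declared Protobuf message structures, the IDL file containing these is passed to the protoc compiler (available in the Protobuf GitHub repository). For the Go code, the supporting Protobuf library can be installed in the usual way (with % as the command-line prompt):

% go get github.com/golang/protobuf/proto

The command to compile the Protobuf IDL file dataitem.proto into Go source code is:

% protoc --go_out=. dataitem.proto

The flag --go_out directs the compiler to generate Go source code; there are similar flags for other languages. The result, in this case, is a file named dataitem.pb.go, which is small enough that the essentials can be copied into a Go application. Here are the essentials from the generated code: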

var _ = proto.Marshal
 
type DataItem struct {
   OddA  int64   `protobuf:"varint,1,opt,name=oddA" json:"oddA,omitempty"`
   EvenA int64   `protobuf:"varint,2,opt,name=evenA" json:"evenA,omitempty"`
   OddB  int32   `protobuf:"varint,3,opt,name=oddB" json:"oddB,omitempty"`
   EvenB int32   `protobuf:"varint,4,opt,name=evenB" json:"evenB,omitempty"`
   Small float32 `protobuf:"fixed32,5,opt,name=small" json:"small,omitempty"`
   Big   float32 `protobuf:"fixed32,6,opt,name=big" json:"big,omitempty"`
   Short string  `protobuf:"bytes,7,opt,name=short" json:"short,omitempty"`
   Long  string  `protobuf:"bytes,8,opt,name=long" json:"long,omitempty"`
}
 
func (m *DataItem) Reset()         { *m = DataItem{} }
func (m *DataItem) String() string { return proto.CompactTextString(m) }
func (*DataItem) ProtoMessage()    {}
func init() {}

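The compiler-generated code has a Go structure DataItem, which exports the Go fields (the names are now capitalized) that match the names declared in the Protobuf IDL. The structure fields have standard Go data types: int32, int64, float32, and string. At the end of each field line is a string of metadata that describes the Protobuf type, gives the numeric tag from the Protobuf IDL document, and provides information about JSON, which is discussed later.

There are also functions; the most important is proto.Marshal for serializing an instance of the DataItem structure into Protobuf format. The helper functions include Reset, which clears a DataItem structure, and String, which produces a one-line string representation of a DataItem.

The metadata that describes Protobuf encoding deserves a closer look before analyzing the Go program in more detail.

Protobuf encoding

A Protobuf message is structured as a collection of key/value pairs, with the numeric tag as the key and the corresponding field as the value. The field names, such as oddA and small, are for human readability, but the protoc compiler does use the field names in generating language-specific counterparts. For example, the oddA and small names in the Protobuf IDL become the fields OddA and Small, respectively, in the Go structure.

The keys and their values both get encoded, but with an important difference: some numeric values have a fixed-size encoding of 32 or 64 bits, whereas others (including the message tags) are varint encoded, so the number of bits depends on the magnitude of the integer. For example, a field key with a tag from 1 through 15 takes 8 bits to encode in varint, whereas a key with a tag from 16 through 2047 takes 16 bits; a non-negative field value up to 127 takes 8 bits, and a value from 128 through 16,383 takes 16 bits. The varint encoding, similar in spirit (but not in detail) to UTF-8 encoding, favors small integer values over large ones. (For a detailed analysis, see the Protobuf encoding guide.) The upshot is that a Protobuf message should have small integer values in fields, if possible, and as few keys as possible, but one key per field is unavoidable.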
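The varint byte counts can be explored without the Protobuf library, because Go's standard encoding/binary package implements the same base-128 varint format. The sketch below is illustrative only: varintLen and keyLen are hypothetical helper names, and keyLen assumes the key layout from the Protobuf encoding guide, (field number << 3) | wire type.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// varintLen (hypothetical helper) reports how many bytes the
// base-128 varint encoding of v occupies.
func varintLen(v uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, v)
}

// keyLen (hypothetical helper) reports the encoded size of a field
// key, formed as (fieldNumber << 3) | wireType and then varint-encoded.
func keyLen(fieldNumber, wireType uint64) int {
	return varintLen(fieldNumber<<3 | wireType)
}

func main() {
	// Field values: the size grows by one byte per 7 bits of magnitude.
	for _, v := range []uint64{1, 127, 128, 2047, 16383, 16384} {
		fmt.Printf("value %5d -> %d byte(s)\n", v, varintLen(v))
	}
	// Field keys with wire type 0 (varint): three bits go to the wire type.
	for _, f := range []uint64{1, 15, 16, 2047, 2048} {
		fmt.Printf("tag   %5d -> %d byte(s)\n", f, keyLen(f, 0))
	}
}
```

Because three bits of every key are spent on the wire type, a key crosses from one byte to two at tag 16, whereas a plain value crosses at 128.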

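Table 1 below gives the gist of Protobuf encoding:

Encoding	Sample types	Length
`varint`	`int32`, `uint32`, `int64`	Variable length
`fixed`	`fixed32`, `float`, `double`	Fixed 32-bit or 64-bit length
Byte sequence	`string`, `bytes`	Sequence length

Table 1. Protobuf data types

Integer types that are not explicitly fixed are varint encoded; hence, in a varint type such as uint32 (u for unsigned), the number 32 describes the integer's range (in this case, 0 to 2³² - 1) rather than its bit size, which differs depending on the value. For fixed types such as fixed32 or double, by contrast, the Protobuf encoding requires 32 and 64 bits, respectively. Strings in Protobuf are byte sequences; hence, the size of the field encoding is the length of the byte sequence (plus a varint length prefix).

Another efficiency is worth mentioning. Recall the earlier example in which a DataItems message consists of repeated DataItem instances: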

message DataItems {
  repeated DataItem item = 1;
}

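The repeated qualifier means that the whole collection of DataItem instances shares a single tag, in this case, 1. On the wire, each embedded DataItem is written as a length-delimited record under that tag; proto3 additionally packs repeated scalar numeric fields into a single record. A DataItems message with repeated DataItem instances is therefore at least as compact as a message with multiple but separate DataItem fields, each of which requires a tag of its own, since tags above 15 take two bytes to encode.

With this background in mind, let's return to the Go program.

The dataItem program in detail

The dataItem program creates a DataItem instance and populates the fields with randomly generated values of the appropriate types. Go has a rand package with functions for generating pseudo-random integer and floating-point values, and my randString function generates pseudo-random strings of specified lengths from a character set. The design goal is to have a DataItem instance with field values of different types and bit sizes. For example, the OddA and EvenA values are 64-bit non-negative integer values of odd and even parity, respectively; but the OddB and EvenB variants are 32 bits in size and hold small integer values between 0 and 2047. The random floating-point values are 32 bits in size, and the strings are 16 (Short) and 32 (Long) characters in length. Here is the code segment that populates the DataItem structure with random values: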

// variable-length integers
n1 := rand.Int63()        // bigger integer
if (n1 & 1) == 0 { n1++ } // ensure it's odd
...
n3 := rand.Int31() % UpperBound // smaller integer
if (n3 & 1) == 0 { n3++ }       // ensure it's odd
 
// fixed-length floats
...
t1 := rand.Float32()
t2 := rand.Float32()
...
// strings
str1 := randString(StrShort)
str2 := randString(StrLong)
 
// the message
dataItem := &DataItem {
   OddA:  n1,
   EvenA: n2,
   OddB:  n3,
   EvenB: n4,
   Big:   f1,
   Small: f2,
   Short: str1,
   Long:  str2,
}

Once created and populated with values, the DataItem instance is encoded in XML, JSON, and Protobuf, with each encoding written to a local file:

func encodeAndserialize(dataItem *DataItem) {
   bytes, _ := xml.MarshalIndent(dataItem, "", " ")  // Xml to dataitem.xml
   ioutil.WriteFile(XmlFile, bytes, 0644)            // 0644 is file access permissions
 
   bytes, _ = json.MarshalIndent(dataItem, "", " ")  // Json to dataitem.json
   ioutil.WriteFile(JsonFile, bytes, 0644)
 
   bytes, _ = proto.Marshal(dataItem)                // Protobuf to dataitem.pbuf
   ioutil.WriteFile(PbufFile, bytes, 0644)
}

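The three serializing functions use the term marshal, which is roughly synonymous with serialize. As the code indicates, each of the three Marshal functions returns a byte slice, which is then written to a file. (Possible errors are ignored for simplicity.) In a sample run, the file sizes were: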

dataitem.xml:  262 bytes
dataitem.json: 212 bytes
dataitem.pbuf:  88 bytes

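The Protobuf encoding is significantly smaller than the other two. The XML and JSON serializations could be reduced slightly in size by eliminating indentation characters, in this case, blanks and newlines.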
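The effect of indentation can be measured with the standard library alone. The following is a minimal sketch, assuming a cut-down stand-in struct (item) in place of the generated DataItem; the struct and the sizes helper are hypothetical and exist only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// item is a hypothetical, cut-down stand-in for the article's DataItem.
type item struct {
	OddB  int32  `json:"oddB"`
	EvenB int32  `json:"evenB"`
	Short string `json:"short"`
}

// sizes (hypothetical helper) returns the byte counts of the indented
// and the compact JSON encodings of v.
func sizes(v interface{}) (indented, compact int, err error) {
	a, err := json.MarshalIndent(v, "", " ")
	if err != nil {
		return 0, 0, err
	}
	b, err := json.Marshal(v)
	if err != nil {
		return 0, 0, err
	}
	return len(a), len(b), nil
}

func main() {
	in, co, err := sizes(item{OddB: 57, EvenB: 468, Short: "ClH1oDaTtoX$HBN5"})
	if err != nil {
		fmt.Println("encoding failed:", err)
		return
	}
	fmt.Printf("indented: %d bytes, compact: %d bytes\n", in, co)
}
```

The compact form drops only whitespace, so the saving is modest next to the gap between the text encodings and Protobuf.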

Below is the dataitem.json file resulting eventually from the json.MarshalIndent call, with added comments starting with ##:

{
 "oddA":  4744002665212642479,                ## 64-bit >= 0
 "evenA": 2395006495604861128,                ## ditto
 "oddB":  57,                                 ## 32-bit >= 0 but < 2048
 "evenB": 468,                                ## ditto
 "small": 0.7562016,                          ## 32-bit floating-point
 "big":   0.85202795,                         ## ditto
 "short": "ClH1oDaTtoX$HBN5",                 ## 16 random chars
 "long":  "xId0rD3Cri%3Wt%^QjcFLJgyXBu9^DZI"  ## 32 random chars
}

Although the serialized data goes into local files, the same approach could be used to write the data to the output stream of a network connection.
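As a rough sketch of that idea, the code below writes already-encoded bytes to an in-memory connection pair from net.Pipe instead of a file. The send and roundTrip helpers are hypothetical, and the two payload bytes are hand-written for illustration (field 1 as a varint with the value 57), not output from the article's program.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// send (hypothetical helper) writes already-encoded bytes to any
// io.Writer: a file, a buffer, or a network connection.
func send(w io.Writer, encoded []byte) error {
	_, err := w.Write(encoded)
	return err
}

// roundTrip (hypothetical helper) pushes the bytes through an
// in-memory connection pair and returns what the receiving end read.
func roundTrip(encoded []byte) ([]byte, error) {
	client, server := net.Pipe()
	errc := make(chan error, 1)
	go func() {
		defer client.Close() // closing signals end-of-stream to the reader
		errc <- send(client, encoded)
	}()
	got, err := io.ReadAll(server)
	if err != nil {
		return nil, err
	}
	return got, <-errc
}

func main() {
	// 0x08 is the key for field 1 with wire type 0; 0x39 is the varint 57.
	got, err := roundTrip([]byte{0x08, 0x39})
	if err != nil {
		fmt.Println("transfer failed:", err)
		return
	}
	fmt.Printf("received % x\n", got)
}
```

A real protocol would also need framing, such as a length prefix per message, since a Protobuf message does not mark its own end.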

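Testing serialization and deserialization

The Go program next runs an elementary test by deserializing the bytes, which were written earlier to the dataitem.pbuf file, into a DataItem instance. Here is the code segment, with the error-checking parts removed: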

filebytes, err := ioutil.ReadFile(PbufFile) // get the bytes from the file
...
testItem.Reset()                            // clear the DataItem structure
err = proto.Unmarshal(filebytes, testItem)  // deserialize into a DataItem instance

The proto.Unmarshal function for deserializing Protobuf is the inverse of the proto.Marshal function. The original DataItem and the deserialized clone are printed to confirm an exact match:

Original:
2041519981506242154 3041486079683013705 1192 1879
0.572123 0.326855
boPb#T0O8Xd&Ps5EnSZqDg4Qztvo7IIs 9vH66AiGSQgCDxk&
 
Deserialized:
2041519981506242154 3041486079683013705 1192 1879
0.572123 0.326855
boPb#T0O8Xd&Ps5EnSZqDg4Qztvo7IIs 9vH66AiGSQgCDxk&

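A Protobuf client in Java

The example in Java is to confirm Protobuf's language neutrality. The original IDL file could be used to generate the Java support code, which involves nested classes. To suppress warnings, however, a slight addition can be made. Here is the revision, which specifies DataMsg as the name for the outer class, with the inner class automatically named DataItem after the Protobuf message: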

syntax = "proto3";
 
package main;
 
option java_outer_classname = "DataMsg";
 
message DataItem {
...

With this change in place, the protoc compilation is the same as before, except the desired output is now Java rather than Go:

% protoc --java_out=. dataitem.proto

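The resulting source file (in a subdirectory named main) is DataMsg.java and about 1,120 lines in length: Java is not terse. Compiling and then running the Java code requires a JAR file with the library support for Protobuf. This file is available in the Maven repository.

With the pieces in place, my test code is relatively short (and available in the ZIP file as Main.java):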

package main;
import java.io.FileInputStream;
 
public class Main {
   public static void main(String[] args) {
      String path = "dataitem.pbuf";  // from the Go program's serialization
      try {
         DataMsg.DataItem deserial =
           DataMsg.DataItem.newBuilder().mergeFrom(new FileInputStream(path)).build();
 
         System.out.println(deserial.getOddA()); // 64-bit odd
         System.out.println(deserial.getLong()); // 32-character string
      }
      catch(Exception e) { System.err.println(e); }
    }
}

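Production-grade testing would be far more thorough, of course, but even this preliminary test confirms the language neutrality of Protobuf: the dataitem.pbuf file results from the Go program's serialization of a Go DataItem, and the bytes in this file are deserialized to produce a DataItem instance in Java. The output from the Java test is the same as that from the Go test.

Wrapping up with the numPairs program

Let's end with an example that highlights Protobuf efficiency but also underscores the cost involved in any encoding technology. Consider this Protobuf IDL file: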

syntax = "proto3";
package main;
 
message NumPairs {
  repeated NumPair pair = 1;
}
 
message NumPair {
  int32 odd = 1;
  int32 even = 2;
}

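The NumPair message consists of two int32 values together with an integer tag for each field. The NumPairs message is a sequence of embedded NumPair messages.

The numPairs program in Go (below) creates two million NumPair instances, with each appended to the NumPairs message. This message can be serialized and deserialized in the usual way.

Example 2. The numPairs program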

package main
 
import (
   "math/rand"
   "time"
   "encoding/xml"
   "encoding/json"
   "io/ioutil"
   "github.com/golang/protobuf/proto"
)
 
// protoc-generated code: start
var _ = proto.Marshal
type NumPairs struct {
   Pair []*NumPair `protobuf:"bytes,1,rep,name=pair" json:"pair,omitempty"`
}
 
func (m *NumPairs) Reset()         { *m = NumPairs{} }
func (m *NumPairs) String() string { return proto.CompactTextString(m) }
func (*NumPairs) ProtoMessage()    {}
func (m *NumPairs) GetPair() []*NumPair {
   if m != nil { return m.Pair }
   return nil
}
 
type NumPair struct {
   Odd  int32 `protobuf:"varint,1,opt,name=odd" json:"odd,omitempty"`
   Even int32 `protobuf:"varint,2,opt,name=even" json:"even,omitempty"`
}
 
func (m *NumPair) Reset()         { *m = NumPair{} }
func (m *NumPair) String() string { return proto.CompactTextString(m) }
func (*NumPair) ProtoMessage()    {}
func init() {}
// protoc-generated code: finish
 
var numPairsStruct NumPairs
var numPairs = &numPairsStruct
 
func encodeAndserialize() {
   // XML encoding
   filename := "./pairs.xml"
   bytes, _ := xml.MarshalIndent(numPairs, "", " ")
   ioutil.WriteFile(filename, bytes, 0644)
 
   // JSON encoding
   filename = "./pairs.json"
   bytes, _ = json.MarshalIndent(numPairs, "", " ")
   ioutil.WriteFile(filename, bytes, 0644)
 
   // ProtoBuf encoding
   filename = "./pairs.pbuf"
   bytes, _ = proto.Marshal(numPairs)
   ioutil.WriteFile(filename, bytes, 0644)
}
 
const HowMany = 200 * 100  * 100 // two million
 
func main() {
   rand.Seed(time.Now().UnixNano())
 
   // uncomment the modulus operations to get the more efficient version
   for i := 0; i < HowMany; i++ {
      n1 := rand.Int31() // % 2047
      if (n1 & 1) == 0 { n1++ } // ensure it's odd
      n2 := rand.Int31() // % 2047
      if (n2 & 1) == 1 { n2++ } // ensure it's even
 
      next := &NumPair {
                 Odd:  n1,
                 Even: n2,
              }
      numPairs.Pair = append(numPairs.Pair, next)
   }
   encodeAndserialize()
}

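The randomly generated odd and even values in each NumPair range from zero to two billion and change. In terms of raw rather than encoded data, the integers generated in the Go program add up to 16MB: two integers per NumPair for a total of four million integers in all, and each value is four bytes in size.

For comparison, the table below lists the XML, JSON, and Protobuf encodings of the two million NumPair instances in the sample NumPairs message. The raw data is included as well. Because the numPairs program generates random values, output differs across sample runs but is close to the sizes shown in the table.

Encoding	File	Byte size	Pbuf/other ratio
None	pairs.raw	16MB	169%
Protobuf	pairs.pbuf	27MB	-
JSON	pairs.json	100MB	27%
XML	pairs.xml	126MB	21%

Table 2. Encoding overhead in 16MB of integers

As expected, Protobuf shines next to XML and JSON. The Protobuf encoding is about a quarter of the JSON one and about a fifth of the XML one. But the raw data makes clear that Protobuf incurs the overhead of encoding: the serialized Protobuf message is 11MB larger than the raw data. Any encoding, including Protobuf, involves structuring the data, which unavoidably adds bytes.

Each of the serialized two million NumPair instances involves four integer values: one apiece for the Even and Odd fields in the Go structure, and one tag per field in the Protobuf encoding. As raw rather than encoded data, this would come to 16 bytes per instance, and there are two million instances in the sample NumPairs message. But the Protobuf tags, like the int32 values in the NumPair fields, use varint encoding and, therefore, vary in byte length; in particular, small integer values (which include the tags, in this case) require fewer than four bytes to encode.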
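A back-of-the-envelope estimate of the bytes per NumPair helps explain the file sizes. The sketch below is an approximation under stated assumptions, not the Protobuf library's own size computation: it assumes the wire layout from the Protobuf encoding guide (a one-byte key per field, a varint value, and a key plus length prefix for each embedded message), and the helper names varintLen, fieldLen, and pairLen are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// varintLen (hypothetical helper) reports the size in bytes of the
// varint encoding of v.
func varintLen(v uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, v)
}

// fieldLen (hypothetical helper) is the size of one int32 field: a
// one-byte key (tags 1 and 2 fit in one byte) plus the varint value.
// proto3 omits fields that hold the default value 0.
func fieldLen(v int32) int {
	if v == 0 {
		return 0
	}
	return 1 + varintLen(uint64(v))
}

// pairLen (hypothetical helper) estimates the bytes one NumPair adds
// to a NumPairs message: the outer key for tag 1, a varint length
// prefix, and the two encoded fields.
func pairLen(odd, even int32) int {
	body := fieldLen(odd) + fieldLen(even)
	return 1 + varintLen(uint64(body)) + body
}

func main() {
	fmt.Println(pairLen(2147483647, 2147483646)) // prints 14: large values
	fmt.Println(pairLen(2047, 2046))             // prints 8: values below 2048
}
```

At up to 14 bytes per pair, two million pairs of large values come to at most about 28MB, which is in line with the roughly 27MB measured; at 8 bytes per pair, values below 2048 come to about 16MB.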

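If the numPairs program is revised so that the two NumPair fields hold values less than 2048, which have encodings of either one or two bytes, then the Protobuf encoding drops from 27MB to 16MB, the very size of the raw data. The table below summarizes the new encoding sizes from a sample run.

Encoding	File	Byte size	Pbuf/other ratio
None	pairs.raw	16MB	100%
Protobuf	pairs.pbuf	16MB	-
JSON	pairs.json	77MB	21%
XML	pairs.xml	103MB	15%

Table 3. Encoding with 16MB of integers < 2048

In summary, the modified numPairs program, with field values less than 2048, reduces the four-byte size for each integer value in the raw data. But the Protobuf encoding still requires tags, which add bytes to the Protobuf message. Protobuf encoding does have a cost in message size, but this cost can be reduced by the varint factor if relatively small integer values, whether in fields or keys, are being encoded.

For moderately sized messages consisting of structured data with mixed types, and relatively small integer values, Protobuf has a clear advantage over options such as XML and JSON. In other cases, the data may not be suited for Protobuf encoding. For example, if two applications need to share a huge set of text records or large integer values, then compression rather than encoding technology may be the way to go.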

Editor: 龐桂玉. Source: Linux中國
