Unlocking PaddleOCR's Superpowers

Author: 小R, 2023-11-12 23:01:44

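This article looks at how to use PaddleOCR, a deep-learning-based OCR toolkit, for text detection and recognition tasks.

Optical character recognition (OCR) lets machines identify and extract text from images or scanned documents. It is used in many areas, including document digitization, extracting text from images, and text-based data analysis. The sections below walk through a code snippet step by step to show the whole process.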

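I. Prerequisites

Before diving into the code, make sure your environment is ready to run the PaddleOCR library. The following need to be installed on your machine:

Python (3.6 or later)
The PaddleOCR library
Other required dependencies (for example NumPy and pandas)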

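You can install PaddleOCR with the pip command below. PaddleOCR runs on the PaddlePaddle framework, so the paddlepaddle package must be installed as well if it is not already present: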

pip install paddleocr
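
To confirm that the prerequisites are in place before running anything heavier, a small check along the following lines can help. This is a minimal sketch that uses only the standard library; the list of import names (paddle, paddleocr, numpy, pandas) is an assumption based on the dependencies named above, so adjust it to your own setup.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed for this article's stack; "paddle" is the import
    # name of the paddlepaddle package.
    required = ["paddle", "paddleocr", "numpy", "pandas"]
    missing = missing_packages(required)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Any package reported as missing can then be installed with pip before continuing.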

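II. Setting up PaddleOCR

Once Python and the required libraries are installed, you can set up PaddleOCR. It provides pretrained models for both text detection and text recognition.

A text detection and recognition pipeline built on PaddleOCR has the following main components:

Image preprocessing: load the input image and apply any necessary preprocessing, such as resizing or normalization.
Text detection: use the PaddleOCR detection model to locate the bounding boxes of text regions in the input image.
Text recognition: for each detected bounding box, use the PaddleOCR recognition model to extract the corresponding text.
Post-processing: organize the detected boxes and recognized text for further analysis or display.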
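
The components above can be sketched roughly as follows. This is not the code from the article's repository: it assumes the PaddleOCR 2.x Python interface (the PaddleOCR class and its ocr method, which returns one list per image whose entries look like [box, (text, score)]); newer releases may differ. The helper names flatten_ocr_result and run_paddleocr are made up for illustration.

```python
def flatten_ocr_result(image_name, page_result):
    """Post-processing: turn one image's detections into a list of plain dicts.

    `page_result` is assumed to follow the PaddleOCR 2.x layout, a list of
    [box, (text, score)] entries, or None when no text was found.
    """
    rows = []
    for box, (text, score) in page_result or []:
        rows.append({"image": image_name, "points": box,
                     "text": text, "score": float(score)})
    return rows


def run_paddleocr(image_path):
    """Detection + recognition on one image (needs paddleocr installed)."""
    # Imported lazily so that flatten_ocr_result works without paddleocr.
    from paddleocr import PaddleOCR

    # Assumed PaddleOCR 2.x arguments; pretrained models download on first use.
    ocr = PaddleOCR(use_angle_cls=False, lang="en")
    result = ocr.ocr(image_path, cls=False)  # preprocessing happens internally
    return flatten_ocr_result(image_path, result[0])
```

Keeping the post-processing in its own function makes it easy to test without loading any model.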

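III. Step-by-step implementation

Let's break the code down and explain each step in detail.

1. Text detection

The code below is part of a class named DecMain, designed for evaluating OCR against ground-truth data. It uses PaddleOCR to extract text from images and then computes evaluation metrics: precision and recall in the snippet below, with character error rate (CER) appearing in the recognition evaluation later on.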

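# The helper classes used below (CheckAndUpdateGroundTruth, OcrToDf, ReadGroundTruthFile,
# CalculateMetrics, CreateSheet) come from the source repository linked at the end of the article.
class DecMain:
    """Evaluate PaddleOCR detection + recognition against a ground-truth label file."""

    def __init__(self, image_folder_path, label_file_path, output_file):
        self.image_folder_path = image_folder_path  # folder with the input images
        self.label_file_path = label_file_path      # ground-truth label file
        self.output_file = output_file              # file the report is written to

    def run_dec(self):
        # Check and update the ground truth file
        CheckAndUpdateGroundTruth(self.label_file_path).check_and_update_ground_truth_file()

        # Run detection and recognition (classification off) and collect the results in a DataFrame
        df = OcrToDf(image_folder=self.image_folder_path, label_file=self.label_file_path,
                     det=True, rec=True, cls=False).ocr_to_df()

        ground_truth_data = ReadGroundTruthFile(self.label_file_path).read_ground_truth_file()

        # Get the extracted text as a list of dictionaries (representing the OCR results)
        ocr_results = df.to_dict(orient="records")

        # Calculate precision and recall, plus the number of evaluated samples
        precision, recall, total_samples = CalculateMetrics(ground_truth_data, ocr_results).calculate_precision_recall()

        CreateSheet(dataframe=df, precision=precision, recall=recall, total_samples=total_samples,
                    file_name=self.output_file).create_sheet()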

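Let's go through the code piece by piece:

class DecMain:
    def __init__(self, image_folder_path, label_file_path, output_file):
        self.image_folder_path = image_folder_path
        self.label_file_path = label_file_path
        self.output_file = output_file

The DecMain class has an __init__ method that initializes the object with the following parameters:

image_folder_path: path to the folder containing the input images for OCR.
label_file_path: path to the ground-truth label file containing the actual text content of the images.
output_file: name of the output file in which the evaluation results are saved.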

def run_dec(self):
    # Check and update the ground truth file
    CheckAndUpdateGroundTruth(self.label_file_path).check_and_update_ground_truth_file()

The run_dec method runs the OCR evaluation process. First, it uses the CheckAndUpdateGroundTruth class to check and update the ground-truth label file.

df = OcrToDf(image_folder=self.image_folder_path, label_file=self.label_file_path, det=True, rec=True, cls=False).ocr_to_df()

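The OcrToDf class converts the OCR results into a pandas DataFrame (df). It takes the following parameters:

image_folder: path to the folder containing the input images for OCR.
label_file: path to the ground-truth label file.
det=True and rec=True: the DataFrame will include both text detection and text recognition results.
cls=False: the classification step is switched off (in PaddleOCR, cls conventionally refers to the text-angle classifier).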


ground_truth_data = ReadGroundTruthFile(self.label_file_path).read_ground_truth_file()

The ReadGroundTruthFile class reads the ground-truth label file and loads its contents into the ground_truth_data variable.

# Get the extracted text as a list of dictionaries (representing the OCR results)
ocr_results = df.to_dict(orient="records")

The OCR results in the DataFrame df are converted into a list of dictionaries (ocr_results), one dictionary per DataFrame row, each holding a single OCR result.

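# Calculate precision and recall
precision, recall, total_samples = CalculateMetrics(ground_truth_data, ocr_results).calculate_precision_recall()

The CalculateMetrics class computes the OCR evaluation metrics: precision, recall, and the total number of samples evaluated. It takes the ground-truth data and the OCR results as input.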
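
The article does not show how CalculateMetrics works internally. As a rough sketch of one common way to define these metrics for OCR output, the example below assumes exact string matching between recognized and ground-truth texts for each image. The function name and the input layout (a dict from image name to a list of strings) are illustrative assumptions, not the repository's API.

```python
from collections import Counter


def precision_recall(ground_truth, predictions):
    """Exact-match precision and recall over text strings.

    Both arguments map an image name to a list of text strings. Images that
    appear only in `predictions` are ignored in this sketch.
    """
    true_positives = predicted_total = expected_total = 0
    for image_name, expected_texts in ground_truth.items():
        expected = Counter(expected_texts)
        predicted = Counter(predictions.get(image_name, []))
        # Multiset intersection: a prediction counts once per matching label.
        true_positives += sum((expected & predicted).values())
        predicted_total += sum(predicted.values())
        expected_total += sum(expected.values())
    precision = true_positives / predicted_total if predicted_total else 0.0
    recall = true_positives / expected_total if expected_total else 0.0
    return precision, recall, len(ground_truth)
```

Exact matching is strict: a single wrong character makes the whole string count as a miss, which is why a character-level metric such as CER is often reported alongside it.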

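CreateSheet(dataframe=df, precision=precision, recall=recall, total_samples=total_samples,
            file_name=self.output_file).create_sheet()

The CreateSheet class creates the output sheet (for example Excel or CSV) containing the evaluation metrics and the OCR results. It takes the DataFrame df, the precision, the recall, the total number of samples, and the output file name as input.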
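
The internals of CreateSheet are not shown in the article. To illustrate the general idea of putting per-result rows and summary metrics into one report, here is a minimal sketch that writes CSV with the standard library. The function name write_report and the column layout are hypothetical.

```python
import csv
import io


def write_report(rows, precision, recall, total_samples, stream):
    """Write OCR rows followed by summary metrics to a CSV stream.

    `rows` is a list of dicts with "image" and "text" keys (assumed layout).
    """
    writer = csv.writer(stream)
    writer.writerow(["image", "text"])
    for row in rows:
        writer.writerow([row["image"], row["text"]])
    writer.writerow([])  # blank line between the results and the summary
    writer.writerow(["precision", precision])
    writer.writerow(["recall", recall])
    writer.writerow(["total_samples", total_samples])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_report([{"image": "a.jpg", "text": "215mm 18"}], 0.5, 0.75, 4, buffer)
    print(buffer.getvalue())
```

Writing to a stream rather than a file path keeps the function easy to test; passing an open file object works the same way.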

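Overall, the DecMain class provides a structured way to evaluate an OCR model's performance using ground-truth data together with PaddleOCR's text detection and recognition. It computes the key evaluation metrics and stores the results in the specified output file for further analysis.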

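2. Note: format of the ground-truth label file

To run the OCR evaluation with the DecMain class and the code above, the ground-truth label file must be formatted correctly. Each line holds an image file name followed by a JSON array of annotations, structured as follows: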

image_name.jpg [{"transcription": "215mm 18", "points": [[199, 6], [357, 6], [357, 33], [199, 33]], "difficult": False, "key_cls": "digits"}, {"transcription": "XZE SA", "points": [[15, 6], [140, 6], [140, 36], [15, 36]], "difficult": False, "key_cls": "text"}]

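Each line of the file holds the ground truth for one image: the image file name, followed by a JSON array with one object per text region in that image.

Each JSON object should have the following fields:

"transcription": the ground-truth text of the region.
"points": a list of four points giving the bounding-box coordinates of the text region in the image.
"difficult": a boolean indicating whether the text region is hard to recognize. The example writes it as False, which is Python syntax; strict JSON spells it false, so check which form your parser expects.
"key_cls": a category label for the entry, such as "digits" or "text".

Follow this format when creating the ground-truth label file so that the OCR model's performance is evaluated accurately.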
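
Reading one line of such a file can be sketched as follows. This is an illustration, not the repository's ReadGroundTruthFile. It assumes the file name and the annotation list are separated by a tab, falling back to the first space, and it accepts both strict JSON and the Python-style literals (False) seen in the example.

```python
import ast
import json


def parse_label_line(line):
    """Split one label line into (image_name, list of annotation dicts)."""
    stripped = line.strip()
    image_name, _, payload = stripped.partition("\t")
    if not payload:
        # No tab found: assume the first space separates name and annotations.
        image_name, _, payload = stripped.partition(" ")
    try:
        annotations = json.loads(payload)
    except json.JSONDecodeError:
        # Python-style literals such as False/True are not valid JSON.
        annotations = ast.literal_eval(payload)
    return image_name, annotations


if __name__ == "__main__":
    sample = ('image_name.jpg\t[{"transcription": "215mm 18", '
              '"points": [[199, 6], [357, 6], [357, 33], [199, 33]], '
              '"difficult": false, "key_cls": "digits"}]')
    name, regions = parse_label_line(sample)
    print(name, [region["transcription"] for region in regions])
```

The space fallback breaks for file names that contain spaces, so a tab separator is the safer choice when generating the file.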

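3. Text recognition

The code below defines a class named RecMain, which runs text recognition with a pretrained OCR model over a folder of images and produces an evaluation Excel sheet.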

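# The helper classes used below (GetImagePathsFromFolder, LoadRecModel, ProcessImages,
# ConvertTextToDict, EvaluateRecModel, CreateMetricExcel) come from the source
# repository linked at the end of the article.
class RecMain:
    """Run text recognition on a folder of images and write an evaluation Excel sheet."""

    def __init__(self, image_folder, rec_file, output_file):
        self.image_folder = image_folder  # folder with the input images
        self.rec_file = rec_file          # ground-truth text file
        self.output_file = output_file    # output Excel file name

    def run_rec(self):
        # Collect the paths of the images to process
        image_paths = GetImagePathsFromFolder(
            self.image_folder, self.rec_file).get_image_paths_from_folder()

        # Load the pretrained recognition model
        ocr_model = LoadRecModel().load_model()

        # Run recognition on the images
        results = ProcessImages(ocr=ocr_model, image_paths=image_paths).process_images()

        # Load the ground truth as a dictionary
        ground_truth_data = ConvertTextToDict(self.rec_file).convert_txt_to_dict()

        # Compare the predictions with the ground truth and compute the metrics
        (model_predictions, ground_truth_texts, image_names, precision, recall,
         overall_model_precision, overall_model_recall, cer_data_list) = EvaluateRecModel(
            results, ground_truth_data).evaluate_model()

        # Create Excel sheet
        CreateMetricExcel(image_names, model_predictions, ground_truth_texts,
                          precision, recall, cer_data_list, overall_model_precision, overall_model_recall,
                          self.output_file).create_excel_sheet()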

Let's go through the code piece by piece:

class RecMain:
    def __init__(self, image_folder, rec_file, output_file):
        self.image_folder = image_folder
        self.rec_file = rec_file
        self.output_file = output_file

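The RecMain class has an __init__ method that initializes the object with the following parameters:

image_folder: path to the folder containing the input images for text recognition.
rec_file: path to the ground-truth label file containing the actual text content of the images.
output_file: file name of the output Excel sheet in which the evaluation results are saved.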

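def run_rec(self):
    image_paths = GetImagePathsFromFolder(self.image_folder, self.rec_file).get_image_paths_from_folder()

The run_rec method runs the text recognition process. It first uses the GetImagePathsFromFolder class to get the list of paths of the images in the specified image_folder, so that the OCR model processes the images in that directory. The class is also given rec_file, the ground-truth file.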
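
Collecting image paths from a folder can be sketched with the standard library as shown below. This is not the repository's GetImagePathsFromFolder. The set of accepted file extensions and the optional allowed_names filter (for example, the names listed in the ground-truth file) are assumptions made for this illustration.

```python
from pathlib import Path

# Extensions treated as images in this sketch (an assumption, extend as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_image_paths(image_folder, allowed_names=None):
    """Return the sorted paths of image files directly inside `image_folder`.

    If `allowed_names` is given, only files whose name is in it are kept.
    """
    folder = Path(image_folder)
    paths = sorted(p for p in folder.iterdir()
                   if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    if allowed_names is not None:
        paths = [p for p in paths if p.name in allowed_names]
    return [str(p) for p in paths]
```

Sorting the paths keeps the processing order stable between runs, which makes the resulting report easier to compare.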

ocr_model = LoadRecModel().load_model()

The LoadRecModel class loads the pretrained OCR model used for text recognition. It may use PaddleOCR or another OCR library to load the model.

results = ProcessImages(ocr=ocr_model, image_paths=image_paths).process_images()

The ProcessImages class processes the images with the loaded OCR model. It takes the OCR model (ocr_model) and the list of image paths (image_paths) as input.

ground_truth_data = ConvertTextToDict(self.rec_file).convert_txt_to_dict()

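The ConvertTextToDict class reads the ground-truth label file and converts it into a dictionary (ground_truth_data). This prepares the ground-truth data for comparison with the OCR model's predictions.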

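(model_predictions, ground_truth_texts, image_names, precision, recall,
 overall_model_precision, overall_model_recall, cer_data_list) = EvaluateRecModel(
    results, ground_truth_data).evaluate_model()

The EvaluateRecModel class compares the OCR model's predictions with the ground-truth data and computes evaluation metrics such as precision, recall, and character error rate (CER). It takes the OCR model's predictions (results) and the ground-truth data (ground_truth_data) as input.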
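
Character error rate is commonly defined as the edit (Levenshtein) distance between the predicted and the reference text, divided by the length of the reference. The sketch below implements that definition with the standard library. Whether EvaluateRecModel computes CER in exactly this way is not shown in the article, and the handling of an empty reference is a convention chosen for this example.

```python
def levenshtein(reference, hypothesis):
    """Minimum number of single-character edits turning one string into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the reference length."""
    if not reference:
        # Convention for this sketch: empty reference scores 0 only if the
        # hypothesis is empty too.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)
```

For instance, recognizing "XZE SA" as "XZE 5A" is one substitution over six reference characters. Note that CER can exceed 1 when the prediction is much longer than the reference.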

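# Create Excel sheet
CreateMetricExcel(image_names, model_predictions, ground_truth_texts,
                  precision, recall, cer_data_list, overall_model_precision, overall_model_recall,
                  self.output_file).create_excel_sheet()

The CreateMetricExcel class creates the output Excel sheet containing the evaluation metrics and the OCR results. It takes several inputs, including the image names, the model predictions, the ground-truth texts, the evaluation metrics, and the output file name (self.output_file).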

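In short, the RecMain class organizes the whole text recognition process, from loading the OCR model to generating an evaluation Excel sheet with detailed metrics. It offers a structured, reusable way to evaluate an OCR model's performance on a given set of images.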

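Note: format of the ground-truth text file

When running the OCR evaluation with the RecMain class and the code above, the ground-truth (GT) text file must be formatted correctly. It should use the following format: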

image_name.jpg text

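Each line of the file gives the GT text for one image.

Each line contains the image file name, followed by a tab character (\t), followed by the GT text for that image.

Make sure the GT text file has an entry for every image in the image folder, and that each GT text matches the actual text in its image. This format is required for an accurate evaluation of the OCR model's performance.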
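
Loading such a file into a dictionary can be sketched as follows. This is an illustration rather than the repository's ConvertTextToDict; the function name parse_rec_labels is made up, and it raises an error on lines without a tab so that formatting problems surface early.

```python
def parse_rec_labels(lines):
    """Map image file name -> ground-truth text from tab-separated lines."""
    labels = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        image_name, separator, text = line.partition("\t")
        if not separator:
            raise ValueError(f"expected '<image name><TAB><text>', got: {line!r}")
        labels[image_name] = text
    return labels


if __name__ == "__main__":
    example = ["image_1.jpg\t215mm 18\n", "image_2.jpg\tXZE SA\n"]
    print(parse_rec_labels(example))
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly.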

You can find the source code here: https://github.com/vinodbaste/paddleOCR_rec_dec?source=post_page

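Conclusion

We looked at how to use the deep-learning-based PaddleOCR toolkit for text detection and recognition, and walked through the implementation of both step by step. With PaddleOCR's pretrained models and easy-to-use API, running OCR on images becomes much easier.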

Editor: 趙寧寧. Source: 小白玩轉Python

