A Complete Guide to Offline Speech-to-Text in C# with Vosk

Author: iamrick 2024-11-29 07:45:38

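This article shows how to implement speech-to-text with the Vosk and NAudio libraries. The code accepts MP3 and WAV input, converts MP3 to WAV automatically, and resamples the audio to 16 kHz mono so that it matches the sample rate the Vosk recognizer is created with.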

1. Project Setup

First, install the required NuGet packages:

<PackageReference Include="Vosk" Version="0.3.38" />
<PackageReference Include="NAudio" Version="2.2.1" />


2. Download a Speech Model

Visit the Vosk model download page:

https://alphacephei.com/vosk/models


Download the Chinese model, or a model for whichever language you need.


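Extract the model into a Models folder under the project directory. The usage example in section 4 resolves this folder relative to the application's base directory, so make sure the folder is also copied to the build output.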
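Because the extracted folder name depends on which model was downloaded, it can be convenient to look the folder up instead of hard-coding its name. The following is only a minimal sketch: `ModelLocator` and `FindModelDirectory` are hypothetical names, not part of Vosk, and the sketch assumes the extracted folder keeps the `vosk-model` prefix used by the archives on the download page.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ModelLocator
{
    // Returns the first sub-folder of <baseDirectory>/Models whose name starts
    // with "vosk-model" (an assumed naming convention), or throws if none exists.
    public static string FindModelDirectory(string baseDirectory)
    {
        string modelsRoot = Path.Combine(baseDirectory, "Models");
        if (!Directory.Exists(modelsRoot))
        {
            throw new DirectoryNotFoundException($"Models folder not found: {modelsRoot}");
        }

        // Sort so the choice is deterministic when several models are present.
        var match = Directory.EnumerateDirectories(modelsRoot, "vosk-model*")
            .OrderBy(path => path, StringComparer.OrdinalIgnoreCase)
            .FirstOrDefault();

        if (match == null)
        {
            throw new DirectoryNotFoundException(
                $"No folder starting with 'vosk-model' under: {modelsRoot}");
        }

        return match;
    }
}
```

A caller could pass `AppDomain.CurrentDomain.BaseDirectory` as `baseDirectory` and hand the returned path to the converter's constructor. If several models are installed, choosing one explicitly by name is safer than relying on the sort order.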

3. Full Implementation

using NAudio.Wave;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using Vosk;

namespace AppVosk
{
    public class SpeechToTextConverter
    {
        private readonly string _modelPath;

        public SpeechToTextConverter(string modelPath)
        {
            _modelPath = modelPath;
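            // Set Vosk log verbosity: 0 prints info and error messages; a negative value silences info messages
            Vosk.Vosk.SetLogLevel(0);
            if (!Directory.Exists(_modelPath))
            {
                throw new DirectoryNotFoundException($"Model folder not found: {_modelPath}");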
            }
        }

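        // Recognition is synchronous and CPU-bound, so it runs on a thread-pool thread.
        public Task<string> ConvertToText(string audioFilePath)
        {
            return Task.Run(() => Recognize(audioFilePath));
        }

        private string Recognize(string audioFilePath)
        {
            if (!File.Exists(audioFilePath))
            {
                throw new FileNotFoundException($"Audio file not found: {audioFilePath}");
            }

            string wavFile = audioFilePath;
            bool isTempFile = false;

            try
            {
                // Convert MP3 input to a temporary WAV file; WAV input is read directly.
                if (string.Equals(Path.GetExtension(audioFilePath), ".mp3", StringComparison.OrdinalIgnoreCase))
                {
                    // A unique name avoids collisions between files that share a base name.
                    wavFile = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".wav");
                    isTempFile = true;
                    using (var reader = new Mp3FileReader(audioFilePath))
                    {
                        WaveFileWriter.CreateWaveFile(wavFile, reader);
                    }
                }

                using (var model = new Model(_modelPath))
                using (var recognizer = new VoskRecognizer(model, 16000.0f))
                using (var waveStream = new WaveFileReader(wavFile))
                // Resample to 16 kHz, 16-bit mono PCM to match the recognizer's sample rate.
                using (var resampler = new MediaFoundationResampler(waveStream, new WaveFormat(16000, 1)))
                {
                    var segments = new List<string>();
                    byte[] buffer = new byte[4096];
                    int bytesRead;

                    while ((bytesRead = resampler.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        // true means an utterance has ended; collect its text now,
                        // before the recognizer starts on the next utterance.
                        if (recognizer.AcceptWaveform(buffer, bytesRead))
                        {
                            segments.Add(ExtractText(recognizer.Result()));
                        }
                    }

                    // FinalResult() flushes whatever audio is still pending.
                    segments.Add(ExtractText(recognizer.FinalResult()));
                    return string.Join(" ", segments.Where(s => s.Length > 0));
                }
            }
            finally
            {
                // Clean up the temporary WAV file
                if (isTempFile && File.Exists(wavFile))
                {
                    try
                    {
                        File.Delete(wavFile);
                    }
                    catch { /* ignore deletion failures */ }
                }
            }
        }

        // Reads the "text" field from a Vosk result JSON string.
        private static string ExtractText(string json)
        {
            using (var doc = JsonDocument.Parse(json))
            {
                return doc.RootElement.TryGetProperty("text", out var text)
                    ? text.GetString() ?? string.Empty
                    : string.Empty;
            }
        }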
    }
}
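The recognizer hands back its results as JSON strings, and the listing reads only the `text` field. Vosk can also attach per-word timings if `recognizer.SetWords(true)` is called before any audio is fed in. The sketch below parses such a result. `WordTiming` and `VoskResultParser` are hypothetical names, and the JSON shape (a `result` array whose items carry `word`, `start`, `end` and `conf`) reflects my understanding of Vosk's output, so check it against what your model version actually emits.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical container for one recognized word and its timing in seconds.
public sealed class WordTiming
{
    public string Word { get; set; } = "";
    public double Start { get; set; }
    public double End { get; set; }
    public double Confidence { get; set; }
}

public static class VoskResultParser
{
    // Parses the assumed "result" array; returns an empty list when it is absent,
    // which is the case when word output was not enabled.
    public static IReadOnlyList<WordTiming> ParseWords(string json)
    {
        var words = new List<WordTiming>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            if (!doc.RootElement.TryGetProperty("result", out JsonElement result) ||
                result.ValueKind != JsonValueKind.Array)
            {
                return words;
            }

            foreach (JsonElement item in result.EnumerateArray())
            {
                words.Add(new WordTiming
                {
                    Word = item.GetProperty("word").GetString() ?? "",
                    Start = item.GetProperty("start").GetDouble(),
                    End = item.GetProperty("end").GetDouble(),
                    Confidence = item.GetProperty("conf").GetDouble(),
                });
            }
        }
        return words;
    }
}
```

Word timings are handy for subtitles or for jumping to a position in the audio. When only the transcript is needed, leaving word output off keeps the JSON smaller.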

4. Usage Example


using System;
using System.IO;
using System.Threading.Tasks;
using AppVosk;

class Program
{
    static async Task Main(string[] args)
    {
        try
        {
            // Model folder and audio file; the folder name must match the model you extracted into Models
            string modelPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Models", "vosk-model-small-cn-0.22");
            string audioFile = "test.mp3";  // or test.wav

            var converter = new SpeechToTextConverter(modelPath);
            string result = await converter.ConvertToText(audioFile);

            Console.WriteLine("Recognition result:");
            Console.WriteLine(result);
        }
        catch (Exception ex)
        {
            Console.WriteLine($"An error occurred: {ex.Message}");
        }
    }
}


I downloaded the 1.3 GB model, so recognition is somewhat slow.

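5. Key Features

Format conversion: accepts MP3 and WAV input and converts MP3 to WAV automatically
Resampling: resamples the audio to 16 kHz mono to match the sample rate passed to the Vosk recognizer
Error handling: a missing model folder or audio file raises an exception, and resources are released on every path
Memory use: audio is streamed through a 4 KB buffer rather than loaded whole, which suits large files
Temporary files: the WAV file produced during conversion is deleted automatically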
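To put numbers on the streaming point: at 16 kHz, 16-bit mono, each 4096-byte buffer in the listing holds 2048 samples, which is 128 ms of audio. The helper below does that arithmetic. `PcmMath` is a hypothetical class written for illustration and is not part of NAudio or Vosk.

```csharp
using System;

public static class PcmMath
{
    // Duration in milliseconds of a buffer of uncompressed PCM audio.
    public static double BufferMilliseconds(int bufferBytes, int sampleRate, int bitsPerSample, int channels)
    {
        int bytesPerFrame = (bitsPerSample / 8) * channels; // one sample per channel
        double frames = (double)bufferBytes / bytesPerFrame;
        return frames * 1000.0 / sampleRate;
    }

    // Size in bytes of uncompressed PCM audio of the given length.
    public static long PcmBytes(double seconds, int sampleRate, int bitsPerSample, int channels)
    {
        long bytesPerSecond = (long)sampleRate * (bitsPerSample / 8) * channels;
        return (long)Math.Round(seconds * bytesPerSecond);
    }
}
```

For example, `PcmMath.BufferMilliseconds(4096, 16000, 16, 1)` gives 128 ms, while `PcmMath.PcmBytes(3600, 16000, 16, 1)` gives 115,200,000 bytes for one hour of audio, which is the amount that reading the whole file into memory would require.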

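6. Summary

This article showed how to implement speech-to-text with the Vosk and NAudio libraries, accepting MP3 and WAV input, converting MP3 to WAV automatically, and resampling the audio to 16 kHz to match the recognizer. It walked through loading the model files, converting the audio format, resampling, and the final recognition step. Note that the listing in section 3 loads the model inside every ConvertToText call; loading it once at application startup and reusing it avoids that repeated cost and speeds up recognition. Overall the approach is simple and makes a useful reference for developers who need offline speech recognition.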
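Loading the model once could look roughly like the sketch below. `SharedModelRecognizer` is a hypothetical class, not code from the article: it creates the Vosk `Model` in its constructor and builds a fresh `VoskRecognizer` per call. It assumes the caller supplies audio that is already 16 kHz, 16-bit mono (for instance the `MediaFoundationResampler` from section 3), and that one `Model` can back several recognizers, which is my understanding of Vosk but worth verifying before sharing it across threads.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;
using NAudio.Wave;
using Vosk;

// Hypothetical wrapper: loads the model once and reuses it for every recognition call.
public sealed class SharedModelRecognizer : IDisposable
{
    private readonly Model _model;

    public SharedModelRecognizer(string modelPath)
    {
        if (!Directory.Exists(modelPath))
        {
            throw new DirectoryNotFoundException($"Model folder not found: {modelPath}");
        }
        _model = new Model(modelPath); // the expensive step, done once
    }

    // audio is assumed to deliver 16 kHz, 16-bit mono PCM.
    public string Recognize(IWaveProvider audio)
    {
        var text = new StringBuilder();
        using (var recognizer = new VoskRecognizer(_model, 16000.0f))
        {
            byte[] buffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = audio.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Collect each finished utterance as soon as it is reported.
                if (recognizer.AcceptWaveform(buffer, bytesRead))
                {
                    Append(text, recognizer.Result());
                }
            }
            Append(text, recognizer.FinalResult());
        }
        return text.ToString();
    }

    // Appends the "text" field of a result JSON string, space-separated.
    private static void Append(StringBuilder text, string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            if (!doc.RootElement.TryGetProperty("text", out JsonElement value)) return;
            string segment = value.GetString() ?? "";
            if (segment.Length == 0) return;
            if (text.Length > 0) text.Append(' ');
            text.Append(segment);
        }
    }

    public void Dispose() => _model.Dispose();
}
```

The intended use is to create one instance when the application starts, call `Recognize` for each file, and dispose the instance on shutdown. With a large model, the load time is then paid once rather than once per file.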

Editor: 武曉燕 | Source: 技術老小子

