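# Development Guide

## Project Overview

The real-time speech-to-text system is a speech recognition application built on sherpa-onnx. It uses a modular design and supports real-time speech recognition, sentence segmentation, and automatic punctuation.
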
## Technical Architecture

### Overall Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Audio Input     │───▶│ Audio Processing│───▶│ Recognition     │
│ (microphone)    │    │ (preproc/buffer)│    │ (sherpa-onnx)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
                                                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Output          │◀───│ Result Handling │◀───│ Segmentation    │
│ (console/file)  │    │ (callback/save) │    │ (punct./segment)│
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

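The data flow in the diagram can be sketched as a chain of stages. This is only an illustration under simplifying assumptions: the stage functions (`preprocess`, `recognize`, `punctuate`) and `run_pipeline` are hypothetical stand-ins rather than the project's classes, and strings stand in for audio so that the example runs without a microphone or a model.

```python
from typing import Callable, Iterable, List

# Each stage is a plain function. All names here are hypothetical stand-ins
# for the layers in the diagram, not the project's real API.
Stage = Callable[[str], str]


def preprocess(chunk: str) -> str:
    """Audio processing layer stand-in: normalize the incoming chunk."""
    return chunk.strip()


def recognize(chunk: str) -> str:
    """Recognition layer stand-in: a real system would call sherpa-onnx here."""
    return chunk.lower()


def punctuate(text: str) -> str:
    """Segmentation layer stand-in: close the sentence with a period."""
    return text if text.endswith(".") else text + "."


def run_pipeline(chunks: Iterable[str], stages: List[Stage],
                 on_result: Callable[[str], None]) -> None:
    """Push every chunk through the stages in order, then hand it to the result layer."""
    for chunk in chunks:
        value = chunk
        for stage in stages:
            value = stage(value)
        on_result(value)


outputs: List[str] = []
run_pipeline(["  Hello World ", "Testing"],
             [preprocess, recognize, punctuate],
             outputs.append)
print(outputs)
```

Keeping each layer behind a single-function interface is what lets one stage be swapped (for example, a different recognizer) without touching its neighbors.
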
### Core Modules

1. **RealTimeVTT**: main application controller
2. **AudioProcessor**: audio capture and processing
3. **SpeechRecognizer**: speech recognition engine
4. **PunctuationProcessor**: sentence segmentation and punctuation
5. **ModelDownloader**: model management

## Development Environment Setup

### Requirements

- Python 3.12+
- The uv package manager
- macOS or Linux
- An audio device that supports recording

### Installing the Development Environment

```bash
# Clone the project
git clone <project-url>
cd realTimeVTT

# Install development dependencies
uv sync --dev

# Download the model files
uv run python main.py --download-model
```

### Recommended Development Tools

- **IDE**: PyCharm, VS Code
- **Debugging**: Python Debugger
- **Code formatting**: black, isort
- **Type checking**: mypy
- **Testing**: pytest

## Code Structure

### Directory Layout

```
src/
├── __init__.py              # Package initialization
├── config.py                # Configuration management
├── audio_processor.py       # Audio processing
├── speech_recognizer.py     # Speech recognition
├── punctuation_processor.py # Sentence segmentation
├── realtime_vtt.py          # Main application
└── model_downloader.py      # Model download
```

### Module Dependencies

```
RealTimeVTT
├── AudioProcessor
├── SpeechRecognizer
│   ├── PunctuationProcessor
│   └── ModelConfig
├── ModelDownloader
└── AppConfig
```

## Core Modules in Detail

### 1. RealTimeVTT (Main Controller)

**Responsibilities**:
- Coordinate the other modules
- Manage the application lifecycle
- Handle user interaction
- Manage result output

**Key methods**:
```python
class RealTimeVTT:
    def initialize(self) -> bool:
        """Initialize all components."""

    def run_interactive(self):
        """Run an interactive session."""

    def _on_result(self, result: RecognitionResult):
        """Handle a final recognition result."""

    def _on_partial_result(self, text: str):
        """Handle a partial recognition result."""
```

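The split between `_on_partial_result` and `_on_result` can be illustrated with a small callback sketch. It assumes that partial text is provisional and gets replaced once a final result arrives. `Result` is a hypothetical stand-in for `RecognitionResult` (whose real fields are not shown in this guide), and `fake_recognizer` only simulates an engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Result:
    """Hypothetical stand-in for RecognitionResult; the real fields may differ."""
    text: str
    timestamp: float


class CallbackDemo:
    """Shows how a controller might route partial and final results."""

    def __init__(self) -> None:
        self.final_lines: List[str] = []
        self.current_partial: Optional[str] = None

    def on_partial_result(self, text: str) -> None:
        # Partial text is provisional: keep only the latest version.
        self.current_partial = text

    def on_result(self, result: Result) -> None:
        # A final result supersedes the partial text and is stored permanently.
        self.current_partial = None
        self.final_lines.append(result.text)


def fake_recognizer(on_partial: Callable[[str], None],
                    on_final: Callable[[Result], None]) -> None:
    """Simulates a recognizer that refines a hypothesis and then finalizes it."""
    for hypothesis in ("hel", "hello", "hello wor"):
        on_partial(hypothesis)
    on_final(Result(text="Hello world.", timestamp=1.5))


demo = CallbackDemo()
fake_recognizer(demo.on_partial_result, demo.on_result)
print(demo.final_lines, demo.current_partial)
```
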
### 2. AudioProcessor (Audio Processing)

**Responsibilities**:
- Audio device management
- Real-time audio capture
- Audio format conversion
- Audio data buffering

**Key techniques**:
- The PyAudio audio library
- Non-blocking audio streams
- A thread-safe data queue

```python
class AudioProcessor:
    def _audio_callback(self, in_data, frame_count, time_info, status):
        """Audio stream callback."""

    def start_recording(self, callback):
        """Start recording."""

    def _record_thread(self):
        """Recording thread body."""
```

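The combination of a non-blocking callback and a thread-safe queue can be sketched with the standard library alone. This is a simplified illustration, not the project's implementation: PyAudio is left out so the snippet runs anywhere, the callback takes only the raw bytes (a real PyAudio callback also receives `frame_count`, `time_info`, and `status`), and the `_STOP` sentinel and the queue size are assumptions of this sketch.

```python
import queue
import threading
from typing import Callable, List

audio_queue: queue.Queue = queue.Queue(maxsize=64)
_STOP = object()  # sentinel chosen for this sketch; tells the consumer to exit


def audio_callback(in_data: bytes) -> None:
    """Runs on the audio thread: must return quickly, so it only enqueues."""
    try:
        audio_queue.put_nowait(in_data)
    except queue.Full:
        pass  # drop the chunk rather than block the audio thread


def record_thread(consumer: Callable[[bytes], None]) -> None:
    """Runs on a worker thread: drains the queue and forwards each chunk."""
    while True:
        chunk = audio_queue.get()
        if chunk is _STOP:
            break
        consumer(chunk)


received: List[bytes] = []
worker = threading.Thread(target=record_thread, args=(received.append,))
worker.start()
for i in range(5):
    audio_callback(bytes([i]) * 4)  # simulate five 4-byte chunks
audio_queue.put(_STOP)
worker.join()
print(len(received))
```

Dropping chunks when the queue is full trades completeness for latency. Whether that is the right trade-off depends on the application.
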
### 3. SpeechRecognizer (Speech Recognition)

**Responsibilities**:
- sherpa-onnx model management
- Recognition of audio data
- Endpoint detection
- Result post-processing

**Key techniques**:
- Streaming recognition
- Endpoint detection algorithm
- Integrated sentence segmentation

```python
import numpy as np


class SpeechRecognizer:
    def process_audio(self, audio_data: np.ndarray):
        """Process audio data."""

    def _process_partial_result(self, text: str) -> str:
        """Post-process a partial recognition result."""

    def _process_final_result(self, text: str) -> str:
        """Post-process a final recognition result."""
```

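To make the idea of endpoint detection concrete, here is a minimal energy-based sketch. It is not the algorithm sherpa-onnx uses, which applies its own endpoint rules. The class name, threshold, and frame counts are all illustrative assumptions. The detector reports an endpoint once speech has been heard and is then followed by a run of silent frames.

```python
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


class EnergyEndpointDetector:
    """Signals an endpoint after speech is followed by enough silent frames."""

    def __init__(self, threshold: float = 0.02, min_silence_frames: int = 3) -> None:
        self.threshold = threshold
        self.min_silence_frames = min_silence_frames
        self.heard_speech = False
        self.silent_run = 0

    def feed(self, frame: Sequence[float]) -> bool:
        """Returns True when the current utterance should be finalized."""
        if rms(frame) >= self.threshold:
            self.heard_speech = True
            self.silent_run = 0
            return False
        if not self.heard_speech:
            return False  # leading silence: nothing to finalize yet
        self.silent_run += 1
        if self.silent_run >= self.min_silence_frames:
            self.heard_speech = False  # reset for the next utterance
            self.silent_run = 0
            return True
        return False


speech = [0.5, -0.5] * 80
silence = [0.0] * 160
detector = EnergyEndpointDetector()
frames = [silence, speech, speech, silence, silence, silence, silence]
flags: List[bool] = [detector.feed(f) for f in frames]
print(flags)
```
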
### 4. PunctuationProcessor (Sentence Segmentation)

**Responsibilities**:
- Intelligent sentence segmentation
- Automatic punctuation
- Language detection
- Text cleanup

**Algorithm characteristics**:
- Rule-based segmentation
- Mixed Chinese and English handling
- Context awareness

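A rule-based approach with mixed Chinese and English handling might look like the sketch below. The rules, the word lists, and the function names are assumptions made for illustration and are much simpler than a production processor. The language is guessed from the share of CJK characters, and the terminal punctuation mark is chosen accordingly.

```python
import re

_CJK = re.compile(r"[\u4e00-\u9fff]")
# Illustrative word lists; a real processor would use richer rules.
_QUESTION_WORDS_EN = ("what", "why", "how", "who", "when", "where",
                      "is", "are", "do", "does", "can")
_QUESTION_MARKERS_ZH = ("吗", "呢", "什么", "为什么", "怎么")


def detect_language(text: str) -> str:
    """Returns 'zh' if at least half of the non-space characters are CJK, else 'en'."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return "en"
    cjk = sum(1 for c in chars if _CJK.match(c))
    return "zh" if cjk * 2 >= len(chars) else "en"


def add_terminal_punctuation(text: str) -> str:
    """Appends a period or question mark in the style of the detected language."""
    text = text.strip()
    if not text or text[-1] in ".?!。？！":
        return text  # empty or already punctuated
    if detect_language(text) == "zh":
        is_question = any(marker in text for marker in _QUESTION_MARKERS_ZH)
        return text + ("？" if is_question else "。")
    is_question = text.split()[0].lower() in _QUESTION_WORDS_EN
    return text[0].upper() + text[1:] + ("?" if is_question else ".")


for sample in ("how does this work", "the model is loaded", "今天天气很好", "你好吗"):
    print(add_terminal_punctuation(sample))
```
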
## Development Conventions

### Code Style

1. **Naming conventions**
   - Class names: PascalCase
   - Function names: snake_case
   - Constants: UPPER_CASE
   - Private methods: _method_name

2. **Docstrings**
   ```python
   def process_audio(self, audio_data: np.ndarray) -> None:
       """
       Process audio data and run speech recognition.

       Args:
           audio_data: Array of audio samples.

       Returns:
           None

       Raises:
           RuntimeError: If the recognizer has not been initialized.
       """
   ```

3. **Type annotations**
   ```python
   from typing import Callable, Optional

   def set_callback(self, callback: Optional[Callable[[str], None]]) -> None:
       self.callback = callback
   ```

### Error Handling

1. **Exception hierarchy**
   ```python
   # Custom exceptions
   class VTTError(Exception):
       """Base exception for the VTT system."""

   class AudioError(VTTError):
       """Audio-related errors."""

   class RecognitionError(VTTError):
       """Recognition-related errors."""
   ```

2. **Error handling pattern**
   ```python
   def initialize(self) -> bool:
       try:
           self._init_audio()
           self._init_recognizer()
           return True
       except Exception as e:
           self.logger.error(f"Initialization failed: {e}")
           return False
   ```

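The two patterns above combine naturally: low-level errors are wrapped in the project's own exception types, and callers catch the base class. The sketch below illustrates this with exception chaining (`raise ... from ...`). The `open_device` function and its device list are hypothetical and exist only to trigger an error.

```python
class VTTError(Exception):
    """Base exception, mirroring the hierarchy above."""


class AudioError(VTTError):
    """Audio-related failure."""


def open_device(index: int) -> str:
    """Hypothetical device lookup that fails with a low-level error."""
    devices = ["built-in microphone"]
    try:
        return devices[index]
    except IndexError as exc:
        # Chain the original error so the traceback keeps the root cause.
        raise AudioError(f"audio device {index} not found") from exc


def initialize(index: int) -> bool:
    """Catches only the project's own exceptions; unexpected errors still propagate."""
    try:
        open_device(index)
        return True
    except VTTError as exc:
        print(f"initialization failed: {exc} (cause: {exc.__cause__!r})")
        return False


print(initialize(0), initialize(5))
```

Catching `VTTError` instead of `Exception` is a design choice: programming errors such as `TypeError` surface immediately instead of being reported as a failed initialization.
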
### Logging Conventions

```python
import logging

class MyClass:
    def __init__(self):
        self.logger = logging.getLogger(__name__)

    def process(self):
        self.logger.info("Processing started")
        try:
            # Processing logic
            self.logger.debug("Processing details")
        except Exception as e:
            self.logger.error(f"Processing failed: {e}")
```

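Module-level loggers like the one above only produce output once the application configures logging at start-up. The following sketch shows one way to do that with the standard library. The format string, the `vtt.demo` logger name, and the `configure_logging` helper are assumptions for illustration. It also shows `logger.exception`, which records the traceback along with the message.

```python
import io
import logging


def configure_logging(level: int = logging.INFO, stream=None) -> None:
    """Configure the root logger once, at application start-up."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    root = logging.getLogger()
    root.handlers.clear()
    root.addHandler(handler)
    root.setLevel(level)


buffer = io.StringIO()  # capture output so the example is easy to inspect
configure_logging(logging.INFO, buffer)
logger = logging.getLogger("vtt.demo")  # hypothetical logger name

logger.debug("hidden at INFO level")
try:
    1 / 0
except ZeroDivisionError:
    # logger.exception logs at ERROR level and appends the traceback.
    logger.exception("processing failed")

log_text = buffer.getvalue()
print(log_text)
```
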
## Testing Guide

### Test Layout

```
tests/
├── __init__.py
├── test_audio_processor.py
├── test_speech_recognizer.py
├── test_realtime_vtt.py
├── fixtures/
│   ├── test_audio.wav
│   └── mock_models/
└── conftest.py
```

### Unit Test Example

```python
import time

import numpy as np
import pytest

from src.audio_processor import AudioProcessor
from src.config import AudioConfig


# Note: these tests need a working audio input device.
class TestAudioProcessor:
    def setup_method(self):
        self.config = AudioConfig()
        self.processor = AudioProcessor(self.config)

    def test_initialization(self):
        assert self.processor.initialize()

    def test_device_list(self):
        devices = self.processor.get_device_list()
        assert isinstance(devices, list)
        assert len(devices) > 0

    def test_recording(self):
        results = []

        def callback(data):
            results.append(data)

        self.processor.start_recording(callback)
        # Wait for some data; recording runs on its own thread
        time.sleep(1)
        self.processor.stop_recording()

        assert len(results) > 0
        assert isinstance(results[0], np.ndarray)
```

### Integration Tests

```python
# Module path inferred from the directory layout
from src.realtime_vtt import RealTimeVTT


def test_full_pipeline():
    """Test the complete speech recognition flow."""
    app = RealTimeVTT()
    assert app.initialize()

    # Simulate audio input (load_test_audio is a test helper assumed to be
    # defined elsewhere, for example in conftest.py)
    test_audio = load_test_audio("fixtures/test_audio.wav")

    results = []
    def result_callback(result):
        results.append(result)

    app.speech_recognizer.set_result_callback(result_callback)

    # Process the audio
    app.speech_recognizer.process_audio(test_audio)

    # Verify the results
    assert len(results) > 0
    assert results[0].text is not None
```

### Performance Tests

```python
import time

import psutil

# Module path inferred from the directory layout
from src.realtime_vtt import RealTimeVTT


def test_memory_usage():
    """Test memory usage."""
    process = psutil.Process()
    initial_memory = process.memory_info().rss

    app = RealTimeVTT()
    app.initialize()

    # Run for a while
    for _ in range(100):
        # Simulated processing
        time.sleep(0.1)

    final_memory = process.memory_info().rss
    memory_increase = final_memory - initial_memory

    # Memory growth should not exceed 100 MB
    assert memory_increase < 100 * 1024 * 1024
```

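## Performance Optimization

### 1. Audio Processing Optimization

```python
import numpy as np


# Use a ring buffer
class RingBuffer:
    def __init__(self, size):
        self.size = size
        self.buffer = np.zeros(size)
        self.write_pos = 0

    def write(self, data):
        # Efficient circular write: the oldest samples are overwritten first
        data = np.asarray(data)
        start = self.write_pos + max(0, len(data) - self.size)
        data = data[-self.size:]  # keep only the newest samples that fit
        idx = (start + np.arange(len(data))) % self.size
        self.buffer[idx] = data
        self.write_pos = (start + len(data)) % self.size
```
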
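### 2. Recognition Optimization

```python
# Batch processing
def process_batch(self, audio_batch):
    """Process audio data in batches to improve efficiency."""
    for chunk in audio_batch:
        # Feed every chunk first (self.sample_rate is assumed to be set elsewhere)
        self.stream.accept_waveform(self.sample_rate, chunk)
    # Then decode everything that is ready in one pass
    while self.recognizer.is_ready(self.stream):
        self.recognizer.decode_stream(self.stream)
```
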
### 3. Memory Optimization

```python
# Object pool pattern
class ResultPool:
    def __init__(self, size=100):
        self.pool = [RecognitionResult("", 0) for _ in range(size)]
        self.index = 0

    def get_result(self):
        # Pooled objects are reused in round-robin order, so callers
        # should not hold on to a result for long
        result = self.pool[self.index]
        self.index = (self.index + 1) % len(self.pool)
        return result
```

## Debugging Tips

### 1. Audio Debugging

```python
import wave

import numpy as np


# Save audio data for debugging
def debug_save_audio(self, audio_data, filename):
    # Assumption: non-int16 input is float audio in [-1.0, 1.0];
    # convert it to 16-bit PCM to match setsampwidth(2)
    if audio_data.dtype != np.int16:
        audio_data = (np.clip(audio_data, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(filename, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(16000)
        wf.writeframes(audio_data.tobytes())
```

### 2. Recognition Debugging

```python
# Verbose recognition logging
def process_audio_debug(self, audio_data):
    self.logger.debug(f"Processing audio: {len(audio_data)} samples")

    if self.recognizer.is_ready(self.stream):
        self.logger.debug("Recognizer ready")

    result = self.recognizer.get_result(self.stream)
    self.logger.debug(f"Recognition result: '{result}'")
```

### 3. Profiling

```python
import cProfile
import pstats

def profile_recognition():
    profiler = cProfile.Profile()
    profiler.enable()

    # Run the recognition code
    app = RealTimeVTT()
    app.run_for_duration(60)  # Run for 60 seconds

    profiler.disable()
    stats = pstats.Stats(profiler)
    stats.sort_stats('cumulative')
    stats.print_stats(20)  # Show the top 20 functions
```

## Extending the System

### Adding New Features

1. **Support for new audio formats**
   ```python
   class AudioFormatConverter:
       def convert_to_16khz_mono(self, audio_data, source_rate):
           # Format conversion logic
           pass
   ```

2. **New output formats**
   ```python
   class JSONOutputHandler:
       def save_result(self, result: RecognitionResult):
           # Save in JSON format
           pass
   ```

3. **New recognition models**
   ```python
   class ModelAdapter:
       def adapt_model(self, model_path):
           # Model adaptation logic
           pass
   ```

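As an illustration of what a conversion to 16 kHz mono involves, the sketch below averages channels and resamples by linear interpolation in pure Python. It is a simplified example under stated assumptions, not the project's converter: the function names are hypothetical, samples are floats, and no low-pass filter is applied, so downsampling real audio this way can introduce aliasing. A production converter would filter first or use a dedicated resampling library.

```python
from typing import List, Sequence


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the channels of each frame, e.g. [(left, right), ...] -> [mono, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples: Sequence[float], source_rate: int,
                    target_rate: int = 16000) -> List[float]:
    """Resample by linear interpolation between neighboring input samples."""
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or source_rate == target_rate:
        return list(samples)
    out_len = max(1, round(len(samples) * target_rate / source_rate))
    step = source_rate / target_rate  # input samples per output sample
    out: List[float] = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), len(samples) - 1)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


stereo_48k = [(0.0, 1.0)] * 48000          # one second of constant stereo audio
mono_16k = resample_linear(to_mono(stereo_48k), 48000)
ramp = resample_linear([0.0, 1.0, 2.0, 3.0], 4, 8)
print(len(mono_16k), ramp)
```
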
### Plugin System Design

```python
class PluginManager:
    def __init__(self):
        self.plugins = {}

    def register_plugin(self, name, plugin):
        self.plugins[name] = plugin

    def call_plugin(self, name, *args, **kwargs):
        if name in self.plugins:
            return self.plugins[name](*args, **kwargs)
```

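A short usage sketch, assuming plugins are plain callables. The class is repeated with type hints so the snippet runs on its own. The `uppercase` and `word_count` plugins are hypothetical examples.

```python
from typing import Any, Callable, Dict


class PluginManager:
    """Same shape as the manager in this section, repeated for self-containment."""

    def __init__(self) -> None:
        self.plugins: Dict[str, Callable[..., Any]] = {}

    def register_plugin(self, name: str, plugin: Callable[..., Any]) -> None:
        self.plugins[name] = plugin

    def call_plugin(self, name: str, *args: Any, **kwargs: Any) -> Any:
        if name in self.plugins:
            return self.plugins[name](*args, **kwargs)
        return None  # unknown plugin: the call is silently ignored


manager = PluginManager()
manager.register_plugin("uppercase", str.upper)
manager.register_plugin("word_count", lambda text: len(text.split()))

print(manager.call_plugin("uppercase", "hello world"))
print(manager.call_plugin("word_count", "hello world"))
print(manager.call_plugin("missing", "hello world"))
```

Returning `None` for an unknown name keeps callers simple but can hide typos in plugin names. Raising `KeyError` instead would be a reasonable alternative.
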
## Deployment Guide

### Production Configuration

```python
# production_config.py
class ProductionConfig(ModelConfig):
    # Tuned parameters for production
    num_threads = 4
    enable_endpoint = True
    log_level = "WARNING"
```

### Docker Deployment

```dockerfile
FROM python:3.12-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    portaudio19-dev \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the application code
COPY . /app
WORKDIR /app

# Download the model
RUN python main.py --download-model

CMD ["python", "main.py"]
```

### Monitoring and Logging

```python
# Collect monitoring metrics
class MetricsCollector:
    def __init__(self):
        self.recognition_count = 0
        self.error_count = 0
        self.avg_latency = 0

    def record_recognition(self, latency):
        self.recognition_count += 1
        # Incremental mean: every sample carries equal weight
        self.avg_latency += (latency - self.avg_latency) / self.recognition_count
```

## Contributing

### Submitting Code

1. Fork the project
2. Create a feature branch: `git checkout -b feature/new-feature`
3. Commit your changes: `git commit -am 'Add new feature'`
4. Push the branch: `git push origin feature/new-feature`
5. Open a Pull Request

### Code Review

- Make sure all tests pass
- Code coverage > 80%
- Follow the coding conventions
- Add the necessary documentation

### Release Process

1. Update the version number
2. Update the CHANGELOG
3. Create a release tag
4. Build and publish the package

---

This development guide covers the main aspects of working on the project. If you have questions, consult the source code or open an Issue.