	---
	tags:
	- speech
	- whisper
	- forced-alignment
	- pronunciation-assessment
	- gopt
	license: other
	---

	# custom-gopt-252-eval

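	This repository does one thing: it bundles every locally stored model that the evaluation needs, and it provides the shortest possible usage, "give it 1 audio file, get an overall score back".

	The overall score is the `total` field, which ranges from `0` to `5`.

	## 1. Download the model bundle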

	```bash
	python - <<'PY'
	from huggingface_hub import snapshot_download

	snapshot_download(
	    repo_id="faeea/custom-gopt-252-eval",
	    repo_type="model",
	    local_dir="./hf_models/custom-gopt-252-eval",
	)
	PY
	```

	After the download, the rest of this README assumes the following variable is set:

	```bash
	export BUNDLE_DIR=$PWD/hf_models/custom-gopt-252-eval
	```

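	## 2. Download the runtime code

	This model bundle is not a standalone Transformers model; local inference also depends on the model definitions in `custom-gopt` and on the text-to-phoneme processing code in `Charsiu`.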

	```bash
	git clone https://github.com/hf49w/custom-gopt.git
	git clone https://github.com/lingjzhu/charsiu third_party/charsiu_repo
	git -C third_party/charsiu_repo checkout 13a69f2a22ca0c0962b75cc693399b0ae23a12c9
	```

	## 3. Install the minimal dependencies

	Run from the `custom-gopt` repository root:

	```bash
	pip install -r requirements.txt
	python -m pip install nltk
	python -m nltk.downloader cmudict averaged_perceptron_tagger averaged_perceptron_tagger_eng
	```

	If you already have a working environment, this step only needs to guarantee that the following packages can be imported:

	- `torch`
	- `transformers`
	- `librosa`
	- `soundfile`
	- `g2p_en`
	- `g2pM`
	- `praatio`
	- `nltk`
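One way to check the list above without importing the heavy packages is to ask Python whether each module can be located. This is a minimal sketch: the function name `missing_packages` is hypothetical, and it assumes the names in the list are the import names, as the list itself suggests.

```python
import importlib.util
from typing import Iterable, List

# Import names taken from the list in this README.
REQUIRED = [
    "torch",
    "transformers",
    "librosa",
    "soundfile",
    "g2p_en",
    "g2pM",
    "praatio",
    "nltk",
]


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be located in the current environment.

    find_spec only locates a top-level module; it does not execute it, so a
    package that is installed but broken would still pass this check.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("missing packages:", ", ".join(missing))
    else:
        print("all required packages can be located")
```

Anything reported as missing can be installed with `pip`; the NLTK data from the `nltk.downloader` command is a separate download that this check does not cover.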

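	## 4. Prepare one audio file to score

	Keep the requirements as simple as possible:

	- An English recording of a single sentence or one short passage
	- `wav` is the safest format
	- Mono is preferred
	- `16kHz` is ideal; audio at any other sample rate is resampled automatically inside the script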
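To see how a recording compares with these recommendations, its header can be read with Python's standard `wave` module. This is only a sketch: the function name `describe_wav` is hypothetical, and `wave` reads uncompressed PCM WAV files only, so other formats raise `wave.Error` here even if the scoring script can load them.

```python
import os
import wave
from typing import Dict, Union


def describe_wav(path: str) -> Dict[str, Union[int, float]]:
    """Read sample rate, channel count and duration from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_s": frames / sample_rate,
    }


if __name__ == "__main__":
    # AUDIO_PATH is the variable this README exports for the file to score.
    path = os.environ.get("AUDIO_PATH", "")
    if os.path.isfile(path):
        info = describe_wav(path)
        print(info)
        if info["sample_rate"] != 16000 or info["channels"] != 1:
            print("not 16 kHz mono; the script is expected to resample it")
    else:
        print("set AUDIO_PATH to an existing wav file first")
```

A file that is not 16 kHz mono is not an error, since the script resamples automatically; the check simply makes the input's properties visible.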

	Assume your audio file is at:

	```bash
	export AUDIO_PATH=/path/to/demo.wav
	```

	## 5. Run overall scoring on a single audio file

	Run from the `custom-gopt` repository root:

	```bash
	python "$BUNDLE_DIR/examples/infer_one_audio.py" \
	  --audio "$AUDIO_PATH" \
	  --bundle-dir "$BUNDLE_DIR" \
	  --repo-root "$PWD" \
	  --charsiu-src-dir "$PWD/../third_party/charsiu_repo" \
	  --device cuda \
	  --output-json ./one_audio_score.json
	```
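If you would rather launch the same command from Python, for example to score several files in a loop, the call can be wrapped with `subprocess`. This is a minimal sketch: `build_command` is a hypothetical helper, and it uses only the flags that appear in the command above; nothing else about the script's interface is assumed.

```python
import os
import subprocess
import sys
from typing import List, Optional


def build_command(
    audio: str,
    bundle_dir: str,
    repo_root: str,
    charsiu_src_dir: str,
    device: str = "cuda",
    output_json: Optional[str] = None,
) -> List[str]:
    """Assemble the infer_one_audio.py invocation shown in this README."""
    cmd = [
        sys.executable,
        os.path.join(bundle_dir, "examples", "infer_one_audio.py"),
        "--audio", audio,
        "--bundle-dir", bundle_dir,
        "--repo-root", repo_root,
        "--charsiu-src-dir", charsiu_src_dir,
        "--device", device,
    ]
    if output_json:
        cmd += ["--output-json", output_json]
    return cmd


if __name__ == "__main__":
    bundle_dir = os.environ.get("BUNDLE_DIR")
    audio = os.environ.get("AUDIO_PATH")
    if bundle_dir and audio:
        # Assumes the current directory is the custom-gopt repository root.
        repo_root = os.getcwd()
        charsiu = os.path.join(repo_root, "..", "third_party", "charsiu_repo")
        cmd = build_command(audio, bundle_dir, repo_root, charsiu,
                            output_json="./one_audio_score.json")
        subprocess.run(cmd, check=True, cwd=repo_root)
    else:
        print("export BUNDLE_DIR and AUDIO_PATH before running this sketch")
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.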

	If you do not have a GPU, change `--device cuda` to:

	```bash
	--device cpu
	```
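If you are unsure whether a usable GPU is present, PyTorch can answer that directly. A minimal sketch, with the hypothetical helper name `pick_device`; it falls back to `cpu` when `torch` is not installed or CUDA is unavailable.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without torch the scoring script cannot run anyway; report cpu.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device())
```

Saved as a file (the name `pick_device.py` is only an example), its output can be substituted into the command with `--device "$(python pick_device.py)"`.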

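	## 6. What a successful run outputs

	The script prints a JSON object to the terminal and writes the same content to the path given by `--output-json`.

	A typical output has this structure: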

	```json
	{
	  "status": "ok",
	  "audio_path": "/path/to/demo.wav",
	  "transcript": "she had your dark suit in greasy wash water all year",
	  "utterance_scores": {
	    "accuracy": 3.91,
	    "completeness": 4.12,
	    "fluency": 3.66,
	    "prosodic": 3.58,
	    "total": 3.82
	  },
	  "overall_score": 3.82
	}
	```

	In this output:

	- `overall_score` is the overall score
	- `overall_score` and `utterance_scores.total` are the same value
	- The script clips the final score to the `0-5` range so it can be used directly
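Downstream code can read the result file and check these three properties before using the score. This is a minimal sketch: `read_overall_score` is a hypothetical helper, and treating any `status` other than `"ok"` as a failure is an assumption, since only the successful output is documented here.

```python
import json
import os


def read_overall_score(result: dict) -> float:
    """Return overall_score after checking the properties described above."""
    # Assumption: a status other than "ok" signals a failed run.
    if result.get("status") != "ok":
        raise ValueError(f"unexpected status: {result.get('status')!r}")
    overall = float(result["overall_score"])
    total = float(result["utterance_scores"]["total"])
    if abs(overall - total) > 1e-6:
        raise ValueError("overall_score differs from utterance_scores.total")
    if not 0.0 <= overall <= 5.0:
        raise ValueError("overall_score is outside the 0-5 range")
    return overall


if __name__ == "__main__":
    # Path used with --output-json in the command from section 5.
    path = "./one_audio_score.json"
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as handle:
            print(read_overall_score(json.load(handle)))
    else:
        print(f"{path} not found; run the scoring command first")
```

The function raises `ValueError` rather than returning a default, so a malformed result cannot be mistaken for a low score.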

	## 7. Shortest verification command

	If you only want to confirm that the models run locally, you can try the sample audio that ships with the `Charsiu` repository:

	```bash
	python "$BUNDLE_DIR/examples/infer_one_audio.py" \
	  --audio "$PWD/../third_party/charsiu_repo/local/SA1.WAV" \
	  --bundle-dir "$BUNDLE_DIR" \
	  --repo-root "$PWD" \
	  --charsiu-src-dir "$PWD/../third_party/charsiu_repo" \
	  --device cpu
	```

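	## 8. Notes

	- The script runs the pipeline `Whisper -> Charsiu -> Streaming GOPT`
	- The input is 1 audio file; you do not need to supply the text yourself
	- The models first transcribe the audio automatically, then align the phonemes, and finally output the overall score
	- The training data is SpeechOcean762, so the bundle is best suited to short sentences read aloud by English learners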