ASLP-lab / WSChuan-ASR


	---
	language: zh
	license: apache-2.0
	tags:
	- automatic-speech-recognition
	- ASR
	- chinese
	- speech
	---

	## 📂 Project Tree

	```
	WSChuan-ASR
	├── paraformer_large_chuan/
	│ ├── config.yaml
	│ ├── model.pt
	│ └── infer.py
	│
	├── Qwen2.5-omni3B/
	│ ├── added_tokens.json
	│ ├── args.json
	│ ├── char_template.jinja
	│ ├── config.json
	│ ├── generation_config.json
	│ ├── merges.txt
	│ ├── model-00001-of-00003.safetensors
	│ ├── model-00002-of-00003.safetensors
	│ ├── model-00003-of-00003.safetensors
	│ ├── model.safetensors.index.json
	│ ├── preprocessor_config.json
	│ ├── special_tokens_map.json
	│ ├── spk_dict.pt
	│ ├── tokenizer_config.json
	│ ├── tokenizer.json
	│ ├── video_preprocessor_config.json
	│ └── vocab.json
	│
	├── .gitattributes
	└── README.md
	```
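The three `model-0000N-of-00003.safetensors` files are shards of a single checkpoint. In the standard Hugging Face sharded format, `model.safetensors.index.json` has a `weight_map` that maps each tensor name to the shard holding it. The sketch below checks that a download is complete. It assumes that standard index layout, and `missing_shards` is a hypothetical helper that is not part of this repo:

```python
import json
from pathlib import Path


def missing_shards(model_dir):
    """Return shard files listed in the index's weight_map that are missing on disk.

    Assumes the standard Hugging Face sharded-checkpoint index layout:
    {"metadata": {...}, "weight_map": {"<tensor name>": "<shard file>", ...}}
    """
    model_dir = Path(model_dir)
    index_path = model_dir / "model.safetensors.index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))
    # Many tensors live in the same shard, so deduplicate file names first.
    shard_files = set(index["weight_map"].values())
    return sorted(name for name in shard_files if not (model_dir / name).is_file())
```

For example, `missing_shards("WSChuan-ASR/Qwen2.5-omni3B")` (hypothetical local path) returns an empty list once all three shards have been fetched. A non-empty result usually means a large LFS file failed to download.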

	## ASR Leaderboard
	| Model | Model Size | WSC-Eval-ASR - Easy | WSC-Eval-ASR - Hard | WSC-Eval-ASR - Total | Magicdata - Conversation | Magicdata - Daily-Use | Avg. |
	| --- | --- | --- | --- | --- | --- | --- | --- |
	| with LLM | | | | | | | |
	| Kimi-Audio | 7B | 16.65 | 28.66 | 17.66 | 24.67 | 5.77 | 18.68 |
	| FireRedASR-LLM | 8.3B | 12.80 | 25.27 | 14.40 | 17.68 | 6.69 | 15.37 |
	| Qwen2.5-omni | 3B | 16.94 | 26.01 | 18.20 | 20.40 | 6.32 | 17.69 |
	| Qwen2.5-omni-WSC-Finetune⭐ | 3B | 14.36 | 24.14 | 15.61 | 18.45 | 6.15 | 15.74 |
	| <span style="background-color: #d4edda; padding: 0 2px;">Qwen2.5-omni + internal data⭐</span> | 3B | 13.17 | 23.36 | 14.81 | 18.50 | 5.88 | 15.14 |
	| <span style="background-color: #d4edda; padding: 0 2px;">Qwen2.5-omni-WSC-Finetune + internal data⭐</span> | 3B | 12.93 | 23.19 | 14.25 | 17.95 | <u>5.89</u> | 14.84 |
	| without LLM | | | | | | | |
	| SenseVoice-small | 234M | 17.43 | 28.38 | 18.39 | 23.50 | 8.77 | 19.29 |
	| Whisper | 244M | 52.06 | 63.99 | 53.59 | 55.88 | 52.03 | 55.51 |
	| FireRedASR-AED | 1.1B | 13.29 | 23.64 | 14.62 | 17.84 | 6.69 | 15.14 |
	| Paraformer | 220M | 14.34 | 24.61 | 15.66 | 19.81 | 8.16 | 16.52 |
	| Paraformer-WSC-Finetune⭐ | 220M | 12.15 | 22.60 | 13.51 | 16.60 | 8.02 | 14.58 |
	| <span style="background-color: #d4edda; padding: 0 2px;">Paraformer + internal data⭐</span> | 220M | <u>11.93</u> | <u>21.82</u> | <u>13.14</u> | <u>15.61</u> | 6.77 | <u>13.85</u> |
	| <span style="background-color: #d4edda; padding: 0 2px;">Paraformer-WSC-Finetune + internal data⭐</span> | 220M | 11.59 | 21.59 | 12.87 | 14.59 | 6.28 | 13.38 |
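Recomputing several rows suggests that Avg. is the unweighted mean of all five score columns, with WSC-Eval-ASR - Total counted alongside Easy and Hard. This is inferred from the numbers rather than stated on the card, and not every row reproduces exactly. For example, the Qwen2.5-omni row gives 17.57 rather than 17.69. A minimal sketch under that assumption, using three rows copied from the table:

```python
from statistics import mean

# Scores copied from the leaderboard, in column order:
# WSC Easy, WSC Hard, WSC Total, Magicdata Conversation, Magicdata Daily-Use.
ROWS = {
    "Kimi-Audio": [16.65, 28.66, 17.66, 24.67, 5.77],
    "Paraformer": [14.34, 24.61, 15.66, 19.81, 8.16],
    "Paraformer-WSC-Finetune + internal data": [11.59, 21.59, 12.87, 14.59, 6.28],
}


def leaderboard_avg(scores):
    """Unweighted mean of the five columns, rounded to two decimals (assumed definition of Avg.)."""
    return round(mean(scores), 2)


for name, scores in ROWS.items():
    print(f"{name}: {leaderboard_avg(scores):.2f}")
```

This prints 18.68, 16.52 and 13.38, which match the Avg. column for those rows. Because Total already aggregates Easy and Hard, this definition weights WSC-Eval-ASR more heavily than Magicdata.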



	## ASR Inference
	### Paraformer_large_Chuan
	```
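	export CUDA_VISIBLE_DEVICES=0   # index of the GPU to run on
	root_dir=./test_data            # expects $root_dir/<test_set>/wav.scp
	test_sets=("WSC-Eval-ASR" "WSC-Eval-ASR-Hard" "WSC-Eval-ASR-Easy")
	model_dir=./model_dir           # Paraformer checkpoint dir (config.yaml + model.pt)
	out_rootdir=./results

	# Decode each test set into its own output directory
	for test_set in "${test_sets[@]}"; do
	  out_dir=$out_rootdir/$test_set
	  mkdir -p "$out_dir"
	  python infer_paraformer.py \
	    --model "$model_dir" \
	    --wav_scp_file "$root_dir/$test_set/wav.scp" \
	    --output_dir "$out_dir/debug" \
	    --device "cuda" \
	    --output_file "$out_dir/hyp.txt"
	done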
	```
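The Paraformer script reads its input list from a `wav.scp` file. The card does not document the exact format that `infer_paraformer.py` expects. Assuming the Kaldi-style layout common in ASR toolkits (one `<utt_id> <audio_path>` pair per line), a list for one test set could be generated as shown below. `write_wav_scp` and the directory names are hypothetical:

```python
from pathlib import Path


def write_wav_scp(wav_dir, scp_path):
    """Write one '<utt_id> <absolute wav path>' line per .wav file found under wav_dir.

    The utterance ID is the file name without its extension, so names must be
    unique across subdirectories, and paths must not contain spaces.
    Returns the number of entries written.
    """
    wavs = sorted(Path(wav_dir).rglob("*.wav"))
    ids = [wav.stem for wav in wavs]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate utterance IDs; wav file names must be unique")
    lines = [f"{utt_id} {wav.resolve()}" for utt_id, wav in zip(ids, wavs)]
    scp = Path(scp_path)
    scp.parent.mkdir(parents=True, exist_ok=True)
    scp.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

For example, `write_wav_scp("./test_data/WSC-Eval-ASR/wavs", "./test_data/WSC-Eval-ASR/wav.scp")` places the list under `$root_dir/<test set>/wav.scp`, the location the command reads from. Absolute paths keep the list valid no matter which directory the script runs from.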
	---

	### Qwen2.5-Omni-3B_Chuan
	```
	python infer_qwen2.5omni.py \
	--wavs_path /path/to/your/wav.scp \
	--out_path /path/to/your/results.txt \
	--gpu 0 \
	--model /path/to/your/model
	```
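The card does not say how the leaderboard numbers are computed. If they are character error rates (CER, in %), as is usual for Chinese ASR, a hypothesis file such as `hyp.txt` or `results.txt` could be scored against references with the sketch below. It assumes both files hold `<utt_id> <text>` lines, which is a guess about the output format. It also applies no text normalization (such as punctuation removal or number conversion), so its numbers may differ from the official ones. All function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions + deletions + insertions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = cur
    return prev[-1]


def load_transcripts(path):
    """Read '<utt_id> <text>' lines into a dict; an ID with no text maps to ''."""
    trans = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if parts:
                trans[parts[0]] = parts[1] if len(parts) > 1 else ""
    return trans


def corpus_cer(refs, hyps):
    """Corpus-level CER in %: total character edits over total reference characters.

    Spaces are ignored, and a missing hypothesis counts as all deletions.
    """
    errors = total = 0
    for utt_id, ref in refs.items():
        ref_chars = ref.replace(" ", "")
        hyp_chars = hyps.get(utt_id, "").replace(" ", "")
        errors += edit_distance(ref_chars, hyp_chars)
        total += len(ref_chars)
    return 100.0 * errors / total if total else 0.0
```

A typical call would be `corpus_cer(load_transcripts("ref.txt"), load_transcripts("results/WSC-Eval-ASR/hyp.txt"))`, with hypothetical file names. The errors are summed over the whole corpus rather than averaged per utterance, so long utterances carry proportionally more weight.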