Kocom SenseVoice Sohee (ASR)
Model Summary
This is a SenseVoiceSmall model fine-tuned for Korean ASR using synthetic speech generated with Qwen3-TTS CustomVoice (speaker: Sohee). The training data is derived from internal intent/assistant dialogue texts and rendered into audio with a single synthetic speaker.
Base Model
- iic/SenseVoiceSmall
Training Data
Text sources:
- test/data/intent_classify/community_train/output_community_dataset.json
- test/data/intent_classify/run_20260121_141638/output_vui_dataset.json
Audio:
- Generated via Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
- Speaker: Sohee
- Language: Korean
- 16 kHz, mono, PCM_16
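Since the training audio is 16 kHz, mono, 16-bit PCM, it can help to confirm that inference audio matches before transcribing it. The following is a minimal sketch using only the Python standard library; the function name `check_wav_format` is hypothetical and not part of this repository.

```python
import wave


def check_wav_format(path, rate=16000, channels=1, sample_width_bytes=2):
    """Return True if the WAV file matches the expected PCM layout.

    Defaults mirror the training audio described above:
    16 kHz sample rate, 1 channel, 2 bytes (16 bits) per sample.
    """
    with wave.open(path, "rb") as wf:
        return (
            wf.getframerate() == rate
            and wf.getnchannels() == channels
            and wf.getsampwidth() == sample_width_bytes
        )
```

Files that fail the check would need resampling or downmixing with a tool of your choice before being passed to the model.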
Counts (from manifest):
- Total items: 8076
- Roles: user 5068, assistant 3008
Split:
- Train/val = 95/5 random split (seed=42)
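A seeded 95/5 random split like the one described above could be reproduced along these lines. This is a sketch only: the actual split script is not included in this card, so the function name, the use of `random.Random`, and the rounding of the validation size are all assumptions.

```python
import random


def split_train_val(items, val_fraction=0.05, seed=42):
    """Shuffle items with a fixed seed and split off a validation subset.

    Returns (train_items, val_items). The validation size is rounded to
    the nearest integer, which is an assumption of this sketch.
    """
    rng = random.Random(seed)   # local RNG, so global random state is untouched
    shuffled = list(items)      # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

Because the generator is seeded, calling the function twice on the same manifest yields the same partition.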
Training Procedure
- Fine-tuning with FunASR train_ds.py
- Max epochs: 50
- Batch type: token, batch_size=6000
- Learning rate: 2e-4
- Deepspeed: disabled
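The hyperparameters above can be collected into the Hydra-style `++key=value` overrides that FunASR's train_ds.py accepts. The sketch below assumes the key names used in the fine-tuning script distributed with SenseVoice (`train_conf.max_epoch`, `dataset_conf.batch_type`, `dataset_conf.batch_size`, `optim_conf.lr`, `train_conf.use_deepspeed`); verify them against your FunASR version. The function name and the file paths in the usage note are hypothetical.

```python
def build_train_overrides(train_jsonl, val_jsonl, output_dir):
    """Build command-line overrides matching the settings listed above.

    Key names are assumed from the SenseVoice fine-tuning script and
    may differ between FunASR releases.
    """
    return [
        "++model=iic/SenseVoiceSmall",           # base model from this card
        "++trust_remote_code=true",
        f"++train_data_set_list={train_jsonl}",
        f"++valid_data_set_list={val_jsonl}",
        "++dataset_conf.batch_type=token",       # Batch type: token
        "++dataset_conf.batch_size=6000",        # batch_size=6000
        "++train_conf.max_epoch=50",             # Max epochs: 50
        "++train_conf.use_deepspeed=false",      # Deepspeed: disabled
        "++optim_conf.lr=0.0002",                # Learning rate: 2e-4
        f"++output_dir={output_dir}",
    ]
```

For example, `" ".join(build_train_overrides("train.jsonl", "val.jsonl", "./exp"))` produces an argument string that can be appended to a `torchrun ... train_ds.py` invocation.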
Evaluation
No public benchmark results are reported. Evaluate on your own test set to validate quality for your domain.
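For Korean ASR, a common metric for such an evaluation is character error rate (CER). The following is a minimal, dependency-free sketch; the function names are hypothetical, and the choice to strip spaces before scoring is an assumption you may want to change if spacing errors matter in your domain.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (0 if chars match)
            ))
        prev = cur
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total character edits / total reference characters.

    Spaces are removed first, so each Hangul syllable block counts as
    one character and spacing differences are ignored.
    """
    total_edits = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        ref_chars = ref.replace(" ", "")
        hyp_chars = hyp.replace(" ", "")
        total_edits += edit_distance(ref_chars, hyp_chars)
        total_chars += len(ref_chars)
    return total_edits / total_chars if total_chars else 0.0
```

Pass the ground-truth transcripts as `references` and the post-processed model outputs as `hypotheses`, in the same order.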
Intended Use
Korean ASR for the specific synthetic Sohee voice and domain-style commands. Best suited for controlled or synthetic audio similar to the training data.
Limitations
- Trained on synthetic speech from a single voice; generalization to real-world speech, accents, noise, or other speakers is limited.
- Domain coverage is restricted to the intent texts used during training.
Usage (Local)
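from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# Load the fine-tuned checkpoint together with the SenseVoice model definition.
model = AutoModel(
    model="/data/sapie/tax/kocom/SenseVoice/exp_kocom_sohee",
    trust_remote_code=True,
    remote_code="/data/sapie/tax/kocom/SenseVoice/model.py",
    device="cuda:0",
)

# Transcribe a single file as Korean, without inverse text normalization.
res = model.generate(
    input="/path/to/audio.wav",
    cache={},
    language="ko",
    use_itn=False,
    batch_size_s=60,
    # merge_vad and merge_length_s apply when a VAD model is configured.
    merge_vad=True,
    merge_length_s=15,
)

# Strip SenseVoice's special tags from the raw output before printing.
print(rich_transcription_postprocess(res[0]["text"]))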
License
Not specified. Please add your intended license before publishing.
Contact
Add contact or maintainer information here.