Kocom SenseVoice Sohee (ASR)

Model Summary

This is a SenseVoiceSmall model fine-tuned for Korean ASR using synthetic speech generated with Qwen3-TTS CustomVoice (speaker: Sohee). The training data is derived from internal intent/assistant dialogue texts and rendered into audio with a single synthetic speaker.

Base Model

iic/SenseVoiceSmall

Training Data

Text sources:

test/data/intent_classify/community_train/output_community_dataset.json
test/data/intent_classify/run_20260121_141638/output_vui_dataset.json

Audio:

Generated via Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
Speaker: Sohee
Language: Korean
16 kHz, mono, PCM_16
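
Since the model was trained only on audio in this format, it may help to check input files against it before transcription. The following is a minimal sketch using the Python standard library; the function name `check_wav_format` is hypothetical and not part of this repository.

```python
import wave


def check_wav_format(path, rate=16000, channels=1, sample_width=2):
    """Return a list of mismatches against 16 kHz / mono / 16-bit PCM.

    An empty list means the file matches the training audio format.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate {wav.getframerate()} != {rate}")
        if wav.getnchannels() != channels:
            problems.append(f"channels {wav.getnchannels()} != {channels}")
        # Sample width is in bytes, so 2 bytes corresponds to PCM_16.
        if wav.getsampwidth() != sample_width:
            problems.append(f"sample width {wav.getsampwidth()} != {sample_width}")
    return problems
```

Files that report mismatches would need to be resampled or downmixed with a tool of your choice before being passed to the model.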

Counts (from manifest):

Total items: 8076
Roles: user 5068, assistant 3008

Split:

Train/val = 95/5 random split (seed=42)
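
A seeded random split of this kind can be sketched as follows. This is an illustration only: the card does not say how the original split was implemented, so the function name, the rounding rule, and the use of Python's `random` module are all assumptions, and this code will not necessarily reproduce the exact same partition.

```python
import random


def split_train_val(items, val_ratio=0.05, seed=42):
    """Shuffle items with a fixed seed and hold out val_ratio for validation."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_ratio)  # rounding rule is an assumption
    return shuffled[n_val:], shuffled[:n_val]


# Example: split the indices of the 8076 manifest items.
train_ids, val_ids = split_train_val(range(8076))
print(len(train_ids), len(val_ids))
```

Fixing the seed makes the split repeatable across runs, which keeps validation scores comparable between experiments.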

Training Procedure

Fine-tuning with FunASR train_ds.py
Max epochs: 50
Batch type: token, batch_size=6000
Learning rate: 2e-4
DeepSpeed: disabled
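
FunASR training scripts are typically configured through Hydra-style `++key=value` overrides. The sketch below renders the hyperparameters listed above in that form. The key names (`train_conf.max_epoch`, `dataset_conf.batch_type`, `dataset_conf.batch_size`, `optim_conf.lr`) follow the convention used in the SenseVoice fine-tuning script as far as we know, but they are assumptions here; check them against your FunASR version before use.

```python
# Hyperparameters as listed in this card; key names are assumed, not confirmed.
hparams = {
    "train_conf.max_epoch": 50,
    "dataset_conf.batch_type": "token",
    "dataset_conf.batch_size": 6000,
    "optim_conf.lr": 2e-4,
}


def to_overrides(params):
    """Render a dict as Hydra-style '++key=value' command-line overrides."""
    return [f"++{key}={value}" for key, value in params.items()]


# Print one override per line, ready to paste after the train_ds.py command.
print(" \\\n".join(to_overrides(hparams)))
```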

Evaluation

No public benchmark results are reported. Evaluate on your own test set to validate quality for your domain.
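
One common way to score a Korean ASR model on your own test set is character error rate (CER). The following is a minimal sketch in pure Python; the function names are hypothetical, and removing whitespace before scoring is an assumption (a frequent choice for Korean, where spacing varies) that you may want to turn off.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute = 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses, strip_spaces=True):
    """Corpus-level CER: total character edits / total reference characters."""
    total_edits = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        if strip_spaces:
            ref = "".join(ref.split())
            hyp = "".join(hyp.split())
        total_edits += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars
```

Here `references` would be the ground-truth transcripts and `hypotheses` the model outputs for the same files, in the same order.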

Intended Use

Korean ASR for the specific synthetic Sohee voice and domain-style commands. Best suited for controlled or synthetic audio similar to the training data.

Limitations

Trained on synthetic speech from a single voice; generalization to real-world speech, accents, noise, or other speakers is limited.
Domain coverage is restricted to the intent texts used during training.

Usage (Local)

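from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# Load the fine-tuned checkpoint from a local directory.
# Both paths are specific to the training machine; point them at your own copies.
model = AutoModel(
    model="/data/sapie/tax/kocom/SenseVoice/exp_kocom_sohee",
    trust_remote_code=True,
    remote_code="/data/sapie/tax/kocom/SenseVoice/model.py",
    device="cuda:0",
)

# Transcribe one file as Korean, without inverse text normalization.
# batch_size_s, merge_vad, and merge_length_s apply to VAD-based segmentation;
# since no vad_model is passed above, they should have no effect here.
res = model.generate(
    input="/path/to/audio.wav",
    cache={},
    language="ko",
    use_itn=False,
    batch_size_s=60,
    merge_vad=True,
    merge_length_s=15,
)
# Convert the raw output, which includes special tags such as language and
# emotion tokens, into plain readable text.
print(rich_transcription_postprocess(res[0]["text"]))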

License

Not specified. Please add your intended license before publishing.

Contact

Add contact or maintainer information here.
