🎤 Thai Speech 20K — Private Dataset for VibeVoice Fine-tuning

A Thai speech dataset of 20,000 utterances from a single speaker, for fine-tuning TTS models.

📊 Dataset Card

Field	Detail
Name	Thai Speech 20K
Language	🇹🇭 Thai
Samples	20,000 utterances
Total Duration	~11 hours
Speaker	Single speaker (conditioned)
Audio Format	WAV, 24 kHz mono
Text Format	UTF-8 transcripts (JSONL)
Split	Train only (no public eval split)
License	CC-BY-4.0 (metadata only; audio is private)
Access	🔒 Private — audio not included in this repo
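
For anyone who has been granted access to the audio, the format listed in the card (WAV, 24 kHz, mono) can be checked with Python's standard `wave` module. This is a minimal sketch, not the author's tooling: the function name `check_wav` is hypothetical, and it assumes the files are PCM-encoded WAV, which is the only encoding the `wave` module reads.

```python
import wave

EXPECTED_RATE = 24_000   # Hz, as stated in the dataset card
EXPECTED_CHANNELS = 1    # mono, as stated in the dataset card


def check_wav(path):
    """Return the clip duration in seconds.

    Raises ValueError if the file is not 24 kHz mono.
    Assumption: the file is PCM WAV (the wave module rejects other encodings).
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    if rate != EXPECTED_RATE or channels != EXPECTED_CHANNELS:
        raise ValueError(
            f"{path}: expected 24 kHz mono, got {rate} Hz, {channels} channel(s)"
        )
    return frames / rate
```

Summing the returned durations over all files gives a figure that can be compared against the ~11 hours stated in the card.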

📋 Data Format

JSONL with fields:
  text  — Thai text transcription (UTF-8)
  audio — Path to WAV file (24 kHz, mono)

Example:

{"text": "สวัสดีครับ วันนี้อากาศดีมาก", "audio": "audio/sample_000000.wav"}
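
A manifest in this format can be read line by line with the standard `json` module. The sketch below assumes one JSON object per line with the two fields described above; the function name `load_manifest` and any manifest filename passed to it are hypothetical, since the card does not name the file.

```python
import json

REQUIRED_FIELDS = ("text", "audio")  # field names from the card


def load_manifest(path):
    """Read a JSONL manifest and return a list of {"text", "audio"} records.

    Raises ValueError if a line lacks a required field or the field is
    not a non-empty string. Blank lines are skipped.
    """
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            for field in REQUIRED_FIELDS:
                value = record.get(field)
                if not isinstance(value, str) or not value:
                    raise ValueError(
                        f"line {line_no}: missing or empty field {field!r}"
                    )
            records.append(record)
    return records
```

Opening the file with `encoding="utf-8"` matters here because the transcripts are Thai text and the platform default encoding may differ.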

🎯 Intended Use

This dataset was created specifically for fine-tuning the microsoft/VibeVoice-1.5B model for Thai text-to-speech with speaker conditioning. It is intended for:

Text-to-Speech (TTS): Fine-tuning TTS models for Thai language
Speaker Adaptation: Single-speaker voice cloning/personalization
Low-resource TTS: Thai TTS research with limited data

⚠️ Limitations

Single Speaker: Only one speaker — may not generalize to multi-speaker scenarios
Private Audio: Audio files are not publicly available for privacy reasons
Domain: General conversational Thai only — no domain-specific vocabulary
No Evaluation Split: All 20,000 samples are used for training; evaluation was done separately
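
Because no evaluation split is provided, users with access who want a held-out subset have to create their own. One common approach, shown here as a sketch and not as the procedure used for this dataset, is to assign each sample to a split by hashing its audio path, so the assignment stays stable across runs and machines. The function name `is_held_out` and the 2% default are illustrative assumptions.

```python
import hashlib


def is_held_out(audio_path, eval_percent=2):
    """Deterministically decide whether a sample belongs to a held-out set.

    The audio path is hashed with SHA-256 and mapped to a bucket in 0..99;
    samples whose bucket is below eval_percent are held out.
    """
    digest = hashlib.sha256(audio_path.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < eval_percent
```

Hashing the path, instead of shuffling with a random seed, means the same file always lands in the same split even if the manifest is reordered or extended.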

📎 Related Models

hotdogs/vibevoice-1.5b-thai-tts-lora — LoRA adapter trained on this dataset

🙏 Credits

Collected by: UKA
Purpose: Fine-tuning VibeVoice for Thai TTS
Timestamp: 2026
