🎤 Thai Speech 20K — Private Dataset for VibeVoice Fine-tuning

Thai/English: a Thai speech dataset of 20,000 sentences for fine-tuning TTS models

🇹🇭 20,000 Thai speech utterances, single speaker


📊 Dataset Card

Field            Detail
Name             Thai Speech 20K
Language         🇹🇭 Thai
Samples          20,000 utterances
Total Duration   ~11 hours
Speaker          Single speaker (conditioned)
Audio Format     WAV, 24 kHz mono
Text Format      UTF-8 transcripts (JSONL)
Split            Train only (no public eval split)
License          CC-BY-4.0 (metadata only; audio is private)
Access           🔒 Private (audio not included in this repo)

📋 Data Format

JSONL with fields:
  text  — Thai text transcription (UTF-8)
  audio — Path to WAV file (24 kHz, mono)

Example:

{"text": "สวัสดีครับ วันนี้อากาศดีมาก", "audio": "audio/sample_000000.wav"}
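A minimal sketch of how a manifest in this format could be read and a referenced file checked against the stated audio format. The manifest file name, the helper names `load_manifest` and `check_wav`, and the use of Python's standard `wave` module (which reads PCM WAV only) are assumptions for illustration; the card does not specify any loading code.

```python
import json
import wave


def load_manifest(path):
    """Read a JSONL manifest into a list of {"text", "audio"} dicts."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            rec = json.loads(line)
            missing = {"text", "audio"} - rec.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
            records.append(rec)
    return records


def check_wav(path, expected_rate=24000, expected_channels=1):
    """Return the duration in seconds; raise if the header differs from the card."""
    with wave.open(str(path), "rb") as w:
        if w.getframerate() != expected_rate:
            raise ValueError(f"{path}: sample rate is {w.getframerate()} Hz")
        if w.getnchannels() != expected_channels:
            raise ValueError(f"{path}: {w.getnchannels()} channels")
        return w.getnframes() / w.getframerate()
```

Summing `check_wav` over every record gives the total duration, which should come to roughly 11 hours if the figures in the dataset card hold for a given copy of the audio.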

🎯 Intended Use

This dataset was created specifically for fine-tuning the microsoft/VibeVoice-1.5B model for Thai text-to-speech with speaker conditioning. It is intended for:

  • Text-to-Speech (TTS): Fine-tuning TTS models for Thai language
  • Speaker Adaptation: Single-speaker voice cloning/personalization
  • Low-resource TTS: Thai TTS research with limited data

⚠️ Limitations

  • Single Speaker: Only one speaker — may not generalize to multi-speaker scenarios
  • Private Audio: Audio files are not publicly available for privacy reasons
  • Domain: General conversational Thai only — no domain-specific vocabulary
  • No Evaluation Split: All 20k samples used for training; evaluation done separately
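
Since no evaluation split is published, anyone with access to the audio may want to hold out a small subset themselves. The sketch below shows one possible way to do that deterministically by hashing each record's `audio` path. The function names and the 2% fraction are hypothetical, and this is not a description of how the authors evaluated their model.

```python
import hashlib


def is_held_out(audio_path, eval_fraction=0.02):
    """Decide membership from a stable hash of the path (no random state)."""
    digest = hashlib.sha256(audio_path.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < eval_fraction


def split_records(records, eval_fraction=0.02):
    """Partition manifest records into (train, held_out) lists."""
    train, held_out = [], []
    for rec in records:
        target = held_out if is_held_out(rec["audio"], eval_fraction) else train
        target.append(rec)
    return train, held_out
```

Because the assignment depends only on the file path, the same utterances land in the held-out set on every run and on every machine, even if the manifest is reordered.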

📎 Related Models


🙏 Credits

  • Collected by: UKA
  • Purpose: Fine-tuning VibeVoice for Thai TTS
  • Timestamp: 2026