Itbanque/ScreenTalk_JA2ZH
A fine-tuned OpenAI Whisper tiny model for Japanese-to-Chinese speech translation, trained on a subset of the DataLabX/ScreenTalk_JA2ZH dataset.

How to use Itbanque/whisper-ja-zh-tiny with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="Itbanque/whisper-ja-zh-tiny")

# Or load the model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("Itbanque/whisper-ja-zh-tiny")
model = AutoModelForSpeechSeq2Seq.from_pretrained("Itbanque/whisper-ja-zh-tiny")
```
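The pipeline accepts either a path to an audio file or raw 16 kHz mono float samples. A minimal, stdlib-only sketch of preparing such samples from a 16-bit PCM WAV file (the filename `sample.wav` is hypothetical; this assumes the file is already recorded or resampled at 16 kHz):

```python
import struct
import wave

def load_wav_as_floats(path):
    """Read a 16-bit mono PCM WAV file and normalize samples to [-1.0, 1.0]."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "expects 16-bit PCM"
        assert wf.getnchannels() == 1, "expects mono audio"
        frames = wf.readframes(wf.getnframes())
    # Unpack little-endian signed 16-bit samples, then scale by 2**15.
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    return [s / 32768.0 for s in samples]

# The resulting samples can then be fed to the pipeline, e.g.:
# pipe({"array": load_wav_as_floats("sample.wav"), "sampling_rate": 16000})
```

If your audio is at a different sample rate, resample it to 16 kHz first (for example with librosa or torchaudio) before passing it to the model.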
Base model: openai/whisper-tiny
Trained with the Hugging Face Seq2SeqTrainer.
Training hyperparameters:
train_batch_size: 96
eval_batch_size: 64
learning_rate: 3e-4
warmup_steps: 1000
num_train_epochs: 20
gradient_accumulation_steps: 1
save_steps: 1000
eval_steps: 1000
logging_steps: 1000
fp16: true
eval_strategy: step
early_stopping: enabled (patience=5)
Best checkpoint auto-loaded via load_best_model_at_end=True, using eval_bleu as the metric.
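As a rough sketch, the settings listed above map to a Seq2SeqTrainingArguments configuration along these lines (the output_dir is hypothetical, and transformers spells the step-based strategy "steps"):

```python
from transformers import EarlyStoppingCallback, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-ja-zh-tiny",  # hypothetical output path
    per_device_train_batch_size=96,
    per_device_eval_batch_size=64,
    learning_rate=3e-4,
    warmup_steps=1000,
    num_train_epochs=20,
    gradient_accumulation_steps=1,
    save_steps=1000,
    eval_steps=1000,
    logging_steps=1000,
    fp16=True,
    eval_strategy="steps",
    load_best_model_at_end=True,
    metric_for_best_model="eval_bleu",
)

# Early stopping with patience=5 would be wired in when constructing the
# trainer, e.g.:
# callbacks=[EarlyStoppingCallback(early_stopping_patience=5)]
```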
Final run metrics (test set):
loss: 2.3245
bleu: 0.6095
Repository includes:
config.json, generation_config.json, preprocessor_config.json, tokenizer_config.json, vocab.json, merges.txt, etc.
training_20250610-194336.log
runs/

Alternatively, load with the Whisper-specific classes:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("fj11/whisper-ja-zh-tiny")
model = WhisperForConditionalGeneration.from_pretrained("fj11/whisper-ja-zh-tiny")
```
For business inquiries or collaboration, visit https://www.itbanque.com or reach out via Hugging Face.
CC BY-NC-SA 4.0 (Non-commercial, Attribution, ShareAlike)