Upload Mongolian Whisper model

Files changed (5)

.gitattributes +1 -0
README.md +68 -0
config.json +21 -0
model.pt +3 -0
vocab.json +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+vocab.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,68 @@

+---
+language:
+- mn
+license: apache-2.0
+tags:
+- audio
+- speech-recognition
+- whisper
+- mongolian
+datasets:
+- mozilla-foundation/common_voice_11_0
+---
+# Whisper Mongolian ASR Model
+This is a custom-trained Whisper model for Mongolian speech recognition, based on the implementation in [whisper.py](https://github.com/your-username/whisper-mongolian).
+## Model Details
+- **Architecture:** Custom Whisper-like model trained from scratch
+- **Training Data:** Mozilla Common Voice Mongolian dataset
+- **Performance Metrics:**
+  - Word Error Rate (WER): 0.928 (92.8%)
+  - Character Error Rate (CER): 0.726 (72.6%)
+## Usage
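+To use this model, download `model.pt` and `vocab.json` and load them with the original implementation code (the project's `whisper.py`, not the `openai-whisper` package):
+```python
+import torch
+# These classes come from the project's whisper.py, which must be importable
+from whisper import WhisperConfig, WhisperModel, SimpleTokenizer
+# Load the checkpoint on CPU so this also works on machines without a GPU.
+# On PyTorch >= 2.6, torch.load defaults to weights_only=True; if the checkpoint
+# stores custom objects, pass weights_only=False, and only for files you trust.
+checkpoint = torch.load("model.pt", map_location="cpu")
+# Rebuild the config from the saved values, skipping methods and the tokenizer
+config = WhisperConfig()
+for k, v in checkpoint['config'].items():
+    if not callable(v) and k != "tokenizer":
+        setattr(config, k, v)
+# Create the tokenizer from the downloaded vocabulary and attach it to the config
+tokenizer = SimpleTokenizer()
+tokenizer.load_vocab("vocab.json")
+config.tokenizer = tokenizer
+# Build the model, restore the trained weights, and switch to inference mode
+model = WhisperModel(config)
+model.load_state_dict(checkpoint['model_state_dict'])
+model.eval()
+# The model is now ready for inference
+```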
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{whisper-mongolian,
+  author = {Your Name},
+  title = {Whisper Mongolian ASR Model},
+  year = {2025},
+  publisher = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/Nasanbuyan/whisper-mongolian}}
+}
+```
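
The README reports WER and CER but does not say how they were computed or how the text was normalized. As a minimal sketch, assuming the standard definitions (edit distance between reference and hypothesis, divided by the reference length in words or characters), the two metrics could be computed as follows. The helper names `edit_distance`, `wer`, and `cer` are illustrative and not taken from the repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substitution and one deletion against a four-word reference
print(wer("a b c d", "a x c"))
```

Under these definitions both metrics can exceed 1.0 when the hypothesis contains many insertions, so a WER near 0.93 means that most reference words were not recovered correctly.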

config.json ADDED

	@@ -0,0 +1,21 @@

+{
+  "sampling_rate": 16000,
+  "n_fft": 400,
+  "hop_length": 160,
+  "n_mels": 80,
+  "d_model": 384,
+  "n_heads": 6,
+  "n_layers": 4,
+  "vocab_size": 1000,
+  "batch_size": 16,
+  "learning_rate": 0.0003,
+  "weight_decay": 0.01,
+  "max_epochs": 20,
+  "warmup_steps": 1000,
+  "grad_clip": 1.0,
+  "max_audio_length": 30.0,
+  "max_text_length": 448,
+  "data_dir": "./whisper/data",
+  "checkpoint_dir": "./whisper/checkpoints",
+  "tensorboard_dir": "./whisper/logs"
+}
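
The audio and model settings above imply a few derived sizes that are useful when checking input shapes. The sketch below works them out from the values in this hunk. It assumes the STFT window length equals `n_fft` and ignores padding and any downsampling inside the model, none of which the diff specifies. `derive_shapes` is a hypothetical helper, not part of the repository.

```python
# Values copied from the config.json hunk above.
config = {
    "sampling_rate": 16000,
    "n_fft": 400,
    "hop_length": 160,
    "n_mels": 80,
    "d_model": 384,
    "n_heads": 6,
    "max_audio_length": 30.0,
}


def derive_shapes(cfg):
    """Derive front-end and attention sizes implied by the config."""
    sr, hop, n_fft = cfg["sampling_rate"], cfg["hop_length"], cfg["n_fft"]
    if cfg["d_model"] % cfg["n_heads"] != 0:
        raise ValueError("d_model must be divisible by n_heads")
    max_samples = int(cfg["max_audio_length"] * sr)
    return {
        "window_ms": 1000 * n_fft / sr,        # analysis window duration
        "hop_ms": 1000 * hop / sr,             # step between spectrogram frames
        "frames_per_second": sr // hop,
        "max_frames": max_samples // hop,      # frames in the longest clip, before padding
        "freq_bins": n_fft // 2 + 1,           # one-sided FFT bins before the mel projection
        "head_dim": cfg["d_model"] // cfg["n_heads"],
    }


shapes = derive_shapes(config)
print(shapes)
```

With these values, a 30-second clip yields about 3000 frames of 80 mel features each, and each of the 6 attention heads works on 64 dimensions.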

model.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cf70c92ad5eaf426d942d077692f65230e3b4cc3e2fd6e18bd6c5300dc4f6c84
+size 240546907

vocab.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:098563d5487891b78b092d7fce61240f6fcfe0da2d2dc318c2fc5dfdd6ff4cbd
+size 16899880
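
Both `model.pt` and `vocab.json` are stored as Git LFS pointers: the `oid` is the SHA-256 digest of the real file and `size` is its length in bytes. A downloaded file can therefore be checked against its pointer. The sketch below shows one way to do that. The helper names are illustrative, and the local path `model.pt` in the usage note is an assumption about where the file was saved.

```python
import hashlib

# Pointer text copied from the model.pt hunk above.
MODEL_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:cf70c92ad5eaf426d942d077692f65230e3b4cc3e2fd6e18bd6c5300dc4f6c84
size 240546907
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def read_chunks(path, chunk_size=1 << 20):
    """Yield a file's contents in chunks so large checkpoints are not read at once."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def sha256_and_size(chunks):
    """Return the SHA-256 hex digest and total byte count of an iterable of bytes."""
    h = hashlib.sha256()
    size = 0
    for chunk in chunks:
        h.update(chunk)
        size += len(chunk)
    return h.hexdigest(), size


def matches_pointer(path, pointer):
    """Check that the file at `path` has the digest and size recorded in the pointer."""
    digest, size = sha256_and_size(read_chunks(path))
    return (pointer["algo"] == "sha256"
            and digest == pointer["digest"]
            and size == pointer["size"])


pointer = parse_lfs_pointer(MODEL_POINTER)
print(pointer["size"])
```

After downloading, `matches_pointer("model.pt", pointer)` should return `True` for an intact file. The same check applies to `vocab.json` with its own pointer text.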