Nasanbuyan committed on
Commit 9dad500 · verified · 1 Parent(s): 571cf00

Upload Mongolian Whisper model

Files changed (5)
  1. .gitattributes +1 -0
  2. README.md +68 -0
  3. config.json +21 -0
  4. model.pt +3 -0
  5. vocab.json +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ vocab.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,68 @@
+ ---
+ language:
+ - mn
+ license: apache-2.0
+ tags:
+ - audio
+ - speech-recognition
+ - whisper
+ - mongolian
+ datasets:
+ - mozilla-foundation/common_voice_11_0
+ ---
+
+ # Whisper Mongolian ASR Model
+
+ This is a custom-trained Whisper model for Mongolian speech recognition, based on the implementation in [whisper.py](https://github.com/your-username/whisper-mongolian).
+
+ ## Model Details
+
+ - **Architecture:** Custom Whisper-like model trained from scratch
+ - **Training Data:** Mozilla Common Voice Mongolian dataset (`mozilla-foundation/common_voice_11_0`)
+ - **Performance Metrics:**
+   - Word Error Rate (WER): 0.9277985118418891
+   - Character Error Rate (CER): 0.7262371117301725
+
+ ## Usage
+
+ To use this model, download `model.pt` and `vocab.json` and load them with the original implementation code (`whisper.py` must be importable):
+
+ ```python
+ import torch
+ from whisper import WhisperConfig, WhisperModel, SimpleTokenizer
+
+ # Load the checkpoint on the CPU so that no GPU is required
+ checkpoint = torch.load("model.pt", map_location="cpu")
+
+ # Rebuild the config from the saved values, skipping methods and the tokenizer
+ config = WhisperConfig()
+ for k, v in checkpoint['config'].items():
+     if not callable(v) and k != "tokenizer":
+         setattr(config, k, v)
+
+ # Create the tokenizer and attach it to the config
+ tokenizer = SimpleTokenizer()
+ tokenizer.load_vocab("vocab.json")  # Make sure to download vocab.json as well
+ config.tokenizer = tokenizer
+
+ # Create the model, load the trained weights, and switch to evaluation mode
+ model = WhisperModel(config)
+ model.load_state_dict(checkpoint['model_state_dict'])
+ model.eval()
+
+ # Now you can use the model for inference
+ ```
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```
+ @misc{whisper-mongolian,
+   author = {Your Name},
+   title = {Whisper Mongolian ASR Model},
+   year = {2025},
+   publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/Nasanbuyan/whisper-mongolian}}
+ }
+ ```
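The README reports WER and CER but does not say how they were computed. As a point of reference, the sketch below shows the conventional definition: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, counted over words for WER and over characters for CER. The function names and the sample sentences are illustrative assumptions, and the author's evaluation script may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference item
                curr[j - 1] + 1,     # insert a hypothesis item
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example: the hypothesis drops the last word of the reference.
reference = "сайн байна уу"
hypothesis = "сайн байна"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

Under this definition, a WER of about 0.93 means the number of word-level edits needed to turn the model output into the reference is about 93% of the number of reference words.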
config.json ADDED
@@ -0,0 +1,21 @@
+ {
+   "sampling_rate": 16000,
+   "n_fft": 400,
+   "hop_length": 160,
+   "n_mels": 80,
+   "d_model": 384,
+   "n_heads": 6,
+   "n_layers": 4,
+   "vocab_size": 1000,
+   "batch_size": 16,
+   "learning_rate": 0.0003,
+   "weight_decay": 0.01,
+   "max_epochs": 20,
+   "warmup_steps": 1000,
+   "grad_clip": 1.0,
+   "max_audio_length": 30.0,
+   "max_text_length": 448,
+   "data_dir": "./whisper/data",
+   "checkpoint_dir": "./whisper/checkpoints",
+   "tensorboard_dir": "./whisper/logs"
+ }
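To make the audio and attention settings above easier to read, the sketch below derives a few quantities from them. It assumes that `max_audio_length` is given in seconds and that the model uses standard multi-head attention, which splits `d_model` evenly across `n_heads`. Neither assumption is confirmed by this commit, and the frame count is nominal because the exact number depends on how the STFT pads its input.

```python
# Values copied from config.json above (only the fields used here).
config = {
    "sampling_rate": 16000,
    "n_fft": 400,
    "hop_length": 160,
    "d_model": 384,
    "n_heads": 6,
    "max_audio_length": 30.0,  # assumed to be seconds
}

# Spectrogram timing: one frame every hop_length samples, each n_fft samples wide.
frame_shift_ms = 1000 * config["hop_length"] / config["sampling_rate"]  # 10.0 ms
window_ms = 1000 * config["n_fft"] / config["sampling_rate"]            # 25.0 ms

# Nominal input size for the longest allowed clip.
max_samples = int(config["max_audio_length"] * config["sampling_rate"])  # 480000
max_frames = max_samples // config["hop_length"]                         # 3000

# Per-head width, assuming d_model is split evenly across the heads.
if config["d_model"] % config["n_heads"] != 0:
    raise ValueError("d_model must be divisible by n_heads")
head_dim = config["d_model"] // config["n_heads"]                        # 64

print(frame_shift_ms, window_ms, max_samples, max_frames, head_dim)
```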
model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cf70c92ad5eaf426d942d077692f65230e3b4cc3e2fd6e18bd6c5300dc4f6c84
+ size 240546907
vocab.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:098563d5487891b78b092d7fce61240f6fcfe0da2d2dc318c2fc5dfdd6ff4cbd
+ size 16899880
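`model.pt` and `vocab.json` are stored as Git LFS pointers, so the diff shows only a SHA-256 digest and a byte size for each file. A downloaded copy can be checked against those two values. The following is a minimal sketch of such a check; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Read in chunks so a large checkpoint is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents for model.pt, copied from the diff above.
MODEL_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:cf70c92ad5eaf426d942d077692f65230e3b4cc3e2fd6e18bd6c5300dc4f6c84
size 240546907
"""

pointer = parse_lfs_pointer(MODEL_POINTER)
if os.path.exists("model.pt"):
    print("model.pt matches pointer:", matches_pointer("model.pt", pointer))
```

The size comparison runs first because it is cheap and catches truncated downloads before any hashing is done.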