mazesmazes committed
Commit b943993 · verified · 1 Parent(s): 9f0e9f6

Model save

Files changed (1):
  1. README.md +83 -69

README.md CHANGED
@@ -1,73 +1,87 @@
  ---
- license: mit
- language:
- - en
- datasets:
- - speechbrain/LoquaciousSet
- base_model:
- - openai/whisper-large-v3-turbo
- - HuggingFaceTB/SmolLM3-3B
- pipeline_tag: automatic-speech-recognition
  tags:
- - asr
- - speech-recognition
- - audio
- - smollm
- - whisper
- - mlp
  ---

- # Tiny Audio
-
- A speech recognition model trained in 24 hours on a single GPU for ~$12. Built with the [Tiny Audio](https://github.com/alexkroman/tiny-audio) codebase—a minimal, hackable framework for training ASR models.
-
- ## Architecture
-
- ```
- Audio (16kHz) → Whisper Encoder (frozen) → MLP Projector (trained) → SmolLM3-3B (frozen) → Text
- ```
-
- **MLP Projector:**
- - Convolutional downsampling: 4x sequence compression via two stride-2 conv layers
- - Linear (1280 → 2048) → GELU → Linear (2048 → 2048)
- - Output normalization: RMSNorm
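As an editorial aside, the projector spec above can be sketched in PyTorch. This is a hypothetical reconstruction, not the repo's actual module: the kernel size, padding, activations between the conv layers, and all names are assumptions; only the dimensions (1280 → 2048), the two stride-2 convs, and the final RMSNorm come from the list above.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    # Minimal RMSNorm, written out to avoid depending on a recent torch version.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)


class MLPProjector(nn.Module):
    """Sketch: Whisper encoder states (dim 1280) -> SmolLM3 embedding space (dim 2048)."""

    def __init__(self, in_dim: int = 1280, out_dim: int = 2048):
        super().__init__()
        # Two stride-2 convs give 4x sequence compression (kernel size 3 is an assumption).
        self.down = nn.Sequential(
            nn.Conv1d(in_dim, in_dim, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv1d(in_dim, in_dim, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
        )
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.GELU(),
            nn.Linear(out_dim, out_dim),
        )
        self.norm = RMSNorm(out_dim)

    def forward(self, x):  # x: (batch, time, 1280)
        x = self.down(x.transpose(1, 2)).transpose(1, 2)  # (batch, time/4, 1280)
        return self.norm(self.mlp(x))  # (batch, time/4, 2048)


out = MLPProjector()(torch.randn(1, 100, 1280))  # 100 frames compress to 25
```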
-
- ## Training Details
-
- | | |
- |---|---|
- | **Dataset** | LoquaciousSet (25,000 hours) |
- | **Hardware** | Single NVIDIA A40 40GB |
- | **Training Time** | ~24 hours |
- | **Cost** | ~$12 |
- | **Trainable Parameters** | ~12M (projector only) |
-
- ## Performance
-
- **Word Error Rate (WER): 12.14%** on the LoquaciousSet test set.
-
- See the [community leaderboard](https://github.com/alexkroman/tiny-audio#leaderboard) for comparisons.
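For reference, WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words. A minimal sketch follows; this is not the evaluation script behind the 12.14% figure, just an illustration of the metric.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)
```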
-
- ## Usage
-
- ```python
- from transformers import pipeline
-
- pipe = pipeline("automatic-speech-recognition", model="mazesmazes/tiny-audio", trust_remote_code=True)
-
- result = pipe("path/to/audio.wav")
- print(result["text"])
- ```
-
- ## Limitations
-
- - English only
- - Optimized for 16kHz audio; other sample rates are resampled automatically
- - Performance may degrade on heavily accented speech, noisy environments, or domain-specific jargon
- - Maximum audio length limited by context window
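The pipeline handles resampling automatically, as noted above. If you preprocess audio outside the pipeline, a naive linear-interpolation resampler to the model's 16 kHz input rate can be sketched as below; this is an illustration only, and production code should prefer torchaudio or librosa.

```python
import numpy as np


def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Naive linear-interpolation resampling of a mono waveform to target_sr."""
    n_out = int(round(len(audio) * target_sr / orig_sr))
    x_old = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, audio)
```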
-
- ## Learn More
-
- - **[Train your own model](https://github.com/alexkroman/tiny-audio)**: the full codebase with training scripts
- - **[Free 3-hour course](https://github.com/alexkroman/tiny-audio/blob/main/docs/course/0-course-overview.md)**: build your own ASR system from scratch
- - **[Submit to leaderboard](https://github.com/alexkroman/tiny-audio#leaderboard)**: share your trained model
  ---
+ library_name: transformers
  tags:
+ - generated_from_trainer
+ model-index:
+ - name: tiny-audio
+   results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # tiny-audio
+
+ This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.2281
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0003
+ - train_batch_size: 6
+ - eval_batch_size: 6
+ - seed: 42
+ - gradient_accumulation_steps: 3
+ - total_train_batch_size: 18
+ - optimizer: fused AdamW (torch) with betas=(0.9, 0.95) and epsilon=1e-08; no additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 500
+ - num_epochs: 1
+
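As an editorial aside, the schedule named above (cosine decay after 500 linear-warmup steps) can be sketched as follows. The peak rate comes from the hyperparameter list; the total step count of ~60,000 is an assumption, roughly one epoch judging by the training results.

```python
import math


def lr_at(step: int, max_lr: float = 3e-4, warmup_steps: int = 500,
          total_steps: int = 60_000) -> float:
    """Linear warmup to max_lr, then cosine decay to zero over the remaining steps."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))
```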
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:-----:|:---------------:|
+ | 0.4431 | 0.0298 | 2000 | 0.3491 |
+ | 0.4701 | 0.0596 | 4000 | 0.3217 |
+ | 0.4086 | 0.0894 | 6000 | 0.3092 |
+ | 0.3937 | 0.1192 | 8000 | 0.2949 |
+ | 0.336 | 0.1490 | 10000 | 0.2896 |
+ | 0.3609 | 0.1788 | 12000 | 0.2827 |
+ | 0.342 | 0.3128 | 14000 | 0.2654 |
+ | 0.3576 | 0.3575 | 16000 | 0.2667 |
+ | 0.3266 | 0.4022 | 18000 | 0.2550 |
+ | 0.2951 | 0.3352 | 20000 | 0.2637 |
+ | 0.3089 | 0.3687 | 22000 | 0.2646 |
+ | 0.2892 | 0.4022 | 24000 | 0.2606 |
+ | 0.3752 | 0.4357 | 26000 | 0.2547 |
+ | 0.2865 | 0.4692 | 28000 | 0.2535 |
+ | 0.327 | 0.5027 | 30000 | 0.2494 |
+ | 0.3438 | 0.5363 | 32000 | 0.2453 |
+ | 0.2843 | 0.5698 | 34000 | 0.2405 |
+ | 0.3015 | 0.6033 | 36000 | 0.2374 |
+ | 0.2904 | 0.6368 | 38000 | 0.2364 |
+ | 0.2946 | 0.6703 | 40000 | 0.2340 |
+ | 0.3428 | 0.7038 | 42000 | 0.2323 |
+ | 0.3036 | 0.7374 | 44000 | 0.2299 |
+ | 0.3381 | 0.7709 | 46000 | 0.2293 |
+ | 0.2993 | 0.8044 | 48000 | 0.2291 |
+ | 0.302 | 0.8379 | 50000 | 0.2282 |
+ | 0.2779 | 0.8714 | 52000 | 0.2280 |
+ | 0.2856 | 0.9049 | 54000 | 0.2281 |
+ | 0.2904 | 0.9384 | 56000 | 0.2280 |
+ | 0.3048 | 0.9720 | 58000 | 0.2281 |
+
+
+ ### Framework versions
+
+ - Transformers 4.57.3
+ - Pytorch 2.8.0+cu128
+ - Datasets 3.6.0
+ - Tokenizers 0.22.1