Spaces:
Runtime error
Runtime error
| title: Tiny Audio Demo | |
| emoji: π€ | |
| colorFrom: purple | |
| colorTo: blue | |
| sdk: gradio | |
| sdk_version: "4.44.0" | |
| python_version: "3.11" | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| short_description: Efficient ASR with Whisper encoder and SmolLM3 decoder | |
| models: | |
| - mazesmazes/tiny-audio | |
| tags: | |
| - audio | |
| - automatic-speech-recognition | |
| - whisper | |
| - smollm | |
| - mlp | |
| suggested_hardware: cpu-upgrade | |
| preload_from_hub: | |
| - mazesmazes/tiny-audio | |
| ## Demo Overview | |
| This Space demonstrates an Automatic Speech Recognition (ASR) model that combines: | |
| - **Whisper encoder** for audio feature extraction | |
| - **SmolLM3 decoder** for efficient text generation | |
| ## Features | |
| - ποΈ **Record from microphone** or upload audio files | |
| - β‘ **Fast inference** with a small number of trainable parameters | |
| - π― **English transcription** optimized for speech-to-text | |
| - π **Lightweight model** suitable for edge deployment | |
| ## Model Architecture | |
| The model uses a novel architecture that bridges audio and text modalities: | |
| 1. **Audio Encoder**: Frozen Whisper encoder | |
| 2. **Projection Layer**: Custom audio-to-text space mapping | |
| 3. **Text Decoder**: SmolLM3 (frozen) | |
| ## Usage | |
| 1. **Upload an audio file** (WAV, MP3, etc.) or **record directly** using your microphone | |
| 2. Click **"Transcribe"** to convert speech to text | |
| 3. The transcription will appear in the output box | |
| ## Limitations | |
| - Maximum audio length: 30 seconds | |
| - Optimized for English language | |
| - Best performance with clear speech and minimal background noise | |
| ## Links | |
| - π¦ [Model on Hugging Face](https://huggingface.co/mazesmazes/tiny-audio) | |
| - π» [GitHub Repository](https://github.com/alexkroman/tiny-audio) | |
| - π [Technical Details](https://github.com/alexkroman/tiny-audio/blob/main/MODEL_CARD.md) | |
| ## Citation | |
| If you use this model in your research, please cite: | |
| ```bibtex | |
| @software{kroman2024tinyaudio, | |
| author = {Kroman, Alex}, | |
| title = {Tiny Audio: Train your own speech recognition model in 24 hours}, | |
| year = {2024}, | |
| publisher = {GitHub}, | |
| url = {https://github.com/alexkroman/tiny-audio} | |
| } | |
| ``` | |