---
title: Tiny Audio Demo
emoji: 🎤
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
short_description: Efficient ASR with Whisper encoder and SmolLM3 decoder
models:
- mazesmazes/tiny-audio
tags:
- audio
- automatic-speech-recognition
- whisper
- smollm
- mlp
suggested_hardware: cpu-upgrade
preload_from_hub:
- mazesmazes/tiny-audio
---

## Demo Overview

This Space demonstrates an Automatic Speech Recognition (ASR) model that combines:

- Whisper encoder for audio feature extraction
- SmolLM3 decoder for efficient text generation

## Features

- 🎙️ Record from your microphone or upload an audio file
- ⚡ Fast inference; only a small projection layer is trained, while the encoder and decoder stay frozen
- 🎯 English speech-to-text transcription
- 📊 Lightweight model suitable for edge deployment

## Model Architecture

The model bridges the audio and text modalities with three components:

1. Audio Encoder: Whisper encoder (frozen)
2. Projection Layer: a custom trainable module that maps audio features into the decoder's text embedding space
3. Text Decoder: SmolLM3 (frozen)
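
The projection step can be pictured with a minimal, dependency-free sketch. It rests on stated assumptions and is not the model's actual code. It assumes the projector stacks `k` consecutive Whisper encoder frames and passes each stacked vector through a two-layer MLP. The `mlp` tag in the Space metadata hints at this design but does not confirm it. The names `Projector` and `stack_frames`, the ReLU activation, and every dimension below are hypothetical toy values; the real encoder and decoder widths are far larger.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def stack_frames(frames: Matrix, k: int) -> Matrix:
    """Concatenate every k consecutive encoder frames into one vector.

    Trailing frames that do not fill a complete group are dropped, so
    n frames become n // k stacked vectors of width k * enc_dim.
    """
    usable = len(frames) - len(frames) % k
    return [
        [x for frame in frames[i:i + k] for x in frame]
        for i in range(0, usable, k)
    ]


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    # w has shape (out_dim, in_dim); computes w @ x + b.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Small Gaussian init; stands in for learned weights.
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class Projector:
    """Hypothetical trainable bridge between a frozen encoder and a frozen decoder."""

    def __init__(self, enc_dim: int, llm_dim: int, k: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.k = k
        self.w1 = random_matrix(hidden_dim, enc_dim * k, rng)
        self.b1 = [0.0] * hidden_dim
        self.w2 = random_matrix(llm_dim, hidden_dim, rng)
        self.b2 = [0.0] * llm_dim

    def __call__(self, frames: Matrix) -> Matrix:
        # One output embedding per group of k encoder frames.
        return [
            linear(relu(linear(v, self.w1, self.b1)), self.w2, self.b2)
            for v in stack_frames(frames, self.k)
        ]

    def num_params(self) -> int:
        # In this sketch, only these weights would be trained.
        return (
            sum(len(row) for row in self.w1) + len(self.b1)
            + sum(len(row) for row in self.w2) + len(self.b2)
        )


# Toy example: 10 encoder frames of width 8 projected into a 6-dim "decoder" space.
encoder_frames = [[0.1 * (i + 1)] * 8 for i in range(10)]
projector = Projector(enc_dim=8, llm_dim=6, k=4, hidden_dim=16)
embeddings = projector(encoder_frames)
print(len(embeddings), len(embeddings[0]))  # 2 6
print(projector.num_params())  # 630
```

Frame stacking shortens the audio sequence by a factor of `k` before it reaches the decoder, which keeps the decoder's input short. In the real model, the input would presumably be Whisper's encoder hidden states and the output would be consumed by SmolLM3 as input embeddings.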

## Usage

1. Upload an audio file (WAV, MP3, etc.) or record directly using your microphone
2. Click "Transcribe" to convert speech to text
3. The transcription will appear in the output box

## Limitations

- Maximum audio length: 30 seconds
- Optimized for English
- Best performance with clear speech and minimal background noise
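
A caller can reject over-length clips before they reach the model by checking the duration up front. This sketch uses only Python's standard `wave` module, so it handles uncompressed WAV input only; MP3 and other formats would need a decoder. The function names are hypothetical. The 30-second threshold comes from the limit above, which matches the 30-second input window of Whisper's encoder.

```python
import io
import wave

# Limit taken from the Limitations section; Whisper's encoder also
# consumes audio in 30-second windows.
MAX_SECONDS = 30.0


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_length(data: bytes, max_seconds: float = MAX_SECONDS) -> float:
    """Raise ValueError if the clip exceeds max_seconds; otherwise return its duration."""
    duration = wav_duration_seconds(data)
    if duration > max_seconds:
        raise ValueError(
            f"audio is {duration:.1f} s long; the demo accepts at most {max_seconds:.0f} s"
        )
    return duration


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV clip (16 kHz is Whisper's sample rate)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(check_length(make_silent_wav(2.0)))  # 2.0
try:
    check_length(make_silent_wav(31.0))
except ValueError as err:
    print(err)  # audio is 31.0 s long; the demo accepts at most 30 s
```

Trimming to the first 30 seconds is an equally valid alternative. Rejecting the clip makes the limit visible to the user instead of silently dropping speech.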

## Links

- 📦 [Model on Hugging Face](https://huggingface.co/mazesmazes/tiny-audio)
- 💻 [GitHub Repository](https://github.com/alexkroman/tiny-audio)
- 📄 [Technical Details](https://github.com/alexkroman/tiny-audio/blob/main/MODEL_CARD.md)

## Citation

If you use this model in your research, please cite:

```bibtex
@software{kroman2024tinyaudio,
  author = {Kroman, Alex},
  title = {Tiny Audio: Train your own speech recognition model in 24 hours},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/alexkroman/tiny-audio}
}
```