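# moonshine-tiny-optimized

Optimized version of [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny) for faster, lower-memory GPU inference.

## Optimizations applied

- **FP16 weights**: parameters are stored in half precision, which halves the memory taken by the weights.
- **SDPA (scaled dot-product attention)**: attention runs through PyTorch's `torch.nn.functional.scaled_dot_product_attention`, which can dispatch to fused attention kernels.
- **Static KV cache support**: with `cache_implementation="static"`, the decoder's key/value cache is pre-allocated at a fixed maximum length instead of growing at every generated token. This gives up to a 1.19x speedup in the benchmark below.
- **Updated config**: `attn_implementation="sdpa"` is set as the default, so SDPA is used without passing it to `from_pretrained`.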
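
SDPA computes the same result as a hand-written attention layer, `softmax(Q K^T / sqrt(d)) V`; the gain comes from PyTorch choosing a fused kernel rather than from different math. A minimal sketch, assuming PyTorch 2.x and random tensors with illustrative shapes (not taken from the model config), checks that the unfused and SDPA paths agree:

```python
import math

import torch
import torch.nn.functional as F


def manual_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Reference (unfused) scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    scale = 1.0 / math.sqrt(q.size(-1))
    scores = (q @ k.transpose(-2, -1)) * scale  # (batch, heads, q_len, k_len)
    weights = torch.softmax(scores, dim=-1)
    return weights @ v  # (batch, heads, q_len, head_dim)


# Illustrative shapes only: batch=1, heads=8, seq_len=20, head_dim=36
torch.manual_seed(0)
q, k, v = (torch.randn(1, 8, 20, 36) for _ in range(3))

reference = manual_attention(q, k, v)
# Same computation; PyTorch picks a backend (fused where one is available)
fused = F.scaled_dot_product_attention(q, k, v)

print(torch.allclose(reference, fused, atol=1e-5))
```

On a GPU with FP16 tensors, the same call can dispatch to a fused backend such as FlashAttention or memory-efficient attention, which avoids materializing the full attention-weight matrix.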

## Benchmarks (T4 GPU, 5 s audio clip)

| Variant | Median time (s) | Speedup vs. baseline | Peak memory (MB) |
|---------|-----------------|----------------------|------------------|
| Baseline FP32 | 0.028 | 1.0x | 123.0 |
| FP16 + SDPA | 0.028 | 0.98x | 118.6 |
| **FP16 + SDPA + static KV** | **0.024** | **1.19x** | **72.1** |
| `torch.compile` | 0.030 | 0.94x | 126.3 |
| 8-bit quantization | 0.108 | 0.26x | 124.0 |

**Best config**: `FP16 + SDPA + static KV cache` gives a **1.19x speedup** and **41% lower peak memory** (72.1 MB vs. 123.0 MB) than the FP32 baseline.
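
The headline numbers can be re-derived from the table: speedup is baseline time divided by variant time, and memory reduction is `1 - variant_peak / baseline_peak`. A minimal sketch using the table's rounded values; note that the rounded times give about 1.17x rather than 1.19x, which suggests the speedup column was computed from unrounded timings:

```python
# Values copied from the benchmark table (rounded as published)
baseline = {"time_s": 0.028, "peak_mb": 123.0}
best = {"time_s": 0.024, "peak_mb": 72.1}


def speedup(baseline_time: float, variant_time: float) -> float:
    """How many times faster the variant is than the baseline."""
    return baseline_time / variant_time


def memory_reduction(baseline_mb: float, variant_mb: float) -> float:
    """Fraction of the baseline peak memory saved by the variant."""
    return 1.0 - variant_mb / baseline_mb


print(f"speedup from rounded times: {speedup(baseline['time_s'], best['time_s']):.2f}x")
print(f"peak-memory reduction: {memory_reduction(baseline['peak_mb'], best['peak_mb']):.1%}")
```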

## Usage

```python
import numpy as np
import torch
from transformers import AutoProcessor, MoonshineForConditionalGeneration

processor = AutoProcessor.from_pretrained("felixem/moonshine-tiny-optimized")
model = MoonshineForConditionalGeneration.from_pretrained(
    "felixem/moonshine-tiny-optimized",
    torch_dtype=torch.float16,
    device_map="auto",  # needs the `accelerate` package
)

# Placeholder input: 5 s of silence at 16 kHz. Replace with a real mono float32 waveform sampled at 16 kHz.
audio_array = np.zeros(5 * 16000, dtype=np.float32)

# Extract input features, cast floating-point tensors to FP16, and move everything to the model's device
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
inputs = inputs.to(model.device, dtype=model.dtype)

# For maximum speed, decode with the static KV cache
generated_ids = model.generate(
    **inputs,
    cache_implementation="static",
    max_new_tokens=50,
)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
```
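
What `cache_implementation="static"` changes is where keys and values live during decoding: a dynamic cache grows its tensors at every step, while a static cache allocates buffers of a fixed maximum length once and writes each new step into them in place. The hypothetical `ToyStaticKVCache` below illustrates that idea for a single attention layer; it is a sketch, not the `StaticCache` class that transformers actually uses, and the shapes are illustrative:

```python
import torch


class ToyStaticKVCache:
    """Hypothetical single-layer KV cache whose buffers are pre-allocated to max_len."""

    def __init__(self, batch: int, heads: int, max_len: int, head_dim: int, dtype=torch.float32):
        # Allocate once; the buffers keep the same shape and storage for the whole generation.
        self.k = torch.zeros(batch, heads, max_len, head_dim, dtype=dtype)
        self.v = torch.zeros(batch, heads, max_len, head_dim, dtype=dtype)
        self.length = 0  # number of positions filled so far

    def update(self, k_new: torch.Tensor, v_new: torch.Tensor):
        """Write new keys/values in place and return views of the filled prefix."""
        n = k_new.size(2)
        if self.length + n > self.k.size(2):
            raise ValueError("cache is full; increase max_len")
        self.k[:, :, self.length : self.length + n] = k_new
        self.v[:, :, self.length : self.length + n] = v_new
        self.length += n
        return self.k[:, :, : self.length], self.v[:, :, : self.length]


# Simulate 3 decoding steps, one token each
cache = ToyStaticKVCache(batch=1, heads=8, max_len=50, head_dim=36)
buffer_ptr = cache.k.data_ptr()  # storage address before decoding
for _ in range(3):
    step = torch.randn(1, 8, 1, 36)
    keys, values = cache.update(step, step)

print(keys.shape, cache.k.data_ptr() == buffer_ptr)
```

For simplicity the toy returns only the filled prefix; a production static cache may instead return the full fixed-size buffers and rely on an attention mask to ignore unused slots.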

## Model Info

- **Architecture**: Moonshine (encoder-decoder transformer with rotary position embeddings, RoPE)
- **Parameters**: 27.1M
- **License**: MIT
- **Original model**: https://huggingface.co/UsefulSensors/moonshine-tiny
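
The parameter count gives a quick estimate of what the FP16 conversion saves: each FP32 parameter takes 4 bytes and each FP16 parameter takes 2. A back-of-the-envelope sketch covering weights only; activations, the KV cache, and CUDA allocator overhead come on top, so these figures will not match the measured peak memory:

```python
PARAMS = 27.1e6  # parameter count from the Model Info list

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_megabytes(num_params: float, dtype: str) -> float:
    """Approximate weight storage in MB (10**6 bytes) for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in ("fp32", "fp16"):
    print(f"{dtype}: {weight_megabytes(PARAMS, dtype):.1f} MB")
```

To check the count against the downloaded checkpoint, `sum(p.numel() for p in model.parameters())` works on the model loaded in the usage example.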