---
library_name: mlx-audio-plus
base_model:
- FunAudioLLM/Fun-ASR-Nano-2512
tags:
- mlx
- funasr
- speech-recognition
- speech-to-text
- stt
pipeline_tag: automatic-speech-recognition
language:
- multilingual
---

# mlx-community/Fun-ASR-Nano-2512-4bit

This model was converted to MLX format from [FunAudioLLM/Fun-ASR-Nano-2512](https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512) using [mlx-audio-plus](https://github.com/DePasqualeOrg/mlx-audio-plus) version **0.1.4**.

## Features

| Feature | Description |
|---------|-------------|
| **Multilingual** | Supports 13+ languages |
| **Translation** | Translate speech directly to English text |
| **Custom prompting** | Guide recognition with domain-specific context |
| **Streaming** | Real-time token-by-token output |

## Installation

```bash
pip install -U mlx-audio-plus
```

## Usage

### Basic Transcription

```python
from mlx_audio.stt.models.funasr import Model

# Load the model
model = Model.from_pretrained("mlx-community/Fun-ASR-Nano-2512-4bit")

# Transcribe audio
result = model.generate("audio.wav")
print(result.text)
# Output: "The quick brown fox jumps over the lazy dog."

print(f"Duration: {result.duration:.2f}s")
print(f"Language: {result.language}")
```
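
To transcribe a whole directory of recordings, the call above can be wrapped in a small loop. A minimal sketch, assuming `model` is loaded as shown; `transcribe_dir` is a hypothetical helper (not part of mlx-audio-plus) that accepts any callable returning an object with a `.text` attribute:

```python
from pathlib import Path

def transcribe_dir(transcribe, audio_dir):
    """Map each .wav file in audio_dir to its transcript text.

    `transcribe` is any callable returning an object with a .text
    attribute, e.g. model.generate.
    """
    results = {}
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        results[wav.name] = transcribe(str(wav)).text
    return results

# With the model loaded as above:
# transcripts = transcribe_dir(model.generate, "recordings/")
```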

### Translation (Speech to English Text)

```python
# Translate Chinese/Japanese/etc. audio to English
result = model.generate(
    "chinese_speech.wav",
    task="translate",
    target_language="en",
)
print(result.text)  # English translation
```

### Custom Prompting

Provide context to improve recognition accuracy in specialized domains:

```python
# Medical transcription
result = model.generate(
    "doctor_notes.wav",
    initial_prompt="Medical consultation discussing cardiac symptoms and treatment options.",
)

# Technical content
result = model.generate(
    "tech_podcast.wav",
    initial_prompt="Discussion about machine learning, APIs, and software development.",
)
```

### Streaming Output

Get output in real time as the model generates:

```python
# Print tokens as they're generated
result = model.generate("audio.wav", verbose=True)
# Tokens stream to stdout in real time

# Or use the streaming generator
for chunk in model.generate("audio.wav", stream=True):
    print(chunk, end="", flush=True)
```
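
When you need the full transcript as a string rather than only streamed to stdout, the chunks from the streaming generator can be accumulated. A minimal sketch, assuming each chunk yielded by `stream=True` is a plain text fragment; `collect_transcript` is a hypothetical helper, not part of mlx-audio-plus:

```python
def collect_transcript(chunks):
    """Accumulate streamed text chunks into one transcript string,
    echoing each chunk to stdout as it arrives."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # final newline once the stream ends
    return "".join(parts)

# With the model loaded as above:
# transcript = collect_transcript(model.generate("audio.wav", stream=True))
```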

## Supported Languages

See the [original model](https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512) for the full list of supported languages.