---
library_name: mlx-audio-plus
base_model:
- FunAudioLLM/Fun-ASR-Nano-2512
tags:
- mlx
- funasr
- speech-recognition
- speech-to-text
- stt
pipeline_tag: automatic-speech-recognition
language:
- multilingual
---

# mlx-community/Fun-ASR-Nano-2512-4bit

This model was converted to MLX format from [FunAudioLLM/Fun-ASR-Nano-2512](https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512) using [mlx-audio-plus](https://github.com/DePasqualeOrg/mlx-audio-plus) version **0.1.4**.

## Features

| Feature | Description |
|---------|-------------|
| **Multilingual** | Supports 13+ languages |
| **Translation** | Translate speech directly to English text |
| **Custom prompting** | Guide recognition with domain-specific context |
| **Streaming** | Real-time token-by-token output |

## Installation

```bash
pip install -U mlx-audio-plus
```

## Usage

### Basic Transcription

```python
from mlx_audio.stt.models.funasr import Model

# Load the model
model = Model.from_pretrained("mlx-community/Fun-ASR-Nano-2512-4bit")

# Transcribe audio
result = model.generate("audio.wav")
print(result.text)
# Output: "The quick brown fox jumps over the lazy dog."

print(f"Duration: {result.duration:.2f}s")
print(f"Language: {result.language}")
```
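Since `result.duration` reports the audio length in seconds, you can estimate how much faster than real time the model runs by timing the call. A minimal sketch — `realtime_factor` is an illustrative helper, not part of the library:

```python
import time

def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio processed per wall-clock second (higher is faster)."""
    return audio_seconds / wall_seconds

# Hypothetical usage with the API shown above:
# start = time.perf_counter()
# result = model.generate("audio.wav")
# elapsed = time.perf_counter() - start
# print(f"Real-time factor: {realtime_factor(result.duration, elapsed):.1f}x")

print(realtime_factor(10.0, 2.5))  # 10 s of audio in 2.5 s -> 4.0
```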

### Translation (Speech to English Text)

```python
# Translate Chinese/Japanese/etc. audio to English
result = model.generate(
    "chinese_speech.wav",
    task="translate",
    target_language="en"
)
print(result.text)  # English translation
```

### Custom Prompting

Provide context to improve recognition accuracy for specialized domains:

```python
# Medical transcription
result = model.generate(
    "doctor_notes.wav",
    initial_prompt="Medical consultation discussing cardiac symptoms and treatment options."
)

# Technical content
result = model.generate(
    "tech_podcast.wav",
    initial_prompt="Discussion about machine learning, APIs, and software development."
)
```

### Streaming Output

Get real-time output as the model generates:

```python
# Print tokens as they're generated
result = model.generate("audio.wav", verbose=True)
# Tokens stream to stdout in real-time

# Or use the streaming generator
for chunk in model.generate("audio.wav", stream=True):
    print(chunk, end="", flush=True)
```
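If you stream chunks but still want the complete transcript afterwards, the chunks can simply be accumulated as they arrive. A minimal sketch — the joining helper is illustrative, not part of the library:

```python
from typing import Iterable

def collect_transcript(chunks: Iterable[str]) -> str:
    """Join streamed text chunks into one transcript string."""
    return "".join(chunks)

# Hypothetical usage with the streaming generator shown above:
# transcript = collect_transcript(model.generate("audio.wav", stream=True))

print(collect_transcript(["The quick ", "brown fox."]))  # The quick brown fox.
```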

## Supported Languages

See [original model](https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512) for the full list of supported languages.