---
library_name: mlx-audio-plus
base_model:
- FunAudioLLM/Fun-ASR-Nano-2512
tags:
- mlx
- funasr
- speech-recognition
- speech-to-text
- stt
pipeline_tag: automatic-speech-recognition
language:
- multilingual
---

# mlx-community/Fun-ASR-Nano-2512-4bit

This model was converted to MLX format from [FunAudioLLM/Fun-ASR-Nano-2512](https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512) using [mlx-audio-plus](https://github.com/DePasqualeOrg/mlx-audio-plus) version **0.1.4**.

## Features

| Feature | Description |
|---------|-------------|
| **Multilingual** | Supports 13+ languages |
| **Translation** | Translate speech directly to English text |
| **Custom prompting** | Guide recognition with domain-specific context |
| **Streaming** | Real-time token-by-token output |

## Installation

```bash
pip install -U mlx-audio-plus
```

## Usage

### Basic Transcription

```python
from mlx_audio.stt.models.funasr import Model

# Load the model
model = Model.from_pretrained("mlx-community/Fun-ASR-Nano-2512-4bit")

# Transcribe audio
result = model.generate("audio.wav")
print(result.text)
# Output: "The quick brown fox jumps over the lazy dog."

print(f"Duration: {result.duration:.2f}s")
print(f"Language: {result.language}")
```
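
To transcribe a whole directory of recordings, the call above can be wrapped in a small loop. A minimal sketch, assuming `model` is loaded as shown; `transcribe_dir` is a hypothetical helper (not part of mlx-audio-plus) that accepts any callable returning an object with a `.text` attribute:

```python
from pathlib import Path

def transcribe_dir(transcribe, audio_dir):
    """Map each .wav file in audio_dir to its transcript text.

    `transcribe` is any callable returning an object with a .text
    attribute, e.g. model.generate.
    """
    results = {}
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        results[wav.name] = transcribe(str(wav)).text
    return results

# With the model loaded as above:
# transcripts = transcribe_dir(model.generate, "recordings/")
```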

### Translation (Speech to English Text)

```python
# Translate Chinese/Japanese/etc. audio to English
result = model.generate(
    "chinese_speech.wav",
    task="translate",
    target_language="en",
)
print(result.text)  # English translation
```

### Custom Prompting

Provide context to improve recognition accuracy in specialized domains:

```python
# Medical transcription
result = model.generate(
    "doctor_notes.wav",
    initial_prompt="Medical consultation discussing cardiac symptoms and treatment options.",
)

# Technical content
result = model.generate(
    "tech_podcast.wav",
    initial_prompt="Discussion about machine learning, APIs, and software development.",
)
```

### Streaming Output

Get output in real time as the model generates:

```python
# Print tokens as they're generated
result = model.generate("audio.wav", verbose=True)
# Tokens stream to stdout in real time

# Or use the streaming generator
for chunk in model.generate("audio.wav", stream=True):
    print(chunk, end="", flush=True)
```
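
When you need the full transcript as a string rather than only streamed to stdout, the chunks from the streaming generator can be accumulated. A minimal sketch, assuming each chunk yielded by `stream=True` is a plain text fragment; `collect_transcript` is a hypothetical helper, not part of mlx-audio-plus:

```python
def collect_transcript(chunks):
    """Accumulate streamed text chunks into one transcript string,
    echoing each chunk to stdout as it arrives."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # final newline once the stream ends
    return "".join(parts)

# With the model loaded as above:
# transcript = collect_transcript(model.generate("audio.wav", stream=True))
```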

## Supported Languages

See the [original model](https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512) for the full list of supported languages.