---
license: cc-by-4.0
language:
- en
- it
- pt
- de
- fr
- es
- ja
- zh
tags:
- automatic-speech-recognition
- speech
- audio
- Transformer
- flow-matching
- discrete-flow-matching
- pytorch
- hf-asr-leaderboard
library_name: drax
---

# Drax: Speech Recognition with Discrete Flow Matching

## Model Overview

The Drax model family provides speech recognition models based on discrete flow matching.
The `drax-v1` model supports eight languages: English, Spanish, French, Portuguese, German, Italian, Japanese, and Chinese.
It is an encoder-decoder model that consists of a Whisper-large-v3 encoder and a DiT-based decoder, with a total of ~1.2B parameters.

For more details on usage, see our GitHub repo, [https://github.com/aiola-lab/drax](https://github.com/aiola-lab/drax), and our [paper](https://arxiv.org/abs/2510.04162).

## Usage

See [https://github.com/aiola-lab/drax](https://github.com/aiola-lab/drax) for installation instructions.

```python
from drax import Transcriber

# Load the drax-v1 model
asr = Transcriber(model_path="aiola/drax-v1")

# Transcribe a single English audio file
result = asr.transcribe("/path/to/audio.wav", language="en")

# The result is indexable; the first entry holds the transcript for this file
print(result[0].transcript)
```

To control the number of sampling steps, the temperature, and other decoding settings, pass them as keyword arguments to `transcribe`:

```python
from drax import Transcriber

asr = Transcriber(model_path="aiola/drax-v1")

# Override the number of sampling steps and the sampling temperature
result = asr.transcribe("/path/to/audio.wav", language="en", sampling_steps=32, temperature=1e-2)
print(result[0].transcript)
```
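
One practical way to pick a value for `sampling_steps` is to time a few settings on a representative file and compare the transcripts. The following is a minimal sketch of such a sweep, not part of the drax API: `sweep_sampling_steps`, `transcribe_fn`, and the default step grid are hypothetical names and values chosen for illustration. The helper only assumes a callable that takes an audio path plus a `sampling_steps` keyword and returns the transcript text.

```python
import time
from typing import Callable, Iterable


def sweep_sampling_steps(
    transcribe_fn: Callable[..., str],
    audio_path: str,
    steps_grid: Iterable[int] = (4, 8, 16, 32),  # illustrative values, not recommendations
    **kwargs,
) -> list[dict]:
    """Transcribe the same file once per step count and record wall-clock time."""
    rows = []
    for steps in steps_grid:
        start = time.perf_counter()
        # Extra keyword arguments (e.g. language, temperature) are forwarded unchanged.
        text = transcribe_fn(audio_path, sampling_steps=steps, **kwargs)
        elapsed = time.perf_counter() - start
        rows.append({"sampling_steps": steps, "seconds": elapsed, "transcript": text})
    return rows
```

With drax, and assuming `transcribe` returns an indexable result as in the examples above, the callable could be `lambda path, **kw: asr.transcribe(path, **kw)[0].transcript`, called as `sweep_sampling_steps(fn, "/path/to/audio.wav", language="en", temperature=1e-2)`.
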

Batch inference:

```python
from drax import Transcriber

asr = Transcriber(model_path="aiola/drax-v1")

# One language code per audio file, in the same order
audio_paths = ["/path/to/audio1.wav", "/path/to/audio2.wav"]
languages = ["en", "de"]
result = asr.transcribe(audio_paths, language=languages)

# Print one transcript per input, indexing the result as in the single-file example
for item in result:
    print(item.transcript)
```
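
For larger collections, it can help to validate the inputs and split them into fixed-size batches before calling `transcribe`. The sketch below is one possible way to do that, under stated assumptions: `iter_batches` and `batch_size` are hypothetical names that are not part of drax, the language codes are taken from the list in this model card, and the one-language-per-file pairing mirrors the batch example above.

```python
from typing import Iterator, Sequence

# Language codes listed in this model card's metadata.
SUPPORTED_LANGUAGES = frozenset({"en", "es", "fr", "pt", "de", "it", "ja", "zh"})


def iter_batches(
    audio_paths: Sequence[str],
    languages: Sequence[str],
    batch_size: int = 8,  # illustrative default; tune to available memory
) -> Iterator[tuple[list[str], list[str]]]:
    """Yield (paths, languages) chunks after checking lengths and language codes."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if len(audio_paths) != len(languages):
        raise ValueError("audio_paths and languages must have the same length")
    unsupported = sorted(set(languages) - SUPPORTED_LANGUAGES)
    if unsupported:
        raise ValueError(f"unsupported language codes: {unsupported}")
    for start in range(0, len(audio_paths), batch_size):
        end = start + batch_size
        yield list(audio_paths[start:end]), list(languages[start:end])
```

Each yielded pair can then be passed along as `asr.transcribe(paths, language=langs)`, which keeps every call to a bounded number of files.
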

## Citation

```bibtex
@article{navon2025drax,
  title={Drax: Speech Recognition with Discrete Flow Matching},
  author={Navon, Aviv and Shamsian, Aviv and Glazer, Neta and Segal-Feldman, Yael and Hetz, Gill and Keshet, Joseph and Fetaya, Ethan},
  journal={arXiv preprint arXiv:2510.04162},
  year={2025}
}
```