Musci-research/Musci-ASR-2.4B

@@ -3,19 +3,39 @@ language: en
 library_name: transformers
 pipeline_tag: automatic-speech-recognition
 tags:
   - asr
   - speech
   - english
 license: apache-2.0
 ---
 # Musci-ASR-2.4B
-An English speech-to-text model that pairs a Qwen3 language-model backbone with a
-Qwen3-Omni-MoE audio encoder. Trained on public English ASR corpora and tuned with
-reinforcement learning on the Open ASR Leaderboard training splits. Total \~2.4B parameters,
-distributed as a single `bfloat16` safetensors shard (\~4.84 GB).
 ## Inference
@@ -35,8 +55,14 @@ model = AutoModelForCausalLM.from_pretrained(
 tokenizer = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
 MusciProcessor = get_class_from_dynamic_module("processing_Musci.MusciProcessor", REPO)
-MelConfig      = get_class_from_dynamic_module("processing_Musci.MelConfig", REPO)
-mel_cfg = MelConfig(mel_sr=16000, mel_dim=128, mel_n_fft=400, mel_hop_length=160)
 processor = MusciProcessor(tokenizer, config=mel_cfg, enable_time_marker=False)
 processor.load_template(hf_hub_download(REPO, "chat_template_default.py"))
@@ -59,11 +85,33 @@ transcript = processor.batch_decode(new_ids, skip_special_tokens=True)[0].strip(
 print(transcript)
 ```
-## Audio frontend
-- Sample rate: **16 kHz**
-- Features: Whisper log-mel filterbank — `n_mels=128`, `n_fft=400`, `hop_length=160`
 ## License
-apache-2.0.

 library_name: transformers
 pipeline_tag: automatic-speech-recognition
 tags:
+  - automatic-speech-recognition
+  - speech-to-text
   - asr
   - speech
   - english
+  - qwen3
+  - audio
+  - reinforcement-learning
 license: apache-2.0
 ---
 # Musci-ASR-2.4B
+Musci-ASR-2.4B is an English speech-to-text model that pairs a Qwen3-1.7B-base language-model backbone with a Qwen3-Omni-MoE audio encoder. A gated-MLP adapter projects audio features into the language-model embedding space. The model is trained on public English ASR corpora and fine-tuned with reinforcement learning on the Open ASR Leaderboard training splits.
+The model has approximately 2.4B parameters and is distributed as a single `bfloat16` safetensors shard of approximately 4.84 GB.
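
The parameter count and the shard size quoted above can be cross-checked with a little arithmetic. The sketch below assumes that "GB" means 10^9 bytes, that every tensor is stored as `bfloat16` (2 bytes per value), and that the safetensors header is negligible; the helper name is hypothetical and is not part of the repository.

```python
# Rough consistency check between checkpoint size and parameter count.
# Assumptions (not stated in the model card): 1 GB = 1e9 bytes, all weights
# are bfloat16, and safetensors header overhead is ignored.

BYTES_PER_BF16 = 2  # bfloat16 stores each value in 16 bits


def params_from_checkpoint_bytes(n_bytes: float, bytes_per_param: int = BYTES_PER_BF16) -> float:
    """Estimate the number of parameters held in a checkpoint of n_bytes."""
    return n_bytes / bytes_per_param


checkpoint_bytes = 4.84e9  # "approximately 4.84 GB" from the model card
approx_params = params_from_checkpoint_bytes(checkpoint_bytes)
print(f"~{approx_params / 1e9:.2f}B parameters")
```

Under these assumptions the estimate lands close to the stated 2.4B parameters, so the two figures are consistent with each other.
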
+## Model Details
+- **Developed by:** Musci Research
+- **Model type:** Automatic Speech Recognition / speech-to-text model
+- **Language:** English
+- **License:** Apache-2.0
+- **Library:** Transformers
+- **Backbone:** Qwen3-1.7B-base, 28 layers, hidden size 2048
+- **Audio encoder:** Qwen3-Omni-MoE audio encoder
+- **Adapter:** Gated-MLP adapter, hidden size 8192
+- **Parameter size:** approximately 2.4B
+- **Checkpoint format:** `bfloat16` safetensors
+## Intended Use
+This model is intended for English automatic speech recognition: transcribing English speech audio for research and evaluation.
 ## Inference
 tokenizer = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
 MusciProcessor = get_class_from_dynamic_module("processing_Musci.MusciProcessor", REPO)
+MelConfig = get_class_from_dynamic_module("processing_Musci.MelConfig", REPO)
+mel_cfg = MelConfig(
+    mel_sr=16000,
+    mel_dim=128,
+    mel_n_fft=400,
+    mel_hop_length=160,
+)
 processor = MusciProcessor(tokenizer, config=mel_cfg, enable_time_marker=False)
 processor.load_template(hf_hub_download(REPO, "chat_template_default.py"))
 print(transcript)
 ```
+## Audio Frontend
+- **Sample rate:** 16 kHz
+- **Features:** Whisper log-mel filterbank
+- **Mel bins:** 128
+- **FFT size:** 400
+- **Hop length:** 160
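
The frontend settings listed above translate into frame timings as follows. This is a minimal sketch of the arithmetic only, not the repository's `MusciProcessor`; the function names are hypothetical, and the exact frame count produced by the real frontend depends on its padding and on any downsampling inside the audio encoder, neither of which is described here.

```python
# Timing implied by the listed frontend settings (illustrative arithmetic only).

SAMPLE_RATE_HZ = 16_000  # 16 kHz input audio
N_FFT = 400              # STFT window size in samples
HOP_LENGTH = 160         # step between successive frames in samples
N_MELS = 128             # mel bins per frame


def frontend_timing(sample_rate: int = SAMPLE_RATE_HZ,
                    n_fft: int = N_FFT,
                    hop_length: int = HOP_LENGTH) -> dict:
    """Return window length, hop length, and frame rate in time units."""
    return {
        "window_ms": 1000 * n_fft / sample_rate,
        "hop_ms": 1000 * hop_length / sample_rate,
        "frames_per_second": sample_rate / hop_length,
    }


def approx_num_frames(duration_s: float,
                      sample_rate: int = SAMPLE_RATE_HZ,
                      hop_length: int = HOP_LENGTH) -> int:
    """Approximate mel frame count for a clip, ignoring edge padding."""
    n_samples = int(round(duration_s * sample_rate))
    return n_samples // hop_length


timing = frontend_timing()
print(timing)
print((N_MELS, approx_num_frames(30.0)))  # approximate feature shape for 30 s
```

With these values each frame covers a 25 ms window and frames advance every 10 ms, so a 30-second clip yields roughly 3000 frames of 128 mel bins before the encoder processes them.
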
+## Training
+The model was trained on public English ASR corpora and fine-tuned with reinforcement learning on the Open ASR Leaderboard training splits.
+## Limitations
+The model is designed for English ASR. It may perform worse on non-English speech, heavy accents, noisy recordings, overlapping speakers, far-field audio, domain-specific terminology, or audio conditions that differ significantly from the training and evaluation data. Review transcripts manually before relying on them in high-stakes settings.
+## Citation
+```bibtex
+@misc{musci_asr_2025,
+  title        = {{Musci-ASR-2.4B}},
+  author       = {{Musci Research}},
+  year         = {2025},
+  howpublished = {\url{https://huggingface.co/Musci-research/Musci-ASR-2.4B}}
+}
+```
 ## License
+This model is released under the Apache-2.0 license.