Musci-research committed on
Commit 396767e · verified · 1 Parent(s): 0ee23d5

Update README.md

Files changed (1)
  1. README.md +58 -10
README.md CHANGED
@@ -3,19 +3,39 @@ language: en
3
  library_name: transformers
4
  pipeline_tag: automatic-speech-recognition
5
  tags:
6
  - asr
7
  - speech
8
  - english
9
  license: apache-2.0
10
  ---
11
 
12
  # Musci-ASR-2.4B
13
 
14
- An English speech-to-text model that pairs a Qwen3 language-model backbone with a
15
- Qwen3-Omni-MoE audio encoder. Trained on public English ASR corpora and tuned with
16
- reinforcement learning on the Open ASR Leaderboard training splits. Total \~2.4B parameters,
17
- distributed as a single `bfloat16` safetensors shard (\~4.84 GB).
18
 
19
 
20
  ## Inference
21
 
@@ -35,8 +55,14 @@ model = AutoModelForCausalLM.from_pretrained(
35
  tokenizer = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
36
 
37
  MusciProcessor = get_class_from_dynamic_module("processing_Musci.MusciProcessor", REPO)
38
- MelConfig = get_class_from_dynamic_module("processing_Musci.MelConfig", REPO)
39
- mel_cfg = MelConfig(mel_sr=16000, mel_dim=128, mel_n_fft=400, mel_hop_length=160)
40
  processor = MusciProcessor(tokenizer, config=mel_cfg, enable_time_marker=False)
41
  processor.load_template(hf_hub_download(REPO, "chat_template_default.py"))
42
 
@@ -59,11 +85,33 @@ transcript = processor.batch_decode(new_ids, skip_special_tokens=True)[0].strip(
59
  print(transcript)
60
  ```
61
 
62
- ## Audio frontend
63
 
64
- - Sample rate: **16 kHz**
65
- - Features: Whisper log-mel filterbank — `n_mels=128`, `n_fft=400`, `hop_length=160`
66
 
67
  ## License
68
 
69
- apache-2.0.
 
3
  library_name: transformers
4
  pipeline_tag: automatic-speech-recognition
5
  tags:
6
+ - automatic-speech-recognition
7
+ - speech-to-text
8
  - asr
9
  - speech
10
  - english
11
+ - qwen3
12
+ - audio
13
+ - reinforcement-learning
14
  license: apache-2.0
15
  ---
16
 
17
  # Musci-ASR-2.4B
18
 
19
+ Musci-ASR-2.4B is an English speech-to-text model that pairs a Qwen3-1.7B-base language-model backbone with a Qwen3-Omni-MoE audio encoder. A gated-MLP adapter projects audio features into the language-model embedding space. The model is trained on public English ASR corpora and fine-tuned with reinforcement learning on the Open ASR Leaderboard training splits.
20
 
21
+ The model has approximately 2.4B parameters and is distributed as a single `bfloat16` safetensors shard of approximately 4.84 GB.
22
+
23
+ ## Model Details
24
+
25
+ - **Developed by:** Musci Research
26
+ - **Model type:** Automatic Speech Recognition / speech-to-text model
27
+ - **Language:** English
28
+ - **License:** Apache-2.0
29
+ - **Library:** Transformers
30
+ - **Backbone:** Qwen3-1.7B-base, 28 layers, hidden size 2048
31
+ - **Audio encoder:** Qwen3-Omni-MoE audio encoder
32
+ - **Adapter:** Gated-MLP adapter, hidden size 8192
33
+ - **Parameter size:** approximately 2.4B
34
+ - **Checkpoint format:** `bfloat16` safetensors
35
+
36
+ ## Intended Use
37
+
38
+ This model is intended for English automatic speech recognition, including transcription of English speech audio for research and evaluation purposes.
39
 
40
  ## Inference
41
 
 
55
  tokenizer = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
56
 
57
  MusciProcessor = get_class_from_dynamic_module("processing_Musci.MusciProcessor", REPO)
58
+ MelConfig = get_class_from_dynamic_module("processing_Musci.MelConfig", REPO)
59
+
60
+ mel_cfg = MelConfig(
61
+     mel_sr=16000,
62
+     mel_dim=128,
63
+     mel_n_fft=400,
64
+     mel_hop_length=160,
65
+ )
66
  processor = MusciProcessor(tokenizer, config=mel_cfg, enable_time_marker=False)
67
  processor.load_template(hf_hub_download(REPO, "chat_template_default.py"))
68
 
 
85
  print(transcript)
86
  ```
87
 
88
+ ## Audio Frontend
89
+
90
+ - **Sample rate:** 16 kHz
91
+ - **Features:** Whisper log-mel filterbank
92
+ - **Mel bins:** 128
93
+ - **FFT size:** 400
94
+ - **Hop length:** 160
95
 
96
+ ## Training
97
+
98
+ The model was trained on public English ASR corpora and fine-tuned with reinforcement learning on the Open ASR Leaderboard training splits.
99
+
100
+ ## Limitations
101
+
102
+ The model is designed for English ASR. It may perform worse on non-English speech, heavy accents, noisy recordings, overlapping speakers, far-field audio, domain-specific terminology, or audio conditions that differ significantly from the training and evaluation data. The output should be manually reviewed before use in high-stakes settings.
103
+
104
+ ## Citation
105
+
106
+ ```bibtex
107
+ @misc{musci_asr_2025,
108
+   title = {{Musci-ASR-2.4B}},
109
+   author = {{Musci Research}},
110
+   year = {2025},
111
+   howpublished = {\url{https://huggingface.co/Musci-research/Musci-ASR-2.4B}}
112
+ }
113
+ ```
114
 
115
  ## License
116
 
117
+ This model is released under the Apache-2.0 license.
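
Review note: the numbers this commit adds to the README can be sanity-checked with a little arithmetic. The sketch below is not part of the repository. It takes the audio frontend settings (16 kHz, `n_fft=400`, `hop_length=160`, 128 mel bins) and the stated parameter count (about 2.4B in `bfloat16`) from the diff and derives the frame timing and an approximate checkpoint size. The helper names `frontend_geometry` and `bf16_size_gb` are hypothetical, the frame count assumes one frame per hop (the real processor's padding may shift it by a frame), and the size estimate assumes 1 GB = 10^9 bytes with weights dominating the file.

```python
# Values copied from the README diff above.
SAMPLE_RATE = 16_000  # Hz
N_FFT = 400           # samples per analysis window
HOP_LENGTH = 160      # samples between successive frames
N_MELS = 128          # mel bins per frame


def frontend_geometry(duration_s: float) -> dict:
    """Approximate log-mel frame layout for a clip of the given duration.

    Assumption: one frame per hop; the actual processor's padding
    behaviour is not shown in the diff and may differ by a frame.
    """
    n_samples = int(round(duration_s * SAMPLE_RATE))
    n_frames = n_samples // HOP_LENGTH
    return {
        "window_ms": 1000 * N_FFT / SAMPLE_RATE,
        "hop_ms": 1000 * HOP_LENGTH / SAMPLE_RATE,
        "frames_per_second": SAMPLE_RATE / HOP_LENGTH,
        "n_frames": n_frames,
        "feature_shape": (N_MELS, n_frames),
    }


def bf16_size_gb(n_params: float) -> float:
    """Approximate checkpoint size in GB (10^9 bytes) at 2 bytes per parameter."""
    return n_params * 2 / 1e9


if __name__ == "__main__":
    print(frontend_geometry(30.0))
    print(bf16_size_gb(2.4e9))
```

Under these assumptions the frontend uses a 25 ms window with a 10 ms hop, so a 30 s clip maps to roughly 3000 frames of 128 mel bins, and 2.4B `bfloat16` parameters come to about 4.8 GB, consistent with the stated shard size of approximately 4.84 GB.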