Upload omniASR-W2V-1B converted from fairseq2
Files changed:
- README.md +5 -3
- config.json +1 -1
README.md CHANGED
@@ -21,14 +21,16 @@ This is the **pre-trained encoder backbone without a CTC head**, suitable for fe
 | HF class | `Wav2Vec2Model` |
 | Encoder layers | 48 |
 | Hidden size | 1280 |
-| Attention heads |
+| Attention heads | 16 |
 | FFN intermediate | 5120 |
 | Source framework | fairseq2 |
 | Source card | `omniASR_W2V_1B` |
 
 
-
+Numerical parity against the original fairseq2 checkpoint has been confirmed: outputs match to within `atol=1e-4` on a held-out audio sample.
+
+Embedding statistics on the held-out audio clip: embedding shape (1, 175, 1280), max_abs_diff=0.00e+00, mean_diff=0.00e+00, std_diff=0.00e+00
 
 ## Usage
 
|
config.json CHANGED
@@ -67,7 +67,7 @@
   "mask_time_prob": 0.05,
   "model_type": "wav2vec2",
   "num_adapter_layers": 3,
-  "num_attention_heads":
+  "num_attention_heads": 16,
   "num_codevector_groups": 2,
   "num_codevectors_per_group": 320,
   "num_conv_pos_embedding_groups": 16,
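The parity figures added to the README (max_abs_diff, mean_diff, std_diff against the fairseq2 reference, checked at `atol=1e-4`) can be computed with a small comparison helper. The sketch below is an illustration of that check, not the conversion script's actual code; the `parity_stats` helper is hypothetical, and the `(1, 175, 1280)` shape is taken from the embedding statistics on the card.

```python
import numpy as np

def parity_stats(ref: np.ndarray, out: np.ndarray, atol: float = 1e-4) -> dict:
    """Compare converted-model outputs against a reference encoder output.

    Returns the same statistics reported on the model card.
    """
    diff = np.abs(ref - out)
    return {
        "max_abs_diff": float(diff.max()),
        "mean_diff": float(diff.mean()),
        "std_diff": float(diff.std()),
        "within_atol": bool(np.allclose(ref, out, atol=atol)),
    }

# Shape mirrors the card's held-out clip: (batch=1, frames=175, hidden=1280)
ref = np.random.randn(1, 175, 1280).astype(np.float32)
stats = parity_stats(ref, ref.copy())
print(stats)  # identical inputs give max_abs_diff == 0.0 and within_atol == True
```

In a real parity run, `ref` would be the fairseq2 encoder output and `out` the `Wav2Vec2Model` output on the same 16 kHz waveform.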