facebook_mms-1b-all-NVFP4
NVFP4 (W4A4) post-training quantization of facebook/mms-1b-all (architecture: w2v2_ctc).
- Format: nvfp4-pack-quantized (compressed-tensors). 4-bit FP4 weights, per-block FP8 (E4M3) scales, per-tensor FP32 global scales; activations dynamically quantized to FP4.
- Calibration: 32 Persian clips from Reza2kn/persian-asr-eval-v0 (held out from the WER eval set).
- Hardware target: NVIDIA Blackwell tensor cores (sm_100+). Quantized on RTX 5080 Laptop (sm_120).
- Quantized layers: all Linear modules in the encoder/decoder (CTC lm_head / proj_out left in full precision).
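The block-scaled FP4 format described above can be illustrated with a small pure-Python sketch. It is a simplification for intuition only, not the compressed-tensors implementation: it assumes the 16-element block size NVFP4 uses, keeps the block scale as an ordinary float instead of rounding it to FP8 (E4M3), and omits the per-tensor FP32 global scale. The function names are hypothetical.

```python
# E2M1 (FP4) can represent only these magnitudes, plus a sign bit.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # NVFP4 shares one scale across each block of 16 values


def quantize_block(block):
    """Return (scale, codes) for one block; codes are signed FP4 grid values."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude onto the top of the FP4 grid (6.0).
    # Simplification: the real format stores this scale in FP8 (E4M3),
    # relative to a per-tensor FP32 global scale.
    scale = amax / FP4_GRID[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        nearest = min(FP4_GRID, key=lambda g: abs(g - target))
        codes.append(-nearest if x < 0 else nearest)
    return scale, codes


def fake_quantize(values):
    """Quantize then dequantize a flat list, block by block."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        scale, codes = quantize_block(block)
        out.extend(c * scale for c in codes)
    return out


weights = [0.01 * i for i in range(-8, 8)]  # one block of 16 toy weights
restored = fake_quantize(weights)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max abs error after FP4 round trip: {max_err:.4f}")
```

Because the widest gap on the FP4 grid is between 4 and 6, the round-trip error within a block is bounded by one unit of that block's scale, which is why small blocks with their own scales matter for accuracy.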
Eval: Reza2kn/persian-asr-eval-v0 (FLEURS-fa)
| Variant | WER ↓ | CER ↓ | clips | per-clip latency | peak VRAM |
|---|---|---|---|---|---|
| NVFP4 (this repo) | 17.68% | 4.69% | 200 | 292 ms | 3403 MiB |
Persian text normalization for WER/CER: NFKC, ZWNJ → space, ي→ی / ك→ک, digit folding, punctuation stripping, whitespace collapse.
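The normalization steps listed above, followed by the metric itself, might look roughly like the sketch below. This is not the evaluation script used for the table. It assumes that digit folding means mapping any Unicode decimal digit to ASCII, that punctuation stripping removes every character in a Unicode `P*` category, and that CER is computed over the normalized string including spaces; `normalize_fa`, `wer` and `cer` are hypothetical names.

```python
import re
import unicodedata

ZWNJ = "\u200c"
# Arabic yeh (U+064A) -> Persian yeh (U+06CC), Arabic kaf (U+0643) -> Persian kaf (U+06A9)
CHAR_MAP = str.maketrans({"\u064a": "\u06cc", "\u0643": "\u06a9"})


def normalize_fa(text):
    text = unicodedata.normalize("NFKC", text)
    text = text.replace(ZWNJ, " ")
    text = text.translate(CHAR_MAP)
    # Digit folding (assumed direction: any decimal digit -> ASCII digit).
    text = "".join(
        str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
        for ch in text
    )
    # Punctuation stripping: drop characters in any Unicode "P*" category.
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    # Whitespace collapse.
    return re.sub(r"\s+", " ", text).strip()


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Word error rate; the normalized reference must be non-empty."""
    ref_words = normalize_fa(ref).split()
    return edit_distance(ref_words, normalize_fa(hyp).split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate; the normalized reference must be non-empty."""
    ref_chars = normalize_fa(ref)
    return edit_distance(ref_chars, normalize_fa(hyp)) / len(ref_chars)


sample = "\u0643\u062a\u0627\u0628\u200c\u0647\u0627\u060c \u06f1\u06f2\u06f3!"
print(normalize_fa(sample))
```

Applying the same normalization to both reference and hypothesis keeps orthographic variants (Arabic versus Persian code points, ZWNJ usage) from being counted as recognition errors.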
Usage
import torch
import soundfile as sf  # for reading audio files
from transformers import AutoProcessor, AutoModelForCTC
repo = "Reza2kn/facebook_mms-1b-all-NVFP4"
processor = AutoProcessor.from_pretrained(repo)
# Load in bfloat16; NVFP4 weights decompress to bf16 inside CompressedLinear.
# AutoModelForCTC keeps the CTC head (lm_head) needed to produce transcription logits.
model = AutoModelForCTC.from_pretrained(repo, dtype=torch.bfloat16).to("cuda").eval()
(See the original facebook/mms-1b-all model card for arch-specific decoding boilerplate.)
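For orientation, the decoding step of a CTC model reduces to taking the most likely label per audio frame, collapsing repeats, and dropping the blank symbol. The sketch below shows that collapse on a toy example. It is not the processor's own code: the vocabulary is hypothetical, and it assumes the usual Wav2Vec2 convention of a blank (pad) token at id 0 and `|` as the word delimiter. In Transformers, the same result is normally obtained by an argmax over the model's logits followed by the processor's `batch_decode`.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated frame labels, drop blanks, and join tokens into text."""
    tokens = []
    prev = None
    for idx in frame_ids:
        # Emit a token only when the label changes and is not the blank.
        if idx != prev and idx != blank_id:
            tokens.append(id_to_token[idx])
        prev = idx
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary; the real one ships with the processor.
vocab = {0: "<pad>", 1: "|", 2: "s", 3: "a", 4: "l", 5: "m"}
# Per-frame argmax ids, as they would come from the model's logits.
frames = [2, 2, 0, 3, 3, 4, 0, 0, 3, 5, 1, 1, 2, 0, 2]
print(ctc_greedy_decode(frames, vocab))
```

Note that a blank between two identical labels (the final `2, 0, 2`) yields two separate characters, which is how CTC represents doubled letters.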
How it was made
llmcompressor QuantizationModifier(targets=["Linear"], scheme="NVFP4", ignore=...) → compressed-tensors nvfp4-pack-quantized checkpoint.
License
Inherits the base model's license.