Video-Text-to-Text
MLX
Safetensors
English
qwen3_5
video
multimodal
video-captioning
temporal-grounding
quantized
8-bit precision
custom_code
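Download the model from the Hub:
```bash
# Quoting the extras keeps zsh (the macOS default shell) from globbing the brackets
pip install "huggingface_hub[hf_xet]"
huggingface-cli login  # the repo is gated, so authenticate first
huggingface-cli download --local-dir Marlin-2B-MLX-8bit NemoStation/Marlin-2B-MLX-8bit
```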
Marlin-2B — MLX 8-bit (Apple Silicon)
Original release: NemoStation/Marlin-2B
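8-bit MLX conversion of NemoStation/Marlin-2B for fast, local, private inference on Apple Silicon. It keeps the base model's architecture, with weights quantized to 8 bits, so outputs should closely track the BF16 original but are not guaranteed to be identical. See the base model card for benchmarks, architecture, training, and intended use.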
| | |
|---|---|
| Base model | NemoStation/Marlin-2B (2B video VLM: dense captioning + temporal grounding) |
| Format | MLX, 8-bit · ~2.5 GB (base BF16 ~5.1 GB) |
| Runs on | Apple Silicon (M-series) |
| License | Apache-2.0 (inherited from base) |
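The size figures can be sanity-checked with a back-of-envelope estimate. The sketch below rests on assumptions that are not stated in this card: every weight is quantized, BF16 costs 2 bytes per parameter, and the converter stores one 2-byte scale and one 2-byte bias per group of 64 weights (64 being the usual MLX default group size). Under those assumptions it lands near 2.7 GB, in the same range as the ~2.5 GB listed; the exact figure depends on the real parameter count, on which layers are actually quantized, and on how the reported sizes were rounded. `quantized_size_gb` is a hypothetical helper.

```python
def quantized_size_gb(bf16_size_gb: float, bits: int = 8,
                      group_size: int = 64, scale_bytes: int = 2) -> float:
    """Back-of-envelope size of an affine group-quantized checkpoint.

    Assumptions (not taken from the model card): every weight is quantized,
    BF16 stores 2 bytes per parameter, and each group of `group_size`
    weights carries one scale and one bias of `scale_bytes` bytes each.
    """
    n_params = bf16_size_gb / 2.0              # BF16 = 2 bytes per parameter
    payload = bits / 8                         # bytes per quantized weight
    overhead = 2 * scale_bytes / group_size    # per-group scale + bias, amortized per weight
    return n_params * (payload + overhead)


if __name__ == "__main__":
    # ~2.71 GB under the assumptions above
    print(f"8-bit estimate: ~{quantized_size_gb(5.1):.2f} GB")
```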
Use it (mlx-vlm)
```bash
pip install mlx-vlm
python -m mlx_vlm.generate \
  --model NemoStation/Marlin-2B-MLX-8bit \
  --video clip.mp4 --fps 2 \
  --prompt "Describe the video."
```
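To make `--fps 2` concrete: the clip is reduced to a handful of frames, and each kept frame corresponds to a wall-clock time in the video. The sketch below shows plain uniform sampling under that assumption; it is an illustration only, not mlx-vlm's actual video loader, which may round, cap, or resize frames differently. `sample_frames` is a hypothetical helper.

```python
def sample_frames(duration_s: float, native_fps: float, target_fps: float = 2.0):
    """Uniformly sample a clip at `target_fps`; return (frame_index, time_s) pairs.

    Illustrative sketch: mlx-vlm's own video loader may round, cap, or
    resize frame counts differently.
    """
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("fps values must be positive")
    target_fps = min(target_fps, native_fps)    # cannot sample faster than the source
    total_frames = int(duration_s * native_fps)
    step_s = 1.0 / target_fps                   # seconds between sampled frames
    samples = []
    k = 0
    while True:
        idx = round(k * step_s * native_fps)    # nearest source frame to time k * step_s
        if idx >= total_frames:
            break
        samples.append((idx, idx / native_fps))  # timestamp of the frame actually used
        k += 1
    return samples


if __name__ == "__main__":
    # A 3-second, 30 fps clip sampled at 2 fps yields 6 frames, 0.5 s apart.
    for idx, t in sample_frames(3.0, 30.0, 2.0):
        print(f"frame {idx:3d} @ {t:.1f}s")
```

Those per-frame times are exactly what a grounding answer has to refer back to, which is why the serving path matters for temporal grounding.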
Dense captioning works well through mlx-vlm's one-shot `generate` path. For temporal grounding (answers of the form `From <start> to <end>`), use a timestamp-aware serving path (SGLang-MLX) so that each frame's timestamp is passed to the model.
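A grounding answer usually has to be turned into numbers before it can be used downstream. Below is a minimal parser, assuming the model fills `<start>` and `<end>` with plain seconds (for example `From 3.5 to 7.0`, optionally with an `s` suffix). That output format is an assumption this card does not confirm, clock-style times such as `01:05` are not handled, and `parse_span` is a hypothetical helper name.

```python
import re

# Assumed answer shape: "From <start> to <end>" with times in seconds,
# e.g. "From 3.5 to 7.0" or "From 3.5s to 7.0s". Clock formats such as
# "01:05" are NOT handled.
_SPAN_RE = re.compile(
    r"\bfrom\s*(\d+(?:\.\d+)?)\s*s?\s*to\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def parse_span(text: str):
    """Return (start_s, end_s) from the first grounding span in `text`, or None."""
    m = _SPAN_RE.search(text)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))


if __name__ == "__main__":
    print(parse_span("From 12s to 18.5s."))
```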
Conversion recipe
```bash
python -m mlx_vlm.convert \
  --hf-path NemoStation/Marlin-2B \
  --mlx-path ./Marlin-2B-MLX-8bit \
  -q --q-bits 8
```
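For readers curious what `-q --q-bits 8` does to the weights, here is a simplified pure-Python sketch of affine group quantization, the general scheme behind MLX's quantized layers: each group of weights (64 by default in the MLX converters, an assumption worth checking against your installed version) gets its own scale and bias, and every weight is stored as an 8-bit code. This is an illustration of the technique, not MLX's implementation, which works on packed arrays and handles rounding and edge cases on its own terms. All function names are hypothetical.

```python
def quantize_group(weights, bits=8):
    """Affine-quantize one group of floats to unsigned `bits`-bit codes.

    Returns (codes, scale, bias) with w ~= code * scale + bias.
    """
    levels = (1 << bits) - 1                          # 255 for 8-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0    # constant group: any scale works
    bias = lo
    codes = [min(levels, max(0, round((w - bias) / scale))) for w in weights]
    return codes, scale, bias


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


def quantize(weights, bits=8, group_size=64):
    """Quantize a flat weight list group by group; each group keeps its own scale and bias."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


if __name__ == "__main__":
    w = [((i * 37) % 101) / 50.0 - 1.0 for i in range(256)]  # toy weights in [-1, 1]
    groups = quantize(w)
    restored = [x for codes, scale, bias in groups for x in dequantize_group(codes, scale, bias)]
    err = max(abs(a - r) for a, r in zip(w, restored))
    print(f"{len(groups)} groups, max abs error {err:.5f}")
```

Because each group has its own range, the reconstruction error per weight is bounded by half that group's scale, which is why 8-bit group quantization typically stays close to the BF16 weights.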
Access
Gated with the same access form as the base model; request access on the model's Hugging Face page. License: Apache-2.0.