Marlin-2B — MLX 8-bit (Apple Silicon)

Original release: NemoStation/Marlin-2B

8-bit MLX conversion of NemoStation/Marlin-2B for fast, local, private inference on Apple Silicon. The weights are the base model's, quantized to 8 bits, so behavior should closely match it. See the base model card for benchmarks, architecture, training, and intended use.

Base model: NemoStation/Marlin-2B (2B video VLM: dense captioning + temporal grounding)
Format: MLX, 8-bit · ~2.5 GB (base BF16 ~5.1 GB)
Runs on: Apple Silicon (M-series)
License: Apache-2.0 (inherited from base)
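A quick estimate shows why the 8-bit checkpoint is a little over half the BF16 size. This sketch makes three assumptions: MLX's default quantization group size of 64, one BF16 scale and one BF16 bias stored per group, and every weight being quantized. It does not model which tensors the converter leaves in BF16, and it ignores GB/GiB rounding, so treat the result as a ballpark figure only.

```python
# Ballpark size estimate for MLX-style affine quantization.
# Assumptions (not taken from the model card): group size 64, one 16-bit scale
# and one 16-bit bias per group, and every weight quantized.

def effective_bits(q_bits: int, group_size: int = 64, scale_bits: int = 16) -> float:
    """Bits stored per weight: the code itself plus its share of the group's scale and bias."""
    return q_bits + 2 * scale_bits / group_size

def quantized_size_gb(bf16_size_gb: float, q_bits: int = 8, group_size: int = 64) -> float:
    """Scale a BF16 checkpoint size (16 bits per weight) to its estimated quantized size."""
    return bf16_size_gb * effective_bits(q_bits, group_size) / 16

print(effective_bits(8))                  # 8.5
print(round(quantized_size_gb(5.1), 2))   # 2.71
```

Under these assumptions, 5.1 GB × 8.5/16 ≈ 2.7 GB. That is in the same range as the ~2.5 GB listed above.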

Use it (mlx-vlm)

pip install mlx-vlm
python -m mlx_vlm.generate \
  --model NemoStation/Marlin-2B-MLX-8bit \
  --video clip.mp4 --fps 2 \
  --prompt "Describe the video."
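`--fps 2` samples two frames per second of video. The hypothetical helper below lists the timestamps those frames correspond to. It assumes uniform sampling that starts at t = 0, and mlx-vlm's own frame selection may differ. These per-frame times are what a timestamp-aware serving path has to pass to the model for grounding.

```python
import math

def sample_timestamps(duration_s: float, fps: float) -> list[float]:
    """Timestamps (seconds) of frames sampled uniformly at `fps` from t=0 up to, not including, the clip end.

    Hypothetical helper for illustration; assumes uniform sampling starting at t=0.
    """
    if duration_s <= 0 or fps <= 0:
        raise ValueError("duration_s and fps must be positive")
    step = 1.0 / fps
    # Count the sample points i * step that fall strictly before duration_s;
    # the small epsilon guards against float noise such as 0.3 * 10 = 3.0000000000000004.
    n = math.ceil(duration_s * fps - 1e-9)
    return [round(i * step, 6) for i in range(n)]

print(sample_timestamps(3.0, 2))  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
```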

Dense captioning works well through mlx-vlm's one-shot generate path shown above. Temporal grounding (answers of the form "From <start> to <end>") also needs each frame's timestamp to reach the model. For that, use a timestamp-aware serving path such as SGLang-MLX.
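If you post-process temporal-grounding answers, a small parser can turn each "From <start> to <end>" span into numbers. The exact output format is an assumption. This sketch expects plain seconds, such as "From 3.5 to 7.0" or "From 3.5s to 7.0s". Check real outputs and adjust the hypothetical `_SPAN` pattern if the model emits another time format, such as mm:ss.

```python
import re

# Hypothetical pattern: assumes spans like "From 3.5 to 7.0" or "From 3.5s to 7.0s" (seconds).
_SPAN = re.compile(
    r"From\s+(\d+(?:\.\d+)?)\s*s?\s+to\s+(\d+(?:\.\d+)?)\s*s?",
    re.IGNORECASE,
)

def parse_grounding(text: str) -> list[tuple[float, float]]:
    """Extract every (start, end) pair, in seconds, from a grounding response."""
    spans = []
    for m in _SPAN.finditer(text):
        start, end = float(m.group(1)), float(m.group(2))
        if end >= start:  # skip reversed spans rather than guessing a fix
            spans.append((start, end))
    return spans

print(parse_grounding("From 3.5s to 7.0s: a dog runs."))  # [(3.5, 7.0)]
```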

Conversion recipe

python -m mlx_vlm.convert \
  --hf-path NemoStation/Marlin-2B \
  --mlx-path ./Marlin-2B-MLX-8bit \
  -q --q-bits 8
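Conceptually, `-q --q-bits 8` applies group-wise affine quantization. For each group of weights it stores 8-bit integer codes plus one scale and one bias. This sketch assumes a group size of 64, MLX's default when `--q-group-size` is not set, as in the recipe above. The pure-Python code illustrates the idea only. It is not MLX's kernel, whose rounding and edge-case handling may differ.

```python
# Conceptual sketch of group-wise affine quantization (not MLX's actual implementation).
# Assumption: group size 64, matching the converter's default when --q-group-size is unset.

def quantize_group(weights: list[float], bits: int = 8) -> tuple[list[int], float, float]:
    """Map one group of floats to integer codes so that w ≈ scale * q + bias."""
    levels = 2**bits - 1                      # 255 code steps for 8 bits
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0   # constant group: any nonzero scale works
    bias = w_min
    codes = [round((w - bias) / scale) for w in weights]
    return codes, scale, bias

def dequantize_group(codes: list[int], scale: float, bias: float) -> list[float]:
    """Reconstruct approximate weights from codes and the group's scale and bias."""
    return [scale * q + bias for q in codes]

def quantize_row(row: list[float], group_size: int = 64, bits: int = 8):
    """Quantize a weight row independently in consecutive groups of `group_size`."""
    return [quantize_group(row[i:i + group_size], bits) for i in range(0, len(row), group_size)]
```

Because each group gets its own scale, the rounding error for any weight is at most half that group's scale. A few outlier weights therefore only cost precision within their own group of 64.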

Access

Access is gated with the same form as the base model; request access on the model page. License: Apache-2.0.

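Checkpoint: safetensors, tensor types BF16 and U32. The Hub reports a model size of 0.9B params because it counts stored tensor elements, and the 8-bit weights are packed four to a U32 element. The logical parameter count is still ~2B.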



Model tree for NemoStation/Marlin-2B-MLX-8bit: finetuned from Qwen/Qwen3.5-2B; quantized (3 variants), including this model.