Marlin-2B — MLX 8-bit (Apple Silicon)

Original release: NemoStation/Marlin-2B

8-bit MLX conversion of NemoStation/Marlin-2B for fast, local, private inference on Apple Silicon. The weights are the base model's, quantized to 8 bits, so behavior should closely match the original; see the base model card for benchmarks, architecture, training, and intended use.


Base model	NemoStation/Marlin-2B (2B video VLM — dense captioning + temporal grounding)
Format	MLX, 8-bit · ~2.5 GB (base BF16 ~5.1 GB)
Runs on	Apple Silicon (M-series)
License	Apache-2.0 (inherited from base)
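
The two sizes in the table are roughly what the bit widths predict: going from 16-bit to 8-bit weights should about halve the footprint. A minimal sketch of that arithmetic follows. The function name and the `overhead_bits` parameter are hypothetical; the overhead term stands in for per-group quantization metadata, and any tensors left unquantized would also shift the real figure.

```python
def quantized_size_gb(base_size_gb, base_bits=16, q_bits=8, overhead_bits=0.0):
    """Rough size estimate after quantization.

    base_size_gb:  size of the unquantized checkpoint (BF16 here)
    base_bits:     bits per weight in the base checkpoint
    q_bits:        bits per weight after quantization
    overhead_bits: assumed extra bits per weight for quantization metadata
                   (hypothetical knob; 0.0 ignores it)
    """
    return base_size_gb * (q_bits + overhead_bits) / base_bits


# BF16 base of ~5.1 GB quantized to 8 bits, ignoring metadata overhead
print(round(quantized_size_gb(5.1), 2))
```

With the table's ~5.1 GB base this gives about 2.55 GB, in line with the ~2.5 GB listed for this conversion.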

Use it (mlx-vlm)

pip install mlx-vlm
python -m mlx_vlm.generate \
  --model NemoStation/Marlin-2B-MLX-8bit \
  --video clip.mp4 --fps 2 \
  --prompt "Describe the video."
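
To caption several clips, one option is to build the same command once per file and run it from Python. The sketch below uses only the flags shown in the command above; the helper names, the `dry_run` switch, and the directory layout are assumptions made for illustration, not part of mlx-vlm.

```python
import shlex
import subprocess
from pathlib import Path

MODEL = "NemoStation/Marlin-2B-MLX-8bit"


def build_caption_cmd(video, prompt="Describe the video.", fps=2, model=MODEL):
    """Return the argv list for one mlx_vlm.generate call (hypothetical helper)."""
    return [
        "python", "-m", "mlx_vlm.generate",
        "--model", model,
        "--video", str(video),
        "--fps", str(fps),
        "--prompt", prompt,
    ]


def caption_clips(clip_dir, dry_run=True):
    """Build one command per .mp4 in clip_dir (sorted by name); run them unless dry_run."""
    cmds = [build_caption_cmd(p) for p in sorted(Path(clip_dir).glob("*.mp4"))]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))  # show the command without executing it
        else:
            subprocess.run(cmd, check=True)  # needs mlx-vlm installed on Apple Silicon
    return cmds
```

Calling `caption_clips("clips/")` prints the commands it would run; pass `dry_run=False` to execute them one after another.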

Dense captioning works well via mlx-vlm's one-shot path. For temporal grounding ("From <start> to <end>"), use a timestamp-aware serving path (SGLang-MLX) so that each frame's timestamp reaches the model.
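
To make the per-frame time idea concrete, here is a minimal sketch of the timestamps that uniform sampling at a given fps would produce, plus a parser for a grounded span. Both rest on assumptions: the real frame sampler in mlx-vlm or SGLang-MLX may pick frames differently, and the parser assumes the model writes start and end as plain seconds (for example "From 1.5 to 4"), which the card does not specify. The function names are hypothetical.

```python
import re


def frame_timestamps(duration_s, fps=2.0):
    """Timestamps in seconds for frames sampled uniformly at `fps`, starting at t=0.

    Assumption: frame i is taken at i / fps; a real sampler may differ.
    """
    n_frames = int(duration_s * fps)
    return [i / fps for i in range(n_frames)]


# Assumed output shape: "From <start> to <end>" with numeric seconds.
_SPAN = re.compile(r"From\s+(\d+(?:\.\d+)?)\s+to\s+(\d+(?:\.\d+)?)")


def parse_span(text):
    """Return (start, end) in seconds from the first grounded span, or None if absent."""
    match = _SPAN.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))
```

At `--fps 2`, a 3-second clip yields six frames half a second apart; a span parsed from the output can then be compared against those timestamps.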

Conversion recipe

python -m mlx_vlm.convert \
  --hf-path NemoStation/Marlin-2B \
  --mlx-path ./Marlin-2B-MLX-8bit \
  -q --q-bits 8

Access

Gated behind the same access form as the base model; request access on the model page. Licensed under Apache-2.0.

Model tree

Qwen/Qwen3.5-2B-Base (base) → Qwen/Qwen3.5-2B (finetuned) → NemoStation/Marlin-2B (finetuned) → NemoStation/Marlin-2B-MLX-8bit (quantized, this model; one of 3 quantized variants of Marlin-2B)