How to use NemoStation/Marlin-2B with Transformers:
```python
# Load model directly
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("NemoStation/Marlin-2B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("NemoStation/Marlin-2B", trust_remote_code=True)
```
Request: GGUF version of the model
Most people can't run MLX because they don't have an Apple device, and GGUF is universal.
Fair point, GGUF does reach more people. The blocker is M-RoPE: llama.cpp doesn't scale the temporal M-RoPE positions by the video's frame rate (fps), so the model can't place events in real time and the timestamps come out wrong. Since both Marlin's find and caption outputs are timestamped, that breaks the core feature for everyone on GGUF, not just in edge cases. It's a llama.cpp runtime gap (it affects all Qwen3.5 VL GGUFs), so the fix has to land upstream. Until it does, grounding stays on MLX (Apple) or transformers/vLLM (any GPU/CPU); we'd rather not ship a universal GGUF that's universally wrong on the "when".
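To make the fps-scaling point concrete, here is a minimal Python sketch of temporal position IDs computed with and without fps scaling, loosely modeled on the scheme Qwen2.5-VL describes (seconds per temporal grid step multiplied by a fixed number of position steps per second). The function names and the defaults `temporal_patch_size=2` and `tokens_per_second=2.0` are hypothetical; this is not Marlin's or llama.cpp's actual code, and the unscaled branch only approximates the behavior the reply describes.

```python
from typing import List


def temporal_position_ids(
    num_grid_frames: int,
    fps: float,
    temporal_patch_size: int = 2,      # frames merged into one temporal grid step (assumed)
    tokens_per_second: float = 2.0,    # position steps per second of video (assumed)
    scale_by_fps: bool = True,
) -> List[int]:
    """Temporal M-RoPE index for each temporal grid step of a sampled clip.

    Hypothetical sketch: with scale_by_fps=True the index tracks real time;
    with scale_by_fps=False it only counts grid steps, so real time is lost.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if not scale_by_fps:
        # One position step per grid step, no matter how much real time it spans.
        return list(range(num_grid_frames))
    # Real seconds covered by one temporal grid step at this sampling rate.
    seconds_per_grid = temporal_patch_size / fps
    return [int(t * seconds_per_grid * tokens_per_second) for t in range(num_grid_frames)]


def position_to_seconds(position_id: int, tokens_per_second: float = 2.0) -> float:
    """Invert the fps-scaled mapping: temporal position back to a timestamp in seconds."""
    return position_id / tokens_per_second


if __name__ == "__main__":
    # The same 4 grid steps, taken from a clip sampled at 2 fps and at 0.5 fps.
    for fps in (2.0, 0.5):
        scaled = temporal_position_ids(4, fps)
        naive = temporal_position_ids(4, fps, scale_by_fps=False)
        seconds = [position_to_seconds(p) for p in scaled]
        print(f"fps={fps}: scaled={scaled} -> {seconds} s, naive={naive}")
```

With these assumed constants, the fps-scaled IDs are `[0, 2, 4, 6]` at 2 fps and `[0, 8, 16, 24]` at 0.5 fps, which decode back to grid steps starting at 0/1/2/3 s and 0/4/8/12 s. The unscaled IDs are `[0, 1, 2, 3]` in both cases, so two clips covering very different spans of real time look identical to the model, and any timestamp it emits has nothing real-time to anchor to.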