Collaboration Opportunity
Respected NemoStation Team,
I've been following NemoStation's work on open-source video understanding. The way you've built Marlin-2B, taking an efficient image-text-to-text foundation architecture like Qwen3.5-2B and extending its temporal mechanics to handle localized, frame-by-frame video understanding, is incredibly impressive.
Achieving that level of dense captioning and precise temporal grounding at a 2B footprint is exactly where edge intelligence needs to go.
I'm currently building Qubik, a search-optimized 5B parameter edge intelligence engine designed for autonomous routing and real-time verification.
I see a massive opportunity for a collaboration between our architectures.
By pairing Marlin's local video-to-text understanding with Qubik's capability to execute fast, multi-layered deep recursive searches, we could build a highly reflexive automation stack. This would allow an edge device to see a physical event, immediately run a fast recursive search to verify the context, and execute an accurate digital action on the spot, all without relying on massive cloud compute infrastructure.
Here is a quick overview for context:
Use Case: Creating localized, ultra-fast automation pipelines where physical visual triggers instantly drive complex, verified digital workflows.
Type of videos / volume: High-frequency, episodic real-time video streams parsed locally on consumer-grade hardware or edge devices.
Integration: Merging Marlin's temporal text outputs directly into Qubik's intelligence, so that your timestamped video insights can trigger deep, multi-hop search verification loops when anomalous events occur.
Do you have a few minutes for a call this week to chat about how we might collaborate on this?
Thanks!
Soham
Xerv