Instructions to use vikhyatk/moondream2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use vikhyatk/moondream2 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("image-text-to-text", model="vikhyatk/moondream2", trust_remote_code=True)# Load model directly from transformers import AutoModelForCausalLM model = AutoModelForCausalLM.from_pretrained("vikhyatk/moondream2", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use vikhyatk/moondream2 with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "vikhyatk/moondream2" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "vikhyatk/moondream2", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/vikhyatk/moondream2
- SGLang
How to use vikhyatk/moondream2 with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "vikhyatk/moondream2" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "vikhyatk/moondream2", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "vikhyatk/moondream2" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "vikhyatk/moondream2", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use vikhyatk/moondream2 with Docker Model Runner:
docker model run hf.co/vikhyatk/moondream2
Finetuning for pointing, object detection task
This is a fantastic model! I really appreciate the pointing feature—it's something I’ve only seen in the Molmo model before. However, unlike Molmo, which outputs point coordinates as HTML tags in its text output, this model appears to have dedicated heads for generating points and bounding boxes, which is very impressive.
I’m particularly curious about how these new features can be fine-tuned. Do you plan to release a notebook demonstrating the process (for pointing and object detection)? A blog post explaining the model's architecture would also be incredibly helpful for understanding its unique capabilities.
Additionally, I noticed that the example code includes special methods like model.caption and model.query. Does this mean the model cannot be used like a traditional vision-language model? Is it possible to input a chat history for conversational use?
Thank you again for this amazing model!
I am working on a post detailing how the pointing/bounding box heads work, as well as scripts for finetuning. Will reply here when that's up.
The model currently cannot by used in a multi-turn conversational setting, we're focused on maximizing single-turn visual understanding since that's what is most useful for developers building vision applications.
very cool, looking forward to that vik and planning to do a vid
Did this ever happen?