PierrunoYT/moondream3-preview

Image-Text-to-Text

text-generation

vikhyatk committed on Sep 17, 2025

Commit 05d8a39 · verified · 1 Parent(s): 638927a

Update README.md

Files changed (1)

README.md +30 -2

@@ -32,8 +32,36 @@ moondream = AutoModelForCausalLM.from_pretrained(
 moondream.compile()
 ```
 * TODO: Add usage examples
-  * Query
-  * Caption
   * Detect
   * Point

 moondream.compile()
 ```
+The model comes with four skills, tailored towards different visual understanding tasks.
+### Query
+The `query` skill can be used to ask open-ended questions about images.
+||TK -- code example for simple VQA||
+By default, `query` runs in reasoning mode, allowing the model to "think" about the question before generating an answer. This is helpful for more complicated tasks, but sometimes the task you're running is simple and doesn't benefit from reasoning. To save on inference cost when this is the case, you can disable reasoning:
+||TK -- example without reasoning||
+If you want to stream outputs, pass in `stream=True`. You can control the temperature, top-p, and maximum number of tokens generated by passing in optional settings.
+||TK -- stream + settings example||
+Note that this isn't just for images; Moondream is also a strong general-purpose text model.
+||TK -- text only example||
+### Caption
+Whether you want short, normal-sized or long descriptions of images, the `caption` skill has you covered.
+||TK -- captioning example||
+It accepts the same streaming, temperature, and other generation settings as the `query` skill.
+---
 * TODO: Add usage examples
   * Detect
   * Point
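
The added README text leaves its code examples as `||TK||` placeholders. What follows is a rough sketch of the call pattern that text describes, not the author's eventual examples. Everything beyond the names `query`, `caption`, and `stream=True` is an assumption: the `reasoning` and `settings` keyword arguments, the `temperature` / `top_p` / `max_tokens` setting keys, the `length` argument with the values `"short"`, `"normal"`, and `"long"`, and the `"answer"` / `"caption"` result keys. `StubModel` and `collect` are hypothetical helpers that return canned text so the snippet runs without downloading weights.

```python
from typing import Any, Dict, Iterator, Optional, Union

# Assumed names for the three caption sizes mentioned in the README text.
CAPTION_LENGTHS = ("short", "normal", "long")


class StubModel:
    """Hypothetical stand-in for the loaded model so the sketch runs offline."""

    def query(self, image: Any = None, question: str = "", reasoning: bool = True,
              stream: bool = False, settings: Optional[Dict[str, Any]] = None
              ) -> Dict[str, Union[str, Iterator[str]]]:
        return {"answer": self._emit(f"echo: {question}", stream, settings)}

    def caption(self, image: Any = None, length: str = "normal",
                stream: bool = False, settings: Optional[Dict[str, Any]] = None
                ) -> Dict[str, Union[str, Iterator[str]]]:
        if length not in CAPTION_LENGTHS:
            raise ValueError(f"length must be one of {CAPTION_LENGTHS}")
        return {"caption": self._emit(f"a {length} caption", stream, settings)}

    @staticmethod
    def _emit(text: str, stream: bool, settings: Optional[Dict[str, Any]]):
        # Treat each word as one "token" and honor the assumed max_tokens cap.
        words = text.split(" ")
        words = words[: (settings or {}).get("max_tokens", len(words))]
        if stream:
            return (word + " " for word in words)  # generator of text chunks
        return " ".join(words)


def collect(output: Union[str, Iterator[str]]) -> str:
    """Return the full text whether the call streamed chunks or not."""
    if isinstance(output, str):
        return output
    return "".join(output).strip()


model = StubModel()  # with real weights, use the loaded `moondream` object instead

# Simple task: skip the reasoning pass to save inference cost.
plain = collect(model.query(question="What is shown?", reasoning=False)["answer"])

# Streamed answer with sampling settings (key names are assumptions).
settings = {"temperature": 0.5, "top_p": 0.9, "max_tokens": 2}
streamed = collect(
    model.query(question="What is shown?", stream=True, settings=settings)["answer"]
)

# Short caption, using the same streaming switch.
short = collect(model.caption(length="short", stream=True)["caption"])

print(plain)
print(streamed)
print(short)
```

Routing both the streamed and non-streamed results through one `collect` helper keeps the calling code identical in either mode; in an interactive setting you would more likely print each chunk as it arrives instead of joining them.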