vikhyatk committed
Commit 34f9fb7 · verified · 1 Parent(s): 92c33da

Update README.md

Files changed (1)
  1. README.md +125 -8
README.md CHANGED
@@ -38,34 +38,151 @@ The model comes with four skills, tailored towards different visual understanding
 The `query` skill can be used to ask open-ended questions about images.

- ||TK -- code example for simple VQA||

 By default, `query` runs in reasoning mode, allowing the model to "think" about the question before generating an answer. This is helpful for more complicated tasks, but sometimes the task you're running is simple and doesn't benefit from reasoning. To save on inference cost when this is the case, you can disable reasoning:

- ||TK -- example without reasoning||

 If you want to stream outputs, pass in `stream=True`. You can control the temperature, top-p, and maximum number of tokens generated by passing in optional settings.

- ||TK -- stream + settings example||

 Note that this isn't just for images; Moondream is also a strong general-purpose text model.

- ||TK -- text only example||

 ### Caption

 Whether you want short, normal-sized or long descriptions of images, the `caption` skill has you covered.

- ||TK -- captioning example||

 It accepts the same streaming and temperature etc. settings as the `query` skill.

 ### Point

- TK

 ### Detect

- TK

- ### Caching image encodings (advanced)
 The `query` skill can be used to ask open-ended questions about images.

+ ```python
+ from PIL import Image
+
+ # Simple VQA
+ image = Image.open("photo.jpg")
+ result = moondream.query(image=image, question="What's in this image?")
+ print(result["answer"])
+ ```

 By default, `query` runs in reasoning mode, allowing the model to "think" about the question before generating an answer. This is helpful for more complicated tasks, but sometimes the task you're running is simple and doesn't benefit from reasoning. To save on inference cost when this is the case, you can disable reasoning:

+ ```python
+ # Without reasoning for simple questions
+ result = moondream.query(
+     image=image,
+     question="What color is the sky?",
+     reasoning=False
+ )
+ print(result["answer"])
+ ```

 If you want to stream outputs, pass in `stream=True`. You can control the temperature, top-p, and maximum number of tokens generated by passing in optional settings.

+ ```python
+ # Streaming with custom settings
+ settings = {
+     "temperature": 0.7,
+     "top_p": 0.95,
+     "max_tokens": 512
+ }
+
+ result = moondream.query(
+     image=image,
+     question="Describe what's happening in detail",
+     stream=True,
+     settings=settings
+ )
+
+ # Stream the answer
+ for chunk in result["answer"]:
+     print(chunk, end="", flush=True)
+ ```
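If you also need the complete text once streaming finishes, the chunks can be collected while they are printed. A minimal standalone sketch; `fake_stream` is a hypothetical stand-in for the `result["answer"]` iterator, since the exact return type isn't shown here:

```python
def fake_stream():
    # Hypothetical stand-in for the chunk iterator returned with stream=True.
    yield from ["The sky ", "is ", "clear."]

# Print chunks as they arrive while keeping them for later use.
chunks = []
for chunk in fake_stream():
    print(chunk, end="", flush=True)
    chunks.append(chunk)
full_answer = "".join(chunks)
```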
 Note that this isn't just for images; Moondream is also a strong general-purpose text model.

+ ```python
+ # Text-only example (no image)
+ result = moondream.query(
+     question="Explain the concept of machine learning in simple terms"
+ )
+ print(result["answer"])
+ ```
 ### Caption

 Whether you want short, normal-sized or long descriptions of images, the `caption` skill has you covered.

+ ```python
+ # Different caption lengths
+ image = Image.open("landscape.jpg")
+
+ # Short caption
+ short = moondream.caption(image, length="short")
+ print(f"Short: {short['caption']}")
+
+ # Normal caption (default)
+ normal = moondream.caption(image, length="normal")
+ print(f"Normal: {normal['caption']}")
+
+ # Long caption
+ long = moondream.caption(image, length="long")
+ print(f"Long: {long['caption']}")
+ ```

 It accepts the same streaming and temperature etc. settings as the `query` skill.

+ ```python
+ # Streaming caption with custom settings
+ result = moondream.caption(
+     image,
+     length="long",
+     stream=True,
+     settings={"temperature": 0.3}
+ )
+
+ for chunk in result["caption"]:
+     print(chunk, end="", flush=True)
+ ```
+
 ### Point

+ The `point` skill identifies specific points (x, y coordinates) for objects in an image.
+
+ ```python
+ # Find points for specific objects
+ image = Image.open("crowd.jpg")
+ result = moondream.point(image, "person wearing a red shirt")
+
+ # Points are normalized coordinates (0-1)
+ for i, point in enumerate(result["points"]):
+     print(f"Point {i+1}: x={point['x']:.3f}, y={point['y']:.3f}")
+ ```
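Since the points come back normalized, mapping them onto the original image only requires scaling by its dimensions. A small sketch, assuming the `points` list shape shown in the example (the sample coordinates are made up):

```python
def to_pixels(points, width, height):
    # Scale normalized (0-1) coordinates to integer pixel positions.
    return [(round(p["x"] * width), round(p["y"] * height)) for p in points]

# Hypothetical points on a 640x480 image
points = [{"x": 0.25, "y": 0.5}, {"x": 0.75, "y": 0.1}]
print(to_pixels(points, 640, 480))  # [(160, 240), (480, 48)]
```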
 ### Detect

+ The `detect` skill provides bounding boxes for objects in an image.
+
+ ```python
+ # Detect objects with bounding boxes
+ image = Image.open("street_scene.jpg")
+ result = moondream.detect(image, "car")
+
+ # Bounding boxes are normalized coordinates (0-1)
+ for i, obj in enumerate(result["objects"]):
+     print(f"Object {i+1}: "
+           f"x_min={obj['x_min']:.3f}, y_min={obj['y_min']:.3f}, "
+           f"x_max={obj['x_max']:.3f}, y_max={obj['y_max']:.3f}")
+
+ # Control maximum number of objects
+ settings = {"max_objects": 10}
+ result = moondream.detect(image, "person", settings=settings)
+ ```
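The normalized boxes can be rendered with Pillow, which the examples already use. A minimal sketch, assuming the `objects` shape shown in the example (the detection here is made up):

```python
from PIL import Image, ImageDraw

def draw_boxes(image, objects, color="red", width=3):
    # Scale each normalized (0-1) box to pixels and draw a rectangle outline.
    draw = ImageDraw.Draw(image)
    w, h = image.size
    for obj in objects:
        draw.rectangle(
            (obj["x_min"] * w, obj["y_min"] * h, obj["x_max"] * w, obj["y_max"] * h),
            outline=color,
            width=width,
        )
    return image

# Hypothetical detection drawn on a blank canvas
canvas = Image.new("RGB", (320, 240))
annotated = draw_boxes(canvas, [{"x_min": 0.1, "y_min": 0.2, "x_max": 0.5, "y_max": 0.8}])
annotated.save("annotated.png")
```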
+
+ ### Caching image encodings (advanced)
+
+ If you're planning to run multiple inferences on the same image, you can pre-encode it once and reuse the encoding for better performance.
+
+ ```python
+ # Encode image once
+ image = Image.open("complex_scene.jpg")
+ encoded = moondream.encode_image(image)
+
+ # Reuse the encoding for multiple queries
+ questions = [
+     "How many people are in this image?",
+     "What time of day was this taken?",
+     "What's the weather like?"
+ ]
+
+ for q in questions:
+     result = moondream.query(image=encoded, question=q, reasoning=False)
+     print(f"Q: {q}")
+     print(f"A: {result['answer']}\n")
+
+ # Also works with other skills
+ caption = moondream.caption(encoded, length="normal")
+ objects = moondream.detect(encoded, "vehicle")
+ ```