vikhyatk committed on
Commit d1b7c10 · verified · 1 Parent(s): ca9c987

Update README.md

Files changed (1)
  1. README.md +12 -0
README.md CHANGED
@@ -17,4 +17,16 @@ For more details, please refer to our ||coming soon release blog post||. Or try
 
  ## Usage
 
+ Load the model and prepare it for inference. We use [FlexAttention for inference](https://pytorch.org/blog/flexattention-for-inference/), so calling `.compile()` is critical for fast decoding. Our `compile` implementation also handles warmup, so you can start making requests directly once it returns.
+
+ ```
+ moondream = AutoModelForCausalLM.from_pretrained(
+     "moondream/moondream3-preview",
+     trust_remote_code=True,
+     dtype=torch.bfloat16,
+     device_map={"": "cuda"},
+ )
+ moondream.compile()
+ ```
+
  * TODO: Add usage examples
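
The snippet added in this commit leaves its imports implicit. A minimal, self-contained sketch of the same load-and-compile step might look like the following. The wrapper function `load_moondream`, its parameters, and the timing printout are illustrative assumptions and are not part of the README; the `from_pretrained` arguments are copied from the diff. On older `transformers` releases the `dtype` argument may need to be spelled `torch_dtype`.

```python
import time


def load_moondream(model_id: str = "moondream/moondream3-preview", device: str = "cuda"):
    """Hypothetical helper: load the model, compile it, and return it."""
    # Heavy dependencies are imported inside the function so that importing
    # this module does not itself require torch or transformers.
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,    # the model class ships with the repository
        dtype=torch.bfloat16,
        device_map={"": device},   # place the whole model on one device
    )

    # Per the README text, compile() also performs warmup, so the model
    # should be ready for requests once this call returns.
    start = time.perf_counter()
    model.compile()
    print(f"compile() returned after {time.perf_counter() - start:.1f}s")
    return model
```

Calling `moondream = load_moondream()` on a machine with a CUDA device would then reproduce the two steps from the diff, with the time spent in `compile()` printed once.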