swaze committed · verified
Commit fbc85b5 · 1 Parent(s): 7b69121

Update README.md

Files changed (1): README.md +8 −0
README.md CHANGED

@@ -21,6 +21,14 @@ Designed for **low-latency inference** on **NVIDIA RTX GPUs**, leveraging:
 
 FlashHead matches the gemma-3-1b-it baseline within rounding error on common benchmarks (MMLU-Pro, HellaSwag, GSM8K, etc.) and, combined with quantization, delivers SOTA on-device latency.
 
+### Quickstart
+
+Launch a chat window, with /reset and /exit commands, via:
+
+```shell
+pip install embedl-models
+python3 -m embedl.models.vllm.demo --model embedl/gemma-3-1b-it-FlashHead
+```
 ---
 
 ## Model Details
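For readers unfamiliar with the demo's command interface, here is a minimal sketch of a chat loop supporting /reset and /exit. This is illustrative only: the actual demo serves the FlashHead model through vLLM, and the `generate` stub below is a hypothetical stand-in for model inference, not part of `embedl-models`.

```python
def generate(history):
    """Hypothetical stand-in for model inference over the chat history."""
    return f"(reply to: {history[-1]})"

def chat(lines):
    """Run a /reset- and /exit-aware command loop over user inputs.

    Returns the list of model replies produced before exit.
    """
    history, transcript = [], []
    for line in lines:
        if line == "/exit":      # leave the chat
            break
        if line == "/reset":     # clear the conversation history
            history.clear()
            continue
        history.append(line)     # ordinary turn: extend history, reply
        transcript.append(generate(history))
    return transcript
```

In the real demo the loop reads from stdin interactively; passing an iterable of lines here just makes the control flow easy to follow and test.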