Update README.md
README.md (CHANGED)
@@ -21,6 +21,14 @@ Designed for **low-latency inference** on **NVIDIA RTX GPUs**, leveraging:
FlashHead matches the gemma-3-1b-it baseline within rounding error on common benchmarks (MMLU-Pro, HellaSwag, GSM8K, etc.) and, combined with quantization, delivers SOTA on-device latency.

### Quickstart

Launch a chat window with /reset and /exit commands by running:

```shell
pip install embedl-models
python3 -m embedl.models.vllm.demo --model embedl/gemma-3-1b-it-FlashHead
```
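
Beyond the interactive demo, the checkpoint can in principle be used programmatically through vLLM's standard Python API. The sketch below is illustrative only: it assumes that installing `embedl-models` alongside `vllm` registers the FlashHead architecture so `LLM(model=...)` can load the checkpoint, and the prompt and sampling settings are arbitrary placeholders.

```python
# Illustrative sketch, not the package's documented API: assumes embedl-models
# registers the FlashHead architecture with vLLM so the checkpoint loads through
# vLLM's standard LLM interface.
from vllm import LLM, SamplingParams

# Load the FlashHead checkpoint from the Hugging Face Hub.
llm = LLM(model="embedl/gemma-3-1b-it-FlashHead")

# Arbitrary sampling settings for this example.
params = SamplingParams(temperature=0.7, max_tokens=128)

# Generate a completion for a single placeholder prompt.
outputs = llm.generate(["Explain the benefit of a low-latency LM head."], params)
print(outputs[0].outputs[0].text)
```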
---

## Model Details