jeffasante committed · verified
Commit cb2ba53 · 1 Parent(s): 1b4406f

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +73 -0 (ADDED)
---
library_name: cellm
tags:
- mobile
- rust
- memory-efficient
- quantized
---

# Cellm Models Hub

This folder contains `.cellm` model artifacts tested with the Cellm Rust CLI.

## Models

### Qwen2.5 0.5B Instruct (INT8)
- **Path**: `../models/qwen2.5-0.5b-int8-v1.cellm`
- **Size**: ~472 MB
- **Tokenizer**: `../models/qwen2.5-0.5b-bnb4/tokenizer.json`
- **Type**: INT8 symmetric weight-only

### Gemma-3 1B IT (INT4, smallest)
- **Path**: `../models/gemma-3-1b-it-int4-v1.cellm`
- **Size**: ~478 MB
- **Tokenizer**: `../models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: INT4 symmetric weight-only

### Gemma-3 1B IT (Mixed INT4, recommended)
- **Path**: `../models/gemma-3-1b-it-mixed-int4-v1.cellm`
- **Size**: ~1.0 GB
- **Tokenizer**: `../models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: Mixed precision (attention/embeddings higher precision, MLP mostly INT4)

### Gemma-3 1B IT (INT8, most stable)
- **Path**: `../models/gemma-3-1b-it-int8-v1.cellm`
- **Size**: ~1.2 GB
- **Tokenizer**: `../models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: INT8 symmetric weight-only
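Before running inference, it can help to confirm that each model/tokenizer pair listed above actually exists on disk. A minimal sketch, assuming the paths from the list are relative to the repository root (adjust for your checkout):

```shell
#!/bin/sh
# Sketch: verify that each model/tokenizer pair from the list above is present.
# The paths are taken from the sections above; adjust for your checkout.
check_pair() {
  model=$1; tok=$2
  [ -f "$model" ] || echo "missing model: $model"
  [ -f "$tok" ]   || echo "missing tokenizer: $tok"
}

check_pair models/qwen2.5-0.5b-int8-v1.cellm       models/qwen2.5-0.5b-bnb4/tokenizer.json
check_pair models/gemma-3-1b-it-int4-v1.cellm      models/hf/gemma-3-1b-it-full/tokenizer.json
check_pair models/gemma-3-1b-it-mixed-int4-v1.cellm models/hf/gemma-3-1b-it-full/tokenizer.json
check_pair models/gemma-3-1b-it-int8-v1.cellm      models/hf/gemma-3-1b-it-full/tokenizer.json
```

Silence means all files were found; each missing file prints one line.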

## Usage

From the repository root (e.g. `/Users/jeff/Desktop/cellm`), run:

```bash
./target/release/infer \
  --model models/qwen2.5-0.5b-int8-v1.cellm \
  --tokenizer models/qwen2.5-0.5b-bnb4/tokenizer.json \
  --prompt "What is sycophancy?" \
  --chat \
  --gen 64 \
  --temperature 0 \
  --backend metal \
  --kv-encoding f16
```

```bash
./target/release/infer \
  --model models/gemma-3-1b-it-mixed-int4-v1.cellm \
  --tokenizer models/hf/gemma-3-1b-it-full/tokenizer.json \
  --prompt "What is consciousness?" \
  --chat \
  --chat-format plain \
  --gen 48 \
  --temperature 0 \
  --backend metal \
  --kv-encoding f16
```
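For repeated experiments, only the model, tokenizer, and prompt change between the two commands above. A small helper that builds the command line can make that explicit; this is a sketch, with the `build_cmd` helper name being hypothetical and the `infer` flags taken verbatim from the examples above:

```shell
#!/bin/sh
# Sketch (hypothetical helper): assemble the infer command line used above,
# varying only model path, tokenizer path, and prompt between runs.
build_cmd() {
  printf '%s' "./target/release/infer --model $1 --tokenizer $2 --prompt \"$3\" --chat --gen 64 --temperature 0 --backend metal --kv-encoding f16"
}

# Print the command for the Qwen model as an example (pipe to sh to execute).
build_cmd models/qwen2.5-0.5b-int8-v1.cellm models/qwen2.5-0.5b-bnb4/tokenizer.json "What is sycophancy?"
echo
```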

## About Cellm
Cellm is a Rust-native inference runtime for serving LLMs locally on mobile and desktop, with Metal acceleration and memory-mapped model loading.

## License
Please follow each upstream model license (Qwen and Gemma terms) when redistributing weights and tokenizers.