# Cellm Models Hub

This folder contains `.cellm` model artifacts tested with the Cellm Rust CLI.
## Models
### Qwen2.5 0.5B Instruct (INT8)

- Path: `models/qwen2.5-0.5b-int8-v1.cellm`
- Size: ~472 MB
- Tokenizer: `models/qwen2.5-0.5b-bnb4/tokenizer.json`
- Type: INT8 symmetric weight-only
### Gemma-3 1B IT (INT4, smallest)

- Path: `models/gemma-3-1b-it-int4-v1.cellm`
- Size: ~478 MB
- Tokenizer: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- Type: INT4 symmetric weight-only
### Gemma-3 1B IT (Mixed INT4, recommended)

- Path: `models/gemma-3-1b-it-mixed-int4-v1.cellm`
- Size: ~1.0 GB
- Tokenizer: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- Type: Mixed precision (attention/embeddings at higher precision, MLP mostly INT4)
### Gemma-3 1B IT (INT8, most stable)

- Path: `models/gemma-3-1b-it-int8-v1.cellm`
- Size: ~1.2 GB
- Tokenizer: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- Type: INT8 symmetric weight-only
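The listed sizes are roughly what weight-only quantization predicts: bytes ≈ parameters × bits / 8, plus overhead for higher-precision embeddings and metadata. A quick sanity check, assuming nominal 0.5B/1B parameter counts (the exact counts are assumptions, not taken from the artifacts):

```rust
/// Approximate weight bytes in GB for weight-only quantization.
fn quantized_size_gb(params: f64, bits: f64) -> f64 {
    params * bits / 8.0 / 1e9
}

fn main() {
    // Gemma-3 1B at INT8: ~1.0 GB of raw weight bytes
    // (listed artifact: ~1.2 GB, the gap being embeddings/metadata).
    println!("Gemma INT8: {:.2} GB", quantized_size_gb(1.0e9, 8.0));
    // Gemma-3 1B at INT4: ~0.5 GB of raw weight bytes (listed: ~478 MB).
    println!("Gemma INT4: {:.2} GB", quantized_size_gb(1.0e9, 4.0));
    // Qwen2.5 0.5B at INT8: ~0.5 GB (listed: ~472 MB).
    println!("Qwen INT8:  {:.2} GB", quantized_size_gb(0.5e9, 8.0));
}
```

The mixed-INT4 artifact lands between the INT4 and INT8 estimates, as expected for a model that keeps attention and embedding weights at higher precision.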
## Usage

From the repository root, run:
```sh
./target/release/infer \
  --model models/qwen2.5-0.5b-int8-v1.cellm \
  --tokenizer models/qwen2.5-0.5b-bnb4/tokenizer.json \
  --prompt "What is sycophancy?" \
  --chat \
  --gen 64 \
  --temperature 0 \
  --backend metal \
  --kv-encoding f16
```
To run the recommended mixed-precision Gemma-3 model with the plain chat format:

```sh
./target/release/infer \
  --model models/gemma-3-1b-it-mixed-int4-v1.cellm \
  --tokenizer models/hf/gemma-3-1b-it-full/tokenizer.json \
  --prompt "What is consciousness?" \
  --chat \
  --chat-format plain \
  --gen 48 \
  --temperature 0 \
  --backend metal \
  --kv-encoding f16
```
## About Cellm

Cellm is a Rust-native inference runtime focused on local LLM serving on mobile and desktop, with Metal acceleration and memory-mapped model loading.
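Memory-mapped loading lets the runtime open multi-gigabyte artifacts without reading them fully into RAM (in practice via a crate such as `memmap2`). As a minimal, standard-library-only sketch, the snippet below validates a model-file header before loading; the `CELM` magic and the 12-byte header layout are entirely hypothetical and do not describe the actual `.cellm` format.

```rust
use std::fs::File;
use std::io::{Read, Write};

/// Read a hypothetical .cellm header: 4-byte magic, u32 version, u32 tensor count.
/// (Illustrative only; not the real .cellm layout.)
fn read_header(path: &str) -> std::io::Result<(u32, u32)> {
    let mut f = File::open(path)?;
    let mut buf = [0u8; 12];
    f.read_exact(&mut buf)?;
    assert_eq!(&buf[0..4], b"CELM", "not a .cellm file");
    let version = u32::from_le_bytes(buf[4..8].try_into().unwrap());
    let tensors = u32::from_le_bytes(buf[8..12].try_into().unwrap());
    Ok((version, tensors))
}

fn main() -> std::io::Result<()> {
    // Write a tiny dummy artifact so the sketch is self-contained.
    let mut f = File::create("dummy.cellm")?;
    f.write_all(b"CELM")?;
    f.write_all(&1u32.to_le_bytes())?;
    f.write_all(&42u32.to_le_bytes())?;
    drop(f);

    let (version, tensors) = read_header("dummy.cellm")?;
    println!("version={version} tensors={tensors}"); // prints: version=1 tensors=42
    Ok(())
}
```

A real loader would map the remainder of the file and hand out zero-copy slices per tensor rather than reading bytes eagerly.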
## License

Please follow each upstream model's license (Qwen and Gemma terms) when redistributing weights and tokenizers.