# Cellm Models Hub

This folder contains `.cellm` model artifacts tested with the Cellm Rust CLI.
## Models
### Qwen2.5 0.5B Instruct (INT8)

- Path: `models/qwen2.5-0.5b-int8-v1.cellm`
- Size: ~472 MB
- Tokenizer: `models/qwen2.5-0.5b-bnb4/tokenizer.json`
- Type: INT8 symmetric weight-only
### Gemma-3 1B IT (INT4, smallest)

- Path: `models/gemma-3-1b-it-int4-v1.cellm`
- Size: ~478 MB
- Tokenizer: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- Type: INT4 symmetric weight-only
### Gemma-3 1B IT (Mixed INT4, recommended)

- Path: `models/gemma-3-1b-it-mixed-int4-v1.cellm`
- Size: ~1.0 GB
- Tokenizer: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- Type: Mixed precision (attention/embeddings at higher precision, MLP mostly INT4)
### Gemma-3 1B IT (INT8, most stable)

- Path: `models/gemma-3-1b-it-int8-v1.cellm`
- Size: ~1.2 GB
- Tokenizer: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- Type: INT8 symmetric weight-only
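The listed sizes are roughly what weight-only quantization predicts: bytes ≈ parameters × bits / 8, plus overhead for higher-precision embeddings and metadata. A quick sanity check, assuming nominal 0.5B/1B parameter counts (the exact counts are assumptions, not taken from the artifacts):

```rust
/// Approximate weight bytes in GB for weight-only quantization.
fn quantized_size_gb(params: f64, bits: f64) -> f64 {
    params * bits / 8.0 / 1e9
}

fn main() {
    // Gemma-3 1B at INT8: ~1.0 GB of raw weight bytes
    // (listed artifact: ~1.2 GB, the gap being embeddings/metadata).
    println!("Gemma INT8: {:.2} GB", quantized_size_gb(1.0e9, 8.0));
    // Gemma-3 1B at INT4: ~0.5 GB of raw weight bytes (listed: ~478 MB).
    println!("Gemma INT4: {:.2} GB", quantized_size_gb(1.0e9, 4.0));
    // Qwen2.5 0.5B at INT8: ~0.5 GB (listed: ~472 MB).
    println!("Qwen INT8:  {:.2} GB", quantized_size_gb(0.5e9, 8.0));
}
```

The mixed-INT4 artifact lands between the INT4 and INT8 estimates, as expected for a model that keeps attention and embedding weights at higher precision.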
## Usage

From the repository root, run:
```sh
./target/release/infer \
  --model models/qwen2.5-0.5b-int8-v1.cellm \
  --tokenizer models/qwen2.5-0.5b-bnb4/tokenizer.json \
  --prompt "What is sycophancy?" \
  --chat \
  --gen 64 \
  --temperature 0 \
  --backend metal \
  --kv-encoding f16
```
To run the recommended mixed-precision Gemma-3 model with the plain chat format:

```sh
./target/release/infer \
  --model models/gemma-3-1b-it-mixed-int4-v1.cellm \
  --tokenizer models/hf/gemma-3-1b-it-full/tokenizer.json \
  --prompt "What is consciousness?" \
  --chat \
  --chat-format plain \
  --gen 48 \
  --temperature 0 \
  --backend metal \
  --kv-encoding f16
```
## About Cellm

Cellm is a Rust-native inference runtime focused on local LLM serving on mobile and desktop, with Metal acceleration and memory-mapped model loading.
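Memory-mapped loading lets the runtime open multi-gigabyte artifacts without reading them fully into RAM (in practice via a crate such as `memmap2`). As a minimal, standard-library-only sketch, the snippet below validates a model-file header before loading; the `CELM` magic and the 12-byte header layout are entirely hypothetical and do not describe the actual `.cellm` format.

```rust
use std::fs::File;
use std::io::{Read, Write};

/// Read a hypothetical .cellm header: 4-byte magic, u32 version, u32 tensor count.
/// (Illustrative only; not the real .cellm layout.)
fn read_header(path: &str) -> std::io::Result<(u32, u32)> {
    let mut f = File::open(path)?;
    let mut buf = [0u8; 12];
    f.read_exact(&mut buf)?;
    assert_eq!(&buf[0..4], b"CELM", "not a .cellm file");
    let version = u32::from_le_bytes(buf[4..8].try_into().unwrap());
    let tensors = u32::from_le_bytes(buf[8..12].try_into().unwrap());
    Ok((version, tensors))
}

fn main() -> std::io::Result<()> {
    // Write a tiny dummy artifact so the sketch is self-contained.
    let mut f = File::create("dummy.cellm")?;
    f.write_all(b"CELM")?;
    f.write_all(&1u32.to_le_bytes())?;
    f.write_all(&42u32.to_le_bytes())?;
    drop(f);

    let (version, tensors) = read_header("dummy.cellm")?;
    println!("version={version} tensors={tensors}"); // prints: version=1 tensors=42
    Ok(())
}
```

A real loader would map the remainder of the file and hand out zero-copy slices per tensor rather than reading bytes eagerly.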
## License

Please follow each upstream model's license (Qwen and Gemma terms) when redistributing weights and tokenizers.