---
library_name: cellm
tags:
- mobile
- rust
- memory-efficient
- quantized
---
# Cellm Models Hub
This folder contains `.cellm` model artifacts tested with the Cellm Rust CLI.
## Models
### Qwen2.5 0.5B Instruct (INT8)
- **Path**: `models/qwen2.5-0.5b-int8-v1.cellm`
- **Size**: ~472 MB
- **Tokenizer**: `models/qwen2.5-0.5b-bnb4/tokenizer.json`
- **Type**: INT8 symmetric weight-only
### Gemma-3 1B IT (INT4, smallest)
- **Path**: `models/gemma-3-1b-it-int4-v1.cellm`
- **Size**: ~478 MB
- **Tokenizer**: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: INT4 symmetric weight-only
### Gemma-3 1B IT (Mixed INT4, recommended)
- **Path**: `models/gemma-3-1b-it-mixed-int4-v1.cellm`
- **Size**: ~1.0 GB
- **Tokenizer**: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: Mixed precision (attention/embeddings higher precision, MLP mostly INT4)
### Gemma-3 1B IT (INT8, most stable)
- **Path**: `models/gemma-3-1b-it-int8-v1.cellm`
- **Size**: ~1.2 GB
- **Tokenizer**: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: INT8 symmetric weight-only
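All three Gemma-3 artifacts share one tokenizer file, while the Qwen artifact ships its own. A small helper script can encode that pairing; note that `tokenizer_for` is a hypothetical convenience function for scripting, not part of the Cellm CLI:

```shell
#!/bin/sh
# Map a .cellm artifact to its tokenizer.json, per the list above.
# (Illustrative helper only; the `infer` binary takes --tokenizer directly.)
tokenizer_for() {
  case "$1" in
    models/qwen2.5-0.5b-int8-v1.cellm)
      echo "models/qwen2.5-0.5b-bnb4/tokenizer.json" ;;
    models/gemma-3-1b-it-*.cellm)
      echo "models/hf/gemma-3-1b-it-full/tokenizer.json" ;;
    *)
      echo "unknown" ;;
  esac
}

tokenizer_for models/gemma-3-1b-it-mixed-int4-v1.cellm
```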
## Usage
From the repository root, run one of the following. The first command runs the Qwen2.5 INT8 model; the second runs the recommended mixed-precision Gemma-3 model with the plain chat format. Both use greedy decoding (`--temperature 0`), the Metal backend, and an FP16 KV cache:
```bash
./target/release/infer \
--model models/qwen2.5-0.5b-int8-v1.cellm \
--tokenizer models/qwen2.5-0.5b-bnb4/tokenizer.json \
--prompt "What is sycophancy?" \
--chat \
--gen 64 \
--temperature 0 \
--backend metal \
--kv-encoding f16
```
```bash
./target/release/infer \
--model models/gemma-3-1b-it-mixed-int4-v1.cellm \
--tokenizer models/hf/gemma-3-1b-it-full/tokenizer.json \
--prompt "What is consciousness?" \
--chat \
--chat-format plain \
--gen 48 \
--temperature 0 \
--backend metal \
--kv-encoding f16
```
## About Cellm
Cellm is a Rust-native inference runtime for local LLM serving on mobile and desktop, with Metal acceleration and memory-mapped model loading.
## License
Please follow each upstream model license (Qwen and Gemma terms) when redistributing weights and tokenizers.