---
library_name: cellm
tags:
- mobile
- rust
- memory-efficient
- quantized
---
# Cellm Models Hub
This folder contains `.cellm` model artifacts tested with the Cellm Rust CLI.
## Models
### Qwen2.5 0.5B Instruct (INT8)
- **Path**: `models/qwen2.5-0.5b-int8-v1.cellm`
- **Size**: ~472 MB
- **Tokenizer**: `models/qwen2.5-0.5b-bnb4/tokenizer.json`
- **Type**: INT8 symmetric weight-only
### Gemma-3 1B IT (INT4, smallest)
- **Path**: `models/gemma-3-1b-it-int4-v1.cellm`
- **Size**: ~478 MB
- **Tokenizer**: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: INT4 symmetric weight-only
### Gemma-3 1B IT (Mixed INT4, recommended)
- **Path**: `models/gemma-3-1b-it-mixed-int4-v1.cellm`
- **Size**: ~1.0 GB
- **Tokenizer**: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: Mixed precision (attention and embeddings at higher precision; MLP mostly INT4)
### Gemma-3 1B IT (INT8, most stable)
- **Path**: `models/gemma-3-1b-it-int8-v1.cellm`
- **Size**: ~1.2 GB
- **Tokenizer**: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: INT8 symmetric weight-only
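For reference, "INT8 symmetric weight-only" means each weight tensor is stored as signed 8-bit integers plus a single floating-point scale, and dequantized on the fly at inference time. The sketch below illustrates the general technique with a per-tensor scale; it is not Cellm's actual implementation, and the function names are hypothetical.

```python
# Illustrative sketch of symmetric weight-only INT8 quantization.
# Not Cellm's code: per-tensor scale shown for simplicity; real
# runtimes typically use per-channel or per-group scales.

def quantize_int8_symmetric(weights):
    """Map float weights to int8 values in [-127, 127] with one scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights at inference time."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8_symmetric(weights)
restored = dequantize(q, scale)
```

Because the mapping is symmetric around zero, no zero-point offset is stored, which keeps the kernels simple at the cost of slightly coarser quantization when weights are skewed.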
## Usage
From the repository root, run:
```bash
./target/release/infer \
--model models/qwen2.5-0.5b-int8-v1.cellm \
--tokenizer models/qwen2.5-0.5b-bnb4/tokenizer.json \
--prompt "What is sycophancy?" \
--chat \
--gen 64 \
--temperature 0 \
--backend metal \
--kv-encoding f16
```
Or, for the recommended mixed-precision Gemma model:
```bash
./target/release/infer \
--model models/gemma-3-1b-it-mixed-int4-v1.cellm \
--tokenizer models/hf/gemma-3-1b-it-full/tokenizer.json \
--prompt "What is consciousness?" \
--chat \
--chat-format plain \
--gen 48 \
--temperature 0 \
--backend metal \
--kv-encoding f16
```
## About Cellm
Cellm is a Rust-native inference runtime for local LLM serving on mobile and desktop, built around Metal acceleration and memory-mapped model loading.
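Memory-mapping lets the runtime treat a multi-hundred-megabyte `.cellm` file as if it were in memory while the OS pages weights in lazily, instead of copying the whole file onto the heap at startup. The stdlib sketch below shows the general idea only; the `CELM` magic bytes and file layout are made up for illustration and are not Cellm's real format.

```python
# Conceptual sketch of memory-mapped model loading (stdlib mmap).
# The "CELM" header and 16-byte payload are hypothetical stand-ins
# for a real model file's magic bytes and weight data.
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.cellm")
with open(path, "wb") as f:
    f.write(b"CELM" + bytes(range(16)))  # fake header + payload

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    magic = mm[:4]      # slicing faults in only the touched pages
    payload = mm[4:20]  # weights are read lazily, never bulk-copied
    mm.close()
```

The practical upshot is fast startup and shared physical pages across processes, which matters on memory-constrained mobile devices.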
## License
Please follow each upstream model license (Qwen and Gemma terms) when redistributing weights and tokenizers.