---
library_name: cellm
tags:
- mobile
- rust
- memory-efficient
- quantized
---

# Cellm Models Hub

This folder contains `.cellm` model artifacts tested with the Cellm Rust CLI.

## Models

### Qwen2.5 0.5B Instruct (INT8)
- **Path**: `models/qwen2.5-0.5b-int8-v1.cellm`
- **Size**: ~472 MB
- **Tokenizer**: `models/qwen2.5-0.5b-bnb4/tokenizer.json`
- **Type**: INT8 symmetric weight-only

### Gemma-3 1B IT (INT4, smallest)
- **Path**: `models/gemma-3-1b-it-int4-v1.cellm`
- **Size**: ~478 MB
- **Tokenizer**: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: INT4 symmetric weight-only

### Gemma-3 1B IT (Mixed INT4, recommended)
- **Path**: `models/gemma-3-1b-it-mixed-int4-v1.cellm`
- **Size**: ~1.0 GB
- **Tokenizer**: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: Mixed precision (attention/embeddings higher precision, MLP mostly INT4)

### Gemma-3 1B IT (INT8, most stable)
- **Path**: `models/gemma-3-1b-it-int8-v1.cellm`
- **Size**: ~1.2 GB
- **Tokenizer**: `models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: INT8 symmetric weight-only
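All of the artifacts above use symmetric weight-only quantization: each weight tensor is stored as signed integers plus a scale factor, with no zero-point, and activations stay in floating point. The sketch below illustrates the per-tensor INT8 case in plain Rust; it is an illustration of the general technique, not Cellm's actual codepath (per-channel scales and packing details may differ).

```rust
/// Quantize an f32 slice to i8 with a single symmetric scale.
/// Symmetric means the representable range is [-127*scale, 127*scale],
/// centered on zero, so no zero-point offset is stored.
fn quantize_int8(w: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = w.iter().fold(0f32, |m, &x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = w
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Recover approximate f32 weights: w ≈ q * scale.
fn dequantize_int8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&x| x as f32 * scale).collect()
}

fn main() {
    let w = [0.5f32, -1.0, 0.25, 0.0];
    let (q, scale) = quantize_int8(&w);
    let back = dequantize_int8(&q, scale);
    // Round-trip error is bounded by half a quantization step.
    for (a, b) in w.iter().zip(&back) {
        assert!((a - b).abs() <= scale / 2.0 + 1e-6);
    }
    println!("scale = {scale}, q = {q:?}");
}
```

INT4 follows the same scheme with a [-7, 7] integer range, which is why the INT4 artifacts above are roughly half the size of their INT8 counterparts at the same parameter count.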

## Usage

From the repository root (where `target/release/infer` was built), run:

```bash
./target/release/infer \
  --model models/qwen2.5-0.5b-int8-v1.cellm \
  --tokenizer models/qwen2.5-0.5b-bnb4/tokenizer.json \
  --prompt "What is sycophancy?" \
  --chat \
  --gen 64 \
  --temperature 0 \
  --backend metal \
  --kv-encoding f16
```

```bash
./target/release/infer \
  --model models/gemma-3-1b-it-mixed-int4-v1.cellm \
  --tokenizer models/hf/gemma-3-1b-it-full/tokenizer.json \
  --prompt "What is consciousness?" \
  --chat \
  --chat-format plain \
  --gen 48 \
  --temperature 0 \
  --backend metal \
  --kv-encoding f16
```
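Both invocations pass `--temperature 0`, which makes decoding greedy and therefore deterministic: each step simply takes the argmax of the logits instead of sampling. A minimal sketch of that selection rule (an illustration, not Cellm's sampler code):

```rust
/// Pick the token id with the highest logit (greedy decoding,
/// equivalent to sampling at temperature 0).
fn greedy_token(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

fn main() {
    let logits = [0.1f32, 2.5, -1.0];
    assert_eq!(greedy_token(&logits), 1);
    println!("greedy pick: token {}", greedy_token(&logits));
}
```

Determinism is useful for the kind of comparison this folder exists for: the same prompt against different quantizations of the same model isolates the effect of the quantization.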

## About Cellm
Cellm is a Rust-native inference runtime focused on mobile/desktop local LLM serving with Metal acceleration and memory-mapped model loading.
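Memory-mapped loading means weight bytes are paged in from disk on demand rather than read into RAM up front, which matters on memory-constrained mobile targets. The sketch below shows the access pattern using Unix positional reads (`read_exact_at`) as a portable stand-in for a true `mmap`; the file layout and helper name are hypothetical, not the actual `.cellm` format.

```rust
use std::fs::File;
use std::io::Write;
use std::os::unix::fs::FileExt; // positional reads without seeking (Unix)

/// Read `len` bytes at `offset` from an open model file without
/// loading the whole artifact into memory: the on-demand access
/// pattern that mmap gives a runtime.
fn read_tensor_bytes(file: &File, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
    let mut buf = vec![0u8; len];
    file.read_exact_at(&mut buf, offset)?;
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    // Stand-in model file: a fake 16-byte header, then a payload region.
    let path = std::env::temp_dir().join("demo.cellm");
    let mut f = File::create(&path)?;
    f.write_all(b"CELLMDEMOHEADER!")?; // 16-byte fake header
    f.write_all(&[1, 2, 3, 4])?; // fake tensor payload
    drop(f);

    // Fetch just the payload slice; nothing else is read into memory.
    let f = File::open(&path)?;
    let payload = read_tensor_bytes(&f, 16, 4)?;
    assert_eq!(payload, vec![1, 2, 3, 4]);
    println!("read {} payload bytes on demand", payload.len());
    std::fs::remove_file(&path)?;
    Ok(())
}
```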

## License
Please follow each upstream model license (Qwen and Gemma terms) when redistributing weights and tokenizers.