jeffasante committed · verified
Commit cb2ba53 · 1 Parent(s): 1b4406f

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +73 -0 (ADDED)
---
library_name: cellm
tags:
- mobile
- rust
- memory-efficient
- quantized
---

# Cellm Models Hub

This folder contains `.cellm` model artifacts tested with the Cellm Rust CLI.

## Models

### Qwen2.5 0.5B Instruct (INT8)
- **Path**: `../models/qwen2.5-0.5b-int8-v1.cellm`
- **Size**: ~472 MB
- **Tokenizer**: `../models/qwen2.5-0.5b-bnb4/tokenizer.json`
- **Type**: INT8 symmetric weight-only

### Gemma-3 1B IT (INT4, smallest)
- **Path**: `../models/gemma-3-1b-it-int4-v1.cellm`
- **Size**: ~478 MB
- **Tokenizer**: `../models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: INT4 symmetric weight-only

### Gemma-3 1B IT (Mixed INT4, recommended)
- **Path**: `../models/gemma-3-1b-it-mixed-int4-v1.cellm`
- **Size**: ~1.0 GB
- **Tokenizer**: `../models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: Mixed precision (attention/embeddings higher precision, MLP mostly INT4)

### Gemma-3 1B IT (INT8, most stable)
- **Path**: `../models/gemma-3-1b-it-int8-v1.cellm`
- **Size**: ~1.2 GB
- **Tokenizer**: `../models/hf/gemma-3-1b-it-full/tokenizer.json`
- **Type**: INT8 symmetric weight-only
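Before running inference, it can help to confirm that each model/tokenizer pair listed above actually exists on disk. A minimal sketch, assuming the paths from the list are relative to the repository root (adjust for your checkout):

```shell
#!/bin/sh
# Sketch: verify that each model/tokenizer pair from the list above is present.
# The paths are taken from the sections above; adjust for your checkout.
check_pair() {
  model=$1; tok=$2
  [ -f "$model" ] || echo "missing model: $model"
  [ -f "$tok" ]   || echo "missing tokenizer: $tok"
}

check_pair models/qwen2.5-0.5b-int8-v1.cellm       models/qwen2.5-0.5b-bnb4/tokenizer.json
check_pair models/gemma-3-1b-it-int4-v1.cellm      models/hf/gemma-3-1b-it-full/tokenizer.json
check_pair models/gemma-3-1b-it-mixed-int4-v1.cellm models/hf/gemma-3-1b-it-full/tokenizer.json
check_pair models/gemma-3-1b-it-int8-v1.cellm      models/hf/gemma-3-1b-it-full/tokenizer.json
```

Silence means all files were found; each missing file prints one line.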

## Usage

From the repository root (e.g. `/Users/jeff/Desktop/cellm`), run:

```bash
./target/release/infer \
  --model models/qwen2.5-0.5b-int8-v1.cellm \
  --tokenizer models/qwen2.5-0.5b-bnb4/tokenizer.json \
  --prompt "What is sycophancy?" \
  --chat \
  --gen 64 \
  --temperature 0 \
  --backend metal \
  --kv-encoding f16
```

```bash
./target/release/infer \
  --model models/gemma-3-1b-it-mixed-int4-v1.cellm \
  --tokenizer models/hf/gemma-3-1b-it-full/tokenizer.json \
  --prompt "What is consciousness?" \
  --chat \
  --chat-format plain \
  --gen 48 \
  --temperature 0 \
  --backend metal \
  --kv-encoding f16
```
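For repeated experiments, only the model, tokenizer, and prompt change between the two commands above. A small helper that builds the command line can make that explicit; this is a sketch, with the `build_cmd` helper name being hypothetical and the `infer` flags taken verbatim from the examples above:

```shell
#!/bin/sh
# Sketch (hypothetical helper): assemble the infer command line used above,
# varying only model path, tokenizer path, and prompt between runs.
build_cmd() {
  printf '%s' "./target/release/infer --model $1 --tokenizer $2 --prompt \"$3\" --chat --gen 64 --temperature 0 --backend metal --kv-encoding f16"
}

# Print the command for the Qwen model as an example (pipe to sh to execute).
build_cmd models/qwen2.5-0.5b-int8-v1.cellm models/qwen2.5-0.5b-bnb4/tokenizer.json "What is sycophancy?"
echo
```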

## About Cellm
Cellm is a Rust-native inference runtime for serving LLMs locally on mobile and desktop, with Metal acceleration and memory-mapped model loading.

## License
Please follow each upstream model license (Qwen and Gemma terms) when redistributing weights and tokenizers.