antonio committed on
Commit feab7f4 · 0 Parent(s):

🚀 Official release: Gemma3 Smart Q4 - Bilingual Offline Assistant for Raspberry Pi


- Bilingual IT/EN support
- Optimized for Raspberry Pi 4/5
- Fully offline inference
- Benchmark: 3.56-3.67 tokens/s on Raspberry Pi 4
- Two quantizations: Q4_K_M (quality) and Q4_0 (speed)

Files changed (4)
  1. .gitattributes +3 -0
  2. README.md +181 -0
  3. gemma3-1b-q4_0.gguf +3 -0
  4. gemma3-1b-q4_k_m.gguf +3 -0
.gitattributes ADDED
@@ -0,0 +1,3 @@
*.gguf filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,181 @@
---
license: gemma
language:
- en
- it
tags:
- gemma
- gemma3
- quantized
- gguf
- raspberry-pi
- edge-ai
- bilingual
- ollama
- offline
model_type: text-generation
inference: false
---

# 🧠 Gemma3 Smart Q4 - Bilingual Offline Assistant for Raspberry Pi

**Gemma3 Smart Q4** is a quantized bilingual (Italian-English) variant of Google's Gemma 3 1B model, optimized for edge devices such as the **Raspberry Pi 4 and 5**. It runs **completely offline** with Ollama or llama.cpp, ensuring **privacy and speed** without external dependencies.

---

## 💻 Optimized for Raspberry Pi

> ✅ **Tested on Raspberry Pi 4 (4GB)** - average speed 3.56-3.67 tokens/s
> ✅ **Fully offline** - no external APIs, no internet required
> ✅ **Lightweight** - under 800 MB in Q4 quantization
> ✅ **Bilingual** - seamlessly switches between Italian and English

---

## 🔍 Key Features

- 🗣️ **Bilingual AI** - Automatically detects and responds in Italian or English
- ⚡ **Edge-optimized** - Fine-tuned parameters for low-power ARM devices
- 🔒 **Privacy-first** - All inference happens locally on your device
- 🧩 **Two quantizations available**:
  - **Q4_K_M** (≈769 MB) → Better quality, more coherent reasoning
  - **Q4_0** (≈687 MB) → About 3% faster in our Pi 4 benchmark, ideal for real-time interactions
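The model itself handles language detection through its system prompt. If you also want to route or log requests by language on the client side, a rough stopword heuristic is enough for the Italian/English pair. This is purely an illustrative sketch, not part of the model or of Ollama:

```python
# Tiny stopword heuristic for client-side routing/logging only; the model
# detects the user's language on its own via the SYSTEM prompt below.
IT_HINTS = {"che", "di", "e", "il", "la", "non", "un", "una", "per", "sono", "ciao", "come"}
EN_HINTS = {"the", "of", "and", "a", "to", "is", "in", "it", "you", "that", "hello", "how"}

def guess_language(text: str) -> str:
    """Return 'it' or 'en' based on which stopword set matches more words."""
    words = {w.strip(".,!?;:").lower() for w in text.split()}
    return "it" if len(words & IT_HINTS) > len(words & EN_HINTS) else "en"

print(guess_language("Ciao! Come stai?"))           # it
print(guess_language("Hello, how are you today?"))  # en
```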

---

## 📊 Benchmark Results

Tested on **Raspberry Pi 4 (4GB RAM)** with Ollama:

| Model | Avg Speed | Individual Results | File Size | Use Case |
|-------|-----------|-------------------|-----------|----------|
| **gemma3-1b-q4_k_m.gguf** | **3.56 tokens/s** | 3.71, 3.58, 3.40 t/s | 769 MB | Better quality, long conversations |
| **gemma3-1b-q4_0.gguf** | **3.67 tokens/s** | 3.65, 3.67, 3.70 t/s | 687 MB | **Default choice**, general use |

**Test details**:
- Hardware: Raspberry Pi 4 (4GB RAM)
- OS: Raspberry Pi OS (Debian Bookworm)
- Runtime: Ollama 0.x
- Prompts: Mixed Italian/English, typical assistant queries

> **Recommendation**: Use **Q4_0** as the default (about 3% faster, 82 MB smaller, comparable quality in our tests). Use **Q4_K_M** only if you need slightly better coherence in very long conversations (1000+ tokens).
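The tokens/s figures can be reproduced from the metrics Ollama attaches to each non-streaming `/api/generate` response: `eval_count` (tokens generated) and `eval_duration` (generation time in nanoseconds). A minimal sketch of the arithmetic, with made-up sample numbers rather than a real run:

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from Ollama's response metrics.

    eval_count: number of tokens generated
    eval_duration_ns: time spent generating, in nanoseconds
    """
    return eval_count / (eval_duration_ns / 1_000_000_000)

# Illustrative response metrics (not from an actual benchmark run):
sample = {"eval_count": 110, "eval_duration": 30_000_000_000}
speed = tokens_per_second(sample["eval_count"], sample["eval_duration"])
print(f"{speed:.2f} tokens/s")  # 110 tokens in 30 s -> 3.67 tokens/s
```

Averaging this over several prompts is how the per-model numbers in the table were obtained.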

---

## 🛠️ Quick Start with Ollama

### Option 1: Pull from Hugging Face

Create a `Modelfile`:

```bash
cat > Modelfile <<'MODELFILE'
FROM hf.co/antonio/gemma3-smart-q4/gemma3-1b-q4_0.gguf

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 1024
PARAMETER num_thread 4
PARAMETER num_batch 32
PARAMETER repeat_penalty 1.05

SYSTEM """
You are an offline AI assistant running on a Raspberry Pi. Automatically detect the user's language (Italian or English) and respond in the same language. Be concise, practical, and helpful. If a task requires internet access or external services, clearly state this and suggest local alternatives when possible.

Sei un assistente AI offline che opera su Raspberry Pi. Rileva automaticamente la lingua dell'utente (italiano o inglese) e rispondi nella stessa lingua. Sii conciso, pratico e utile. Se un compito richiede accesso a internet o servizi esterni, indicalo chiaramente e suggerisci alternative locali quando possibile.
"""
MODELFILE
```

Then run:

```bash
ollama create gemma3-smart-q4 -f Modelfile
ollama run gemma3-smart-q4 "Ciao! Chi sei?"
```

### Option 2: Download and Use Locally

```bash
# Download the model
wget https://huggingface.co/antonio/gemma3-smart-q4/resolve/main/gemma3-1b-q4_0.gguf

# Create Modelfile
cat > Modelfile <<'MODELFILE'
FROM ./gemma3-1b-q4_0.gguf

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 1024
PARAMETER num_thread 4
PARAMETER num_batch 32
PARAMETER repeat_penalty 1.05

SYSTEM """
You are an offline AI assistant running on a Raspberry Pi. Automatically detect the user's language (Italian or English) and respond in the same language. Be concise, practical, and helpful.

Sei un assistente AI offline su Raspberry Pi. Rileva la lingua dell'utente (italiano o inglese) e rispondi nella stessa lingua. Sii conciso, pratico e utile.
"""
MODELFILE

# Create and run
ollama create gemma3-smart-q4 -f Modelfile
ollama run gemma3-smart-q4 "Hello! Introduce yourself."
```

---

## ⚙️ Recommended Parameters

For **Raspberry Pi 4/5**, use these optimized settings:

```yaml
Temperature: 0.7      # Balanced creativity vs consistency
Top-p: 0.9            # Nucleus sampling for diverse responses
Context Length: 1024  # Optimal for Pi 4 memory
Threads: 4            # Utilizes all Pi 4 cores
Batch Size: 32        # Optimized for throughput
Repeat Penalty: 1.05  # Reduces repetitive outputs
```

For **faster responses** (e.g., a voice assistant), reduce `num_ctx` to `512`.
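These settings map directly onto the `options` object of Ollama's `/api/generate` HTTP API, if you would rather set them per-request than bake them into a Modelfile. A sketch, assuming you created the model as `gemma3-smart-q4` per the Quick Start:

```python
import json

# Recommended Raspberry Pi 4/5 settings as per-request Ollama options.
PI_OPTIONS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "num_ctx": 1024,
    "num_thread": 4,
    "num_batch": 32,
    "repeat_penalty": 1.05,
}

def build_generate_request(prompt: str, fast: bool = False) -> dict:
    """Body for POST http://localhost:11434/api/generate."""
    options = dict(PI_OPTIONS)
    if fast:
        options["num_ctx"] = 512  # trade context length for latency
    return {
        "model": "gemma3-smart-q4",
        "prompt": prompt,
        "stream": False,
        "options": options,
    }

print(json.dumps(build_generate_request("Ciao! Chi sei?", fast=True), indent=2))
```

Per-request options override the Modelfile defaults only for that call, which is handy for mixing quick voice-style replies with longer conversations.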

---

## 📦 Files Included

- `gemma3-1b-q4_k_m.gguf` - Q4_K_M quantization (~769 MB) - **Better quality**
- `gemma3-1b-q4_0.gguf` - Q4_0 quantization (~687 MB) - **Faster speed**

---

## 🔖 License & Attribution

This is a derivative work of **Google's Gemma 3 1B**.
Please review and comply with the [Gemma License](https://ai.google.dev/gemma/terms).

**Quantization, optimization, and bilingual configuration by Antonio.**

---

## 🔗 Links

- **GitHub Repository**: [antonio/gemma3-smart-q4](https://github.com/antonio/gemma3-smart-q4) - Code, demos, benchmark scripts
- **Original Model**: [Google Gemma 3 1B IT](https://huggingface.co/google/gemma-3-1b-it)
- **Ollama Library**: Coming soon (pending submission)

---

## 🚀 Use Cases

- **Privacy-focused personal assistant** - All data stays on your device
- **Offline home automation** - Control IoT devices without cloud dependencies
- **Educational projects** - Learn AI/ML without expensive hardware
- **Voice assistants** - Fast enough for real-time speech interaction
- **Embedded systems** - Industrial applications requiring offline inference

---

**Built with ❤️ by Antonio 🇮🇹**
*Empowering privacy and edge computing, one model at a time.*
gemma3-1b-q4_0.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d1d037446a2836db7666aa6ced3ce460b0f7f2ba61c816494a098bb816f2ad55
size 720425472
gemma3-1b-q4_k_m.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c02d2e6f68fd34e9e66dff6a31d3f95fccb6db51f2be0b51f26136a85f7ec1f0
size 806058240
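The `oid sha256:...` lines in these Git LFS pointers double as download checksums. A minimal verification sketch (the filename/digest pairs come from the pointers above; streaming in chunks keeps memory use low on a Pi):

```python
import hashlib

# Expected digests, copied from the Git LFS pointer files in this repo.
EXPECTED = {
    "gemma3-1b-q4_0.gguf": "d1d037446a2836db7666aa6ced3ce460b0f7f2ba61c816494a098bb816f2ad55",
    "gemma3-1b-q4_k_m.gguf": "c02d2e6f68fd34e9e66dff6a31d3f95fccb6db51f2be0b51f26136a85f7ec1f0",
}

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large GGUFs never sit fully in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, name: str) -> bool:
    """True if the downloaded file matches its LFS pointer digest."""
    return sha256_file(path) == EXPECTED[name]
```

For example, `verify("gemma3-1b-q4_0.gguf", "gemma3-1b-q4_0.gguf")` should return True after a clean `wget` of that file.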