valentinfrlch committed on
Commit
2653ad0
·
verified ·
1 Parent(s): 6000fd9

Update README.md

Files changed (1)
  1. README.md +184 -3
README.md CHANGED
@@ -1,3 +1,184 @@
1
- ---
2
- license: gemma
3
- ---
1
+ ---
2
+ license: gemma
3
+ license_link: https://ai.google.dev/gemma/terms
4
+ base_model: google/gemma-3-4b-it
5
+ pipeline_tag: image-text-to-text
6
+ library_name: transformers
7
+ tags:
8
+ - vision-language
9
+ - multimodal
10
+ - gemma3
11
+ - home-security
12
+ - smart-home
13
+ - on-device
14
+ language:
15
+ - en
16
+ - de
17
+ - nl
18
+ - fr
19
+ - es
20
+ - pt
21
+ - it
22
+ - pl
23
+ - sv
24
+ extra_gated_heading: Access Glimpse-v1
25
+ extra_gated_description: >-
26
+ Glimpse-v1 is a Model Derivative of Google's Gemma and is distributed under
27
+ the Gemma Terms of Use. By requesting access you agree to those terms,
28
+ including the Gemma Prohibited Use Policy.
29
+ extra_gated_button_content: Acknowledge and access
30
+ ---
31
+
32
+ # Glimpse-v1
33
+
34
+ A lightweight, open vision-language model built to understand and summarize **home security camera events**.
35
+
36
+ ```
+ ollama run llmvision/glimpse-v1
+ ```
41
+
42
+ > Install [Ollama](https://ollama.com), then paste the command above. See the [project site](https://llmvision.org/glimpse/) for documentation.
43
+
44
+ ## Model summary
45
+
46
+ | | |
47
+ |---|---|
48
+ | **Developer** | LLM Vision |
49
+ | **Base model** | `google/gemma-3-4b-it` |
50
+ | **Architecture** | Gemma 3 (vision-language) |
51
+ | **Parameters** | ~4B |
52
+ | **Modality** | Image + text → text |
53
+ | **Training samples** | 5,000+ real-world home security camera events |
54
+ | **Reported gain** | 1.9× accuracy improvement over the base model on the target task |
55
+ | **License** | [Gemma Terms of Use](https://ai.google.dev/gemma/terms) |
56
+
57
+ ## Intended use
58
+
59
+ Glimpse-v1 is purpose-built for **summarizing and describing footage from home security cameras** — for example, generating short natural-language descriptions of motion events, deliveries, visitors, pets, or vehicles, locally on consumer hardware.
60
+
61
+ ### Designed for
62
+ - Local, privacy-preserving smart-home automations
63
+ - Event summaries for camera notifications
64
+ - Integrations with home-automation platforms (e.g. Home Assistant via the LLM Vision project)
65
+ - Edge devices and machines with limited VRAM/RAM
66
+
67
+ ### Not designed for
68
+ - General-purpose visual question answering or document understanding
69
+ - Person identification, biometric recognition, or surveillance of identifiable individuals
70
+ - Safety-critical decisions (medical, legal, security response) without human review
71
+ - Use cases prohibited by the [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy)
72
+
73
+ ## Languages
74
+
75
+ English, German, Dutch, French, Spanish, Portuguese, Italian, Polish, Swedish. More languages are added regularly. Quality varies by language and is best in English.
76
+
77
+ ## Why a small model?
78
+
79
+ Glimpse-v1 is a **compact 4B-parameter** model deliberately sized to run on hardware with limited memory and compute. The goal is **private, local AI for the home**: your camera footage never has to leave your network, and you avoid recurring API costs.
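To make "limited memory" concrete, the weight footprint can be estimated from the parameter count and the storage precision. The following is a rough sketch, assuming exactly 4 billion parameters (this card only states ~4B) and counting weights only; activations, the KV cache, and image preprocessing need additional memory. The function name is illustrative, and the precisions listed are common choices, not a statement of which quantization any particular build of Glimpse-v1 uses.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


# Assumption: "~4B" from the model summary, rounded to 4e9 for the estimate.
NUM_PARAMS = 4e9

if __name__ == "__main__":
    for label, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label:>5}: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

At 16-bit precision the weights alone come to roughly 7.5 GiB, and about a quarter of that at 4-bit, which is why quantized builds are the usual choice for edge devices.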
80
+
81
+ ## Performance
82
+
83
+ LLM Vision reports a **1.9× accuracy improvement** for Glimpse-v1 over the base Gemma 3 4B model on home-security event summarization. See the [project site](https://llmvision.org/glimpse/) for the latest benchmarks.
84
+
85
+ ## Training
86
+
87
+ - **Base:** Gemma 3 4B (instruction-tuned)
88
+ - **Data:** 5,000+ curated real-world home security camera events spanning diverse scenes, lighting conditions, and event types
89
+ - **Objective:** Supervised fine-tuning for concise, factual event descriptions
90
+
91
+ > Files in this repository have been **modified from the original Gemma 3 release** as part of this fine-tune.
92
+
93
+ ## How to use
94
+
95
+ ### Ollama (recommended)
96
+ ```
+ ollama run llmvision/glimpse-v1
+ ```
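For automations, the model can also be called programmatically through Ollama's local REST API instead of the interactive CLI. The following is a minimal sketch, assuming Ollama is serving on its default port (11434) and the model has already been pulled under the tag above. The function names and the default prompt are illustrative and not part of the project.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "llmvision/glimpse-v1"  # tag used in the `ollama run` command


def build_payload(image_bytes: bytes, prompt: str) -> dict:
    """Build a non-streaming /api/generate request body carrying one image."""
    return {
        "model": MODEL_TAG,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def summarize_frame(
    image_path: str,
    prompt: str = "Summarize this camera event in one sentence.",
) -> str:
    """Send one frame to the local Ollama server and return the generated text."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `summarize_frame("path/to/frame.jpg")` returns the description as a plain string, which can then be forwarded to a notification service. Because the request only goes to `localhost`, the frame stays on the local machine.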
101
+
102
+ ### Transformers
103
+ ```
+ import torch
+ from transformers import AutoModelForImageTextToText, AutoProcessor
+
+ # Replace the placeholder with the Hugging Face repository that hosts the model.
+ model_id = "<your-hf-username>/glimpse-v1"
+
+ processor = AutoProcessor.from_pretrained(model_id)
+ model = AutoModelForImageTextToText.from_pretrained(
+     model_id, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ # One user turn: a single camera frame plus the instruction.
+ messages = [
+     {"role": "user", "content": [
+         {"type": "image", "url": "path/to/frame.jpg"},
+         {"type": "text", "text": "Summarize this camera event in one sentence."},
+     ]},
+ ]
+
+ inputs = processor.apply_chat_template(
+     messages, add_generation_prompt=True, tokenize=True,
+     return_dict=True, return_tensors="pt",
+ ).to(model.device)
+
+ # `inputs` is a dict-like batch, so unpack it into keyword arguments.
+ out = model.generate(**inputs, max_new_tokens=128)
+
+ # Decode only the newly generated tokens, not the echoed prompt.
+ new_tokens = out[0][inputs["input_ids"].shape[-1]:]
+ print(processor.decode(new_tokens, skip_special_tokens=True))
+ ```
144
+
145
+ ## Limitations and risks
146
+
147
+ - **Domain-specific.** Outside of home-security framing, quality drops noticeably.
148
+ - **Hallucination.** Like all VLMs, it can invent details (people, objects, actions) not present in the image. Treat outputs as suggestions, not ground truth.
149
+ - **Bias.** Training data reflects the distribution of available home camera footage and may underperform on under-represented scenes, lighting, or demographics.
150
+ - **Privacy.** Although the model runs locally, **you** are responsible for handling footage of identifiable people in line with local laws (e.g. GDPR).
151
+ - **Not a security system.** Do not use Glimpse-v1 as the sole signal for emergency response.
152
+
153
+ ## License
154
+
155
+ This model is a **Gemma Model Derivative** and is distributed under the [**Gemma Terms of Use**](https://ai.google.dev/gemma/terms). Use, reproduction, modification, and redistribution are subject to those terms, including the [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy).
156
+
157
+ By downloading or using Glimpse-v1 you agree to the Gemma Terms of Use. If you redistribute Glimpse-v1 or any derivative of it, you must:
158
+ 1. Pass these terms through to your recipients as an enforceable provision.
159
+ 2. Provide recipients a copy of the Gemma Terms of Use.
160
+ 3. Mark any modified files with prominent notices that they have been modified.
161
+ 4. Include a `NOTICE` file containing:
162
+ > Gemma is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms.
163
+
164
+ ## Citation
165
+
166
+ ```
+ @misc{glimpse_v1_2026,
+   title  = {Glimpse-v1: A compact vision-language model for home security event understanding},
+   author = {LLM Vision},
+   year   = {2026},
+   url    = {https://llmvision.org/glimpse/}
+ }
+ ```
181
+
182
+ ## Acknowledgements
183
+
184
+ Built on [Google Gemma 3](https://ai.google.dev/gemma). Distributed via [Ollama](https://ollama.com) and Hugging Face.