Instructions for using Laborator/microlens-gemma4-e2b with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers

How to use Laborator/microlens-gemma4-e2b with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Laborator/microlens-gemma4-e2b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Laborator/microlens-gemma4-e2b", dtype="auto")
```

- llama-cpp-python
How to use Laborator/microlens-gemma4-e2b with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Laborator/microlens-gemma4-e2b",
    filename="gguf/gemma-4-e2b-it.BF16-mmproj.gguf",
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"},
                },
            ],
        }
    ]
)
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Laborator/microlens-gemma4-e2b with llama.cpp:
Install from brew

```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Laborator/microlens-gemma4-e2b:BF16

# Run inference directly in the terminal:
llama-cli -hf Laborator/microlens-gemma4-e2b:BF16
```

Install from WinGet (Windows)

```bash
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Laborator/microlens-gemma4-e2b:BF16

# Run inference directly in the terminal:
llama-cli -hf Laborator/microlens-gemma4-e2b:BF16
```

Use pre-built binary

```bash
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Laborator/microlens-gemma4-e2b:BF16

# Run inference directly in the terminal:
./llama-cli -hf Laborator/microlens-gemma4-e2b:BF16
```

Build from source code

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Laborator/microlens-gemma4-e2b:BF16

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Laborator/microlens-gemma4-e2b:BF16
```

Use Docker

```bash
docker model run hf.co/Laborator/microlens-gemma4-e2b:BF16
```
- LM Studio
- Jan
- vLLM
How to use Laborator/microlens-gemma4-e2b with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Laborator/microlens-gemma4-e2b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Laborator/microlens-gemma4-e2b",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```

Use Docker

```bash
docker model run hf.co/Laborator/microlens-gemma4-e2b:BF16
```
- SGLang
How to use Laborator/microlens-gemma4-e2b with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Laborator/microlens-gemma4-e2b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Laborator/microlens-gemma4-e2b",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```

Use Docker images

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Laborator/microlens-gemma4-e2b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Laborator/microlens-gemma4-e2b",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```

- Ollama
How to use Laborator/microlens-gemma4-e2b with Ollama:
```bash
ollama run hf.co/Laborator/microlens-gemma4-e2b:BF16
```

- Unsloth Studio
How to use Laborator/microlens-gemma4-e2b with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Laborator/microlens-gemma4-e2b to start chatting
```

Install Unsloth Studio (Windows)

```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Laborator/microlens-gemma4-e2b to start chatting
```

Using HuggingFace Spaces for Unsloth

No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for Laborator/microlens-gemma4-e2b to start chatting.
- Pi
How to use Laborator/microlens-gemma4-e2b with Pi:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Laborator/microlens-gemma4-e2b:BF16
```

Configure the model in Pi

```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to `~/.pi/agent/models.json`:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "Laborator/microlens-gemma4-e2b:BF16" }
      ]
    }
  }
}
```

Run Pi

```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use Laborator/microlens-gemma4-e2b with Hermes Agent:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Laborator/microlens-gemma4-e2b:BF16
```

Configure Hermes

```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Laborator/microlens-gemma4-e2b:BF16
```

Run Hermes

```bash
hermes
```
- Docker Model Runner
How to use Laborator/microlens-gemma4-e2b with Docker Model Runner:
```bash
docker model run hf.co/Laborator/microlens-gemma4-e2b:BF16
```
- Lemonade
How to use Laborator/microlens-gemma4-e2b with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Laborator/microlens-gemma4-e2b:BF16
```

Run and chat with the model

```bash
lemonade run user.microlens-gemma4-e2b-BF16
```

List all available models

```bash
lemonade list
```
---
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- gemma
- gemma-4
- vision-language
- microscopy
- scientific-imaging
- lora
- qlora
- unsloth
- citizen-science
- education
- edge-deployment
license: apache-2.0
base_model: unsloth/gemma-4-E2B-it
model-index:
- name: MicroLens
  results: []
---
# MicroLens · Microscopy Vision-Language Model

A small, fine-tuned multimodal model that turns a **$150 Android phone + a clip-on microscope** into a field-ready assistant for diatom-based water-quality assessment, freshwater zooplankton biodiversity surveys, fungal spore identification, and cyanobacterial bloom monitoring. Runs offline.

- **Base model:** [`unsloth/gemma-4-E2B-it`](https://huggingface.co/unsloth/gemma-4-E2B-it) (4.65 B params)
- **Adapter / Merged / GGUF:** [`Laborator/microlens-gemma4-e2b`](https://huggingface.co/Laborator/microlens-gemma4-e2b)
- **Source code:** [`SergheiBrinza/microlens`](https://github.com/SergheiBrinza/microlens)
- **Submission for:** *The Gemma 4 Good Hackathon*, Kaggle · May 2026
- **License:** Apache 2.0 (weights · code · dataset; see component licenses below)

---
## Table of Contents

1. [Model Details](#1-model-details)
2. [Intended Use](#2-intended-use)
3. [Training Data](#3-training-data)
4. [Training Procedure](#4-training-procedure)
5. [Evaluation](#5-evaluation)
6. [Bias, Risks & Limitations](#6-bias-risks--limitations)
7. [Environmental Impact](#7-environmental-impact)
8. [Technical Specifications](#8-technical-specifications)
9. [How to Use](#9-how-to-use)
10. [Citation](#10-citation)
11. [Model Card Authors](#11-model-card-authors)
12. [Model Card Contact](#12-model-card-contact)

---
## 1. Model Details

| Field | Value |
|---|---|
| **Name** | MicroLens |
| **Version** | 1.0 · May 2026 |
| **Author** | Serghei Brinza · Vienna, Austria |
| **Model type** | Vision-Language (image + text → text) |
| **Language(s)** | English (primary); multilingual output via Gemma 4 base tokenizer |
| **Base model** | `unsloth/gemma-4-E2B-it` (Gemma 4 Effective-2B, instruction-tuned) |
| **Parameters** | 4.65 B total · 59.7 M trainable during fine-tune (1.34 %) |
| **License** | Apache 2.0 |
| **Finetuning method** | Unsloth FastVisionModel + 4-bit QLoRA (LoRA adapter, r = 32, α = 64, dropout = 0.05) |
| **Framework** | Unsloth 2026.4.7 · Transformers 5.6.0 · PyTorch 2.10 · CUDA 12.8 |
| **Hardware** | 1 × NVIDIA RTX 3090 Ti (24 GB VRAM) |
| **Training time** | ~13 h (v3 = 1 epoch on rich format, ~6,200 steps · resumed from v2 checkpoint-18351 · v2 base = 3 epochs, ~37 h) |
| **Distilled from** | Internal Teacher VLM (Apache 2.0, thinking mode, 3 × vLLM workers) |

### Distribution artefacts

| Artefact | Size | Purpose | Target runtime |
|---|---|---|---|
| **LoRA adapter** | 228 MB | Load on top of base Gemma 4 E2B | Unsloth / PEFT / Transformers |
| **Merged FP16** | 9.5 GB | Full stand-alone model | Transformers / vLLM / SGLang |
| **GGUF Q4_K_M** | 3.2 GB | 4-bit quantised weights | Ollama · llama.cpp · LM Studio |
| **BF16 `mmproj`** | 942 MB | Vision projector for GGUF runtimes | Ollama · llama.cpp |

All artefacts live on the same HF repo: [Laborator/microlens-gemma4-e2b](https://huggingface.co/Laborator/microlens-gemma4-e2b).
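
Individual artefacts can also be fetched programmatically with `huggingface_hub`. A minimal sketch (the filenames mirror the names used in Section 9, but check the repo's file listing for the exact paths):

```python
# Sketch: fetch single artefacts from the repo instead of cloning everything.
# The filenames below follow the names referenced in Section 9; verify them
# against the repo's "Files and versions" tab before use.
from huggingface_hub import hf_hub_download

repo_id = "Laborator/microlens-gemma4-e2b"

# 4-bit GGUF weights for llama.cpp / Ollama / LM Studio
gguf_path = hf_hub_download(repo_id, filename="microlens-gemma4-e2b-Q4_K_M.gguf")

# Vision projector required by GGUF runtimes
mmproj_path = hf_hub_download(repo_id, filename="mmproj-bf16.gguf")

print(gguf_path, mmproj_path)
```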

---

## 2. Intended Use
MicroLens is built to **lower the cost of scientific observation** in places where expert knowledge or network access is scarce.

### Primary intended uses
- **Citizen science.** Volunteers contributing to pond-water biodiversity counts, freshwater plankton surveys, or diatom-based water-quality monitoring can capture a smartphone-microscope image and receive a structured natural-language description of the subject and its key visual features.
- **Future of Education.** Offline biology / earth-science classes: the model runs on the same Android tablet the students already use, with no cloud call required.
- **Research support.** Pre-screening for plankton surveys, diatom-based water-quality monitoring, and fungal spore identification, where the model narrows the candidate set before an expert confirms.
- **Digital equity.** The Q4_K_M GGUF build runs on mid-range Android hardware (~$150 phones with 6 GB RAM) via `llama.cpp` / `MLC`. No API key, no telemetry, no internet.

### Intended users
- Citizen-science volunteers (amateur botanists, beekeepers, freshwater monitors).
- Teachers and students in biology / earth-science courses, particularly in low-connectivity regions.
- Researchers doing preliminary triage of large microscopy datasets.
- Jury members evaluating this submission to *The Gemma 4 Good Hackathon*.

### Out-of-scope uses
- **Medical diagnosis.** MicroLens has **not** been trained on medical imaging (histology, cytology, pathology, radiology). Do not use it to diagnose disease in humans or animals.
- **Legally or biologically authoritative species identification.** The model returns descriptions, not court-defensible or taxonomically rigorous identifications.
- **Materials outside the 8 trained categories.** Feeding the model an unrelated image (e.g. a face, a landscape, a screenshot) still produces an answer, but that answer is not grounded in its training data and should be treated as unreliable.
- **Forensics, compliance, or regulated decision-making.** Do not chain MicroLens into any pipeline where a confident but wrong output can harm a person or violate regulation.

---
## 3. Training Data

### Category distribution
All samples are microscopy images from **5 open-licensed source datasets** (**AquaScope** · **ZooLake** · **UDE Diatoms** · **DiatlAS** · **TgFC**). Total: **93,014 image-question-answer triples** (82,737 train · 10,277 validation · **123 genera** across **8 categories**).

| # | Category | Train samples | Typical subjects | Source datasets |
|---|---|---:|---|---|
| 1 | Diatoms | 64,043 (77.4%) | Pennate / centric diatoms · genus-level taxonomy | UDE Diatoms · DiatlAS |
| 2 | Freshwater zooplankton | 11,264 (13.6%) | Cladocerans · copepods · rotifers | AquaScope · ZooLake |
| 3 | Fungal spores | 4,188 (5.1%) | Conidia · ascospores · basidiospores · spore morphology | TgFC |
| 4 | Cyanobacteria | 1,091 (1.3%) | Filamentous and unicellular cyanobacteria | curated subset |
| 5 | Fish | 177 (0.2%) | Fish or fish part (pseudo-genus, category-level only) | TgFC |
| 6 | No specimen (service) | 1,350 (1.6%) | Background / empty fields for OOD detection | synthetic negatives |
| 7 | Debris (service) | 428 (0.5%) | Non-biological fragments for OOD handling | curated |
| 8 | Unknown (service) | 196 (0.2%) | Unidentified microscopy specimens for fallback | curated |
| | **Total** | **82,737** | 10,277 validation · 123 genera (top 30 with hand-curated KB entries) | |

### Class balancing
The 8 categories naturally vary in genus density. **Long-tail genera** (~100 of 123 have fewer than 100 samples each) get category-generic morphology rather than genus-specific cues; this is the correct conservative behaviour given training coverage. The **30 most-common genera** have hand-curated knowledge-base entries (morphology · habitat · ID cues from AlgaeBase, WoRMS, ITIS, Round 1990, Krammer-Lange-Bertalot 1986–1991).

### Description generation pipeline (distillation)
Natural-language descriptions (`question` → `answer`) were generated from the raw images using **Internal Teacher VLM (Apache 2.0)** running in **thinking mode** across **3 × vLLM workers** in parallel. For every image:

1. The teacher sees the image alongside a structured prompt asking for subject identification and key visual features.
2. The teacher produces a detailed chain-of-thought inside `<think>…</think>` tags, then a concise final answer.
3. Only the final answer is kept as the training target; the `<think>` trace is discarded.

This is a **teacher-student distillation**. The student (MicroLens / Gemma 4 E2B) inherits the teacher's descriptive style while being **~1.7× smaller** in parameter count and dramatically smaller after Q4 quantisation.
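
In code, the answer-extraction step of this pipeline amounts to stripping the reasoning trace. A minimal sketch (the helper name and the sample string are illustrative, not from the released pipeline):

```python
import re

def extract_final_answer(teacher_output: str) -> str:
    """Drop the <think>…</think> trace and keep only the teacher's final answer,
    which becomes the student's training target."""
    answer = re.sub(r"<think>.*?</think>", "", teacher_output, flags=re.DOTALL)
    return answer.strip()

# Illustrative teacher output (not a real sample from the dataset):
raw = "<think>Pennate outline, central raphe visible…</think>A pennate diatom with a distinct central raphe and fine striae."
print(extract_final_answer(raw))
```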

### Licensing of training data
All upstream datasets were checked for license compatibility. Accepted licenses: **Apache 2.0**, **MIT**, **CC-BY**, **CC-BY-SA**, **CC0**. **Zero** samples were used from unlicensed or research-only datasets. The distilled VQA pairs are released under **Apache 2.0** alongside the model.

### Source datasets & DOI

| Dataset | License | DOI |
|---|---|---|
| AquaScope (Eawag) | CC-BY 4.0 | 10.25678/0009YP |
| ZooLake (Eawag) | CC-BY 4.0 | 10.25678/0004DY |
| UDE Diatoms | CC-BY 4.0 | 10.1093/gigascience/giae087 |
| DIATLAS | CC-BY 4.0 | 10.5281/zenodo.16260887 |
| TgFC fungal spores | CC-BY 4.0 | 10.6084/m9.figshare.28855910 |

---
## 4. Training Procedure

### Hyperparameters

| Hyperparameter | Value |
|---|---|
| Fine-tuning method | **Unsloth FastVisionModel + 4-bit QLoRA** (NF4 base + LoRA adapter in bf16) |
| LoRA rank (r) | **32** |
| LoRA α | **64** |
| LoRA dropout | **0.05** |
| Trainable parameters | **59.7 M** (1.34 % of 4.65 B) |
| Target modules | All linear projections (vision tower + language tower) |
| Optimizer | AdamW (8-bit) |
| Learning rate | **5 × 10⁻⁵** (4× softer than v2's 2 × 10⁻⁴, for gentle re-learning of the rich format) |
| LR schedule | Linear warmup (100 steps) → linear decay |
| Batch size (per device) | 2 |
| Gradient accumulation | 8 |
| **Effective batch size** | **16** |
| Max sequence length | 2048 tokens |
| Epochs (v3) | **1** (rich format, resumed from v2 checkpoint-18351) |
| Steps (v3) | **~6,200** |
| v2 base config | 3 epochs · lr 2 × 10⁻⁴ · 18,351 steps · ~37 h wall-clock |
| Mixed precision | bf16 |
| Gradient checkpointing | enabled during fine-tune (Unsloth's optimised path) |
| Seed | 3407 |
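
For orientation, the table above maps roughly onto the following Unsloth setup (a sketch, not the exact training script; it assumes Unsloth's public `FastVisionModel` API and omits the trainer and data collator):

```python
from unsloth import FastVisionModel

# Load the base model in 4-bit (QLoRA: NF4 base weights, bf16 LoRA adapter).
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/gemma-4-E2B-it",
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

# Attach a LoRA adapter to all linear projections in both towers,
# matching the hyperparameters listed above.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    random_state=3407,
)
```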

### Hardware & runtime
- 1 × NVIDIA GeForce **RTX 3090 Ti** (24 GB GDDR6X); single GPU only (Unsloth currently does not support multi-GPU)
- AMD Ryzen host · 64 GB system RAM
- Ubuntu 24.04 · CUDA 12.8 · PyTorch 2.10
- Wall-clock **v3** (1 epoch rich-format resume): **~13 h**
- Wall-clock **v2** (3 epochs base training that v3 resumed from): **~37 h**
- Cumulative wall-clock through the full v2 + v3 pipeline: **~50 h**

### Loss curves

| Stage | Split | Final loss |
|---|---|---|
| v2 (3 epochs base) | Eval (10,277-image holdout) | **~0.21** |
| v3 (1 epoch rich-format resume) | Eval (220-image stratified holdout) | **0.0213** |

The v3 step preserves v2's category/genus accuracy (no drift; ~45 % top-1 genus accuracy on 123 classes vs a uniform random baseline of ~0.8 %, i.e. 1/123) while overwriting the response-format prior to produce structured rich answers (genus · morphology · habitat · ID cues). The 4× softer learning rate (5e-5 vs 2e-4) was chosen specifically for this gentle re-learning step; without it, 1 epoch on the rich format would degrade the genus signal v2 had already learned.

### Attention backend
Gemma 4's vision encoder uses a **head dimension of 512**, which exceeds the 256-head-dim limit of current **FlashAttention-2** kernels. Fine-tuning and inference therefore use **PyTorch SDPA** (scaled-dot-product attention, memory-efficient path). On RTX 3090 Ti this is the correct default; SDPA is the only supported backend for Gemma 4 in Unsloth 2026.4.7 at the time of training. Unsloth's FastVisionModel adds custom 4-bit QLoRA kernels and `UnslothVisionDataCollator` on top of this backend, which together cut peak VRAM from ~38 GB (vanilla HF Transformers) to ~12 GB and roughly halve the per-step time.
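
Outside Unsloth, the same backend choice can be made explicit when loading with Transformers (a sketch; recent Transformers versions already default to SDPA where available):

```python
import torch
from transformers import AutoModelForVision2Seq

# Request the SDPA backend explicitly; FlashAttention-2 would fail here because
# the vision tower's head dimension (512) exceeds the FA-2 kernel limit (256).
model = AutoModelForVision2Seq.from_pretrained(
    "Laborator/microlens-gemma4-e2b",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
    device_map="auto",
)
```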

---

## 5. Evaluation
Evaluation is **qualitative and per-category**, reflecting the spirit of the submission (an *assistive descriptor*, not a classifier). For every category we sampled images from the held-out validation split and compared the MicroLens answer against the original internal teacher answer.

| Category | Observation |
|---|---|
| Diatoms | Pennate vs. centric distinction is reliable; genus-level naming on common Naviculales / Cymbellaceae / Aulacoseiraceae is consistent. Long-tail diatoms degrade gracefully into morphological description (raphe / striae / valve outline). |
| Freshwater zooplankton | Cladocerans and rotifers are consistently named at family level; common genera (Cyclops, Daphnia, Bosmina) are reliably tagged. |
| Fungal spores | Conidial vs. ascospore vs. basidiospore separation is reliable; common spore morphologies (Neopestalotiopsis, Colletotrichum, Olivea) receive genus-level naming. |
| Cyanobacteria | Identified at category level; specific cyanobacterial genus naming is best-effort due to small training share (~1% of total). |
| Fish | Pseudo-genus class with no species-level annotation in training. Returns a category-level templated description rather than species names. |

The 3 service classes (`debris`, `no_specimen`, `unknown`) are used for out-of-distribution handling and route the model to conservative, generic responses rather than taxonomic descriptions.

There is no single accuracy number for MicroLens, because the output space is free-form natural language. The correct axis of evaluation is *"does the description help a human in the field decide what to do next?"*. For the trained categories, it does.
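
The comparison above was done by hand, but the sampling loop is easy to script. A sketch of what such a harness could look like (the dataset id and column names are hypothetical placeholders, not a released dataset):

```python
# Sketch: per-category side-by-side review of MicroLens answers vs. stored teacher answers.
# "Laborator/microlens-vqa" and the column names are hypothetical placeholders.
import random
from datasets import load_dataset
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Laborator/microlens-gemma4-e2b")
val = load_dataset("Laborator/microlens-vqa", split="validation")

for category in sorted(set(val["category"])):
    rows = [r for r in val if r["category"] == category]
    for row in random.sample(rows, k=min(3, len(rows))):
        messages = [{"role": "user", "content": [
            {"type": "image", "image": row["image"]},
            {"type": "text", "text": row["question"]},
        ]}]
        print(f"[{category}] teacher: {row['answer']}")
        print(f"[{category}] student: {pipe(text=messages)}")
```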

---

## 6. Bias, Risks & Limitations

### Known failure modes
- **Small-model ceiling.** MicroLens is built on Gemma 4 **E2B** (effective 2-billion scale). On edge cases the teacher (Internal Teacher) was stronger; the student inherits the style but not the full capability. Expect the student to be **close to, but not equal to**, the teacher on hard examples.
- **English-first.** Scientific terminology is maximally accurate in English. The Gemma 4 base model is multilingual, so translated output is available, but translations can simplify or partially drop domain terms; always verify critical terms in the English answer.
- **Out-of-distribution images.** Photographs that are not microscopy (landscapes, faces, screenshots) will still produce text. That text is not grounded in the training distribution and should not be trusted.

### Risks
- **Over-trust by non-experts.** A fluent natural-language description can feel more authoritative than it is. Treat MicroLens as a first-pass field note, not as an oracle. Verify before publishing, diagnosing, or acting on any output.
- **Distribution shift.** The training data is dominated by lab-quality or curated-quality images. Field images taken through cheap clip-on phone microscopes have more motion blur, chromatic aberration, and inconsistent illumination. Descriptions on those inputs remain helpful but are more generic.

### Ethical considerations
- **Distillation is explicitly disclosed.** Training descriptions were generated by the Internal Teacher VLM, which was used under its Apache 2.0 license.
- **Dataset provenance is audited.** Only Apache / MIT / CC-BY / CC-BY-SA / CC0 upstream data was used. **Zero** non-licensed images were included.
- **No faces, no PII.** The training pool contains microscopy subjects only: no human faces, no personally identifiable information, no private medical imaging.

### Recommended usage pattern
1. Capture image → 2. MicroLens describes it → 3. Human confirms or rejects → 4. Log both.

The model produces a first draft. Final decisions stay with the user.
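
A minimal sketch of that loop (the `describe` placeholder stands in for any inference path from Section 9; the CSV layout is an illustrative choice, not part of the released code):

```python
# Sketch: capture -> describe -> human confirms -> log both.
import csv
import datetime

def describe(image_path: str) -> str:
    """Placeholder for a MicroLens inference call (see Section 9)."""
    return "placeholder description"

def log_observation(image_path: str, log_file: str = "observations.csv") -> None:
    draft = describe(image_path)                    # 2. MicroLens describes it
    print(draft)
    verdict = input("Confirm description? [y/n] ")  # 3. human confirms or rejects
    with open(log_file, "a", newline="") as f:      # 4. log both
        csv.writer(f).writerow(
            [datetime.datetime.now().isoformat(), image_path, draft, verdict]
        )
```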

---

## 7. Environmental Impact
MicroLens v3 was trained on a **single workstation GPU for ~13 hours** (1-epoch rich-format resume from v2 checkpoint-18351). The v2 base run that v3 resumes from took an additional ~37 hours.

| Factor | Value |
|---|---|
| GPU | RTX 3090 Ti · ~400 W under sustained fine-tune load |
| CPU + chassis + cooling overhead | ~140 W |
| Wall-time **v3** (1 epoch rich) | ~13 h |
| Wall-time **v2** (3 epochs base) | ~37 h |
| Cumulative wall-time | ~50 h |
| **Estimated energy (v3 step)** | **~7 kWh** (≈540 W × 13 h) |
| **Estimated energy (cumulative v2 + v3)** | **~27 kWh** (≈540 W × 50 h) |

At the Austrian 2024 grid carbon intensity (~110 g CO₂ / kWh), the **v3 training step emits ~0.8 kg CO₂-equivalent**, and the full v2 + v3 pipeline emits **~3 kg CO₂-equivalent**.
Inference cost is negligible: the Q4_K_M GGUF build runs on a mid-range Android phone at a few watts. MicroLens is designed so that the cumulative lifetime inference energy per query can be orders of magnitude smaller than a single cloud-inference call to a frontier model.

---
## 8. Technical Specifications

### Architecture
- **Backbone:** Gemma 4 (E2B), sparse-attention transformer decoder with an integrated vision encoder stack.
- **Vision encoder:** Gemma 4 native vision tower (head dim 512).
- **Fusion:** multimodal projector that lifts vision tokens into the language model embedding space (`mmproj` is shipped separately for GGUF runtimes).
- **Positional encoding:** inherited from Gemma 4 base.
- **Attention backend:** SDPA (scaled dot-product attention) during both fine-tune and inference. FlashAttention-2 is **not** usable: Gemma 4's vision-tower head dim (512) exceeds the FA-2 kernel limit (256).

### Adapter layout
- **LoRA rank:** 32
- **LoRA α:** 64
- **LoRA dropout:** 0.05
- **Target modules:** all linear projections across both the language and vision sub-networks (attention Q/K/V/O and MLP gate/up/down), enabling multimodal co-adaptation rather than a language-only adapter.
- **Adapter size:** 228 MB (bf16).

### Quantisations shipped
- **Merged FP16**: unquantised full-model snapshot (9.5 GB), Transformers-native.
- **GGUF Q4_K_M**: 4-bit quantised weights via the `llama.cpp` convert pipeline (3.2 GB). Pairs with the BF16 `mmproj` (942 MB) for full multimodal inference.
- **LoRA-only (bf16)**: for users who want to re-merge against a different Gemma 4 E2B base or stack additional adapters (see the sketch below).
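
Re-merging the LoRA-only artefact onto a base checkpoint follows the standard PEFT pattern. A sketch (the auto-class and the adapter location are assumptions; adjust both to the actual repo layout):

```python
# Sketch: merge the LoRA-only artefact onto a Gemma 4 E2B base checkpoint.
import torch
from peft import PeftModel
from transformers import AutoModelForVision2Seq

base = AutoModelForVision2Seq.from_pretrained(
    "unsloth/gemma-4-E2B-it", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "Laborator/microlens-gemma4-e2b")  # adapter repo / subfolder
merged = model.merge_and_unload()           # bake the adapter into the base weights
merged.save_pretrained("microlens-merged")  # ready for Transformers / vLLM / SGLang
```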

### Software dependencies at training time
- `unsloth == 2026.4.7`
- `transformers == 5.6.0`
- `torch == 2.10` (CUDA 12.8)
- `peft`, `bitsandbytes`, `trl` from Unsloth's pinned resolver.

---

## 9. How to Use

### Transformers (merged FP16)
```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image
import torch

model_id = "Laborator/microlens-gemma4-e2b"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("my_microscopy_image.jpg").convert("RGB")
prompt = "Describe what you see in this microscopy image. Identify the subject and key visual features."
messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": prompt},
]}]

input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=220, temperature=0.3, do_sample=True)

print(processor.tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

### Unsloth (LoRA on top of base)

```python
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "Laborator/microlens-gemma4-e2b",
    load_in_4bit=True,
    use_gradient_checkpointing=False,
)
FastVisionModel.for_inference(model)
# … same prompting pattern as above.
```

### Ollama / llama.cpp (Q4_K_M)

```bash
# download microlens-gemma4-e2b-Q4_K_M.gguf and mmproj-bf16.gguf from the HF repo
ollama create microlens -f Modelfile   # see repo for Modelfile

# pass the image by including its path in the prompt:
ollama run microlens "Describe this sample. ./slide_01.jpg"
```

---

## 10. Citation
If you use MicroLens in a publication, project, or downstream model, please cite:

```bibtex
@software{brinza_microlens_2026,
  title     = {MicroLens: a microscopy vision-language model fine-tuned from Gemma 4 E2B},
  author    = {Brinza, Serghei},
  year      = {2026},
  month     = may,
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Laborator/microlens-gemma4-e2b},
  note      = {Submission to the Gemma 4 Good Hackathon (Kaggle, May 2026)}
}
```

Upstream works used by MicroLens:

```bibtex
@misc{gemma4_2026,
  title  = {Gemma 4 Technical Report},
  author = {Google DeepMind},
  year   = {2026},
  note   = {Base model: unsloth/gemma-4-E2B-it}
}

@misc{unsloth_2026,
  title  = {Unsloth: faster LLM fine-tuning},
  author = {Daniel Han and Michael Han and Unsloth team},
  year   = {2026},
  url    = {https://github.com/unslothai/unsloth}
}
```

---

## 11. Model Card Authors
- **Serghei Brinza** · Vienna, Austria · sole author of the model, the training pipeline, and this card.

---

## 12. Model Card Contact
- **Hugging Face:** [`Laborator/microlens-gemma4-e2b`](https://huggingface.co/Laborator/microlens-gemma4-e2b). Open an issue / discussion on the repo.
- **GitHub:** [`SergheiBrinza/microlens`](https://github.com/SergheiBrinza/microlens). Issues, pull requests, and dataset corrections welcome.

---

*MicroLens · built for the Gemma 4 Good Hackathon.*