MicroLens · Microscopy Vision-Language Model
A small, fine-tuned multimodal model that turns a $150 Android phone + a clip-on microscope into a field-ready assistant for pollen surveys, pond-water plankton, mineral identification, plant-disease triage, and more. Runs offline.
- Base model: unsloth/gemma-4-E2B-it (4.65 B params)
- Adapter / Merged / GGUF: Laborator/microlens-gemma4-e2b
- Source code: SergheiBrinza/microlens
- Submission for: The Gemma 4 Good Hackathon, Kaggle · May 2026
- License: Apache 2.0 (weights · code · dataset; see component licenses below)
Table of Contents
- Model Details
- Intended Use
- Training Data
- Training Procedure
- Evaluation
- Bias, Risks & Limitations
- Environmental Impact
- Technical Specifications
- How to Use
- Citation
- Model Card Authors
- Model Card Contact
1. Model Details
| Field | Value |
|---|---|
| Name | MicroLens |
| Version | 1.0 · May 2026 |
| Author | Serghei Brinza · Vienna, Austria |
| Model type | Vision-Language (image + text → text) |
| Language(s) | English (primary); multilingual output via Gemma 4 base tokenizer |
| Base model | unsloth/gemma-4-E2B-it (Gemma 4 Effective-2B, instruction-tuned) |
| Parameters | 4.65 B total · 59.7 M trainable during fine-tune (1.34 %) |
| License | Apache 2.0 |
| Finetuning method | Unsloth FastVisionModel + 4-bit QLoRA (LoRA adapter, r = 32, α = 64, dropout = 0.05) |
| Framework | Unsloth 2026.4.7 · Transformers 5.6.0 · PyTorch 2.10 · CUDA 12.8 |
| Hardware | 1 × NVIDIA RTX 3090 Ti (24 GB VRAM) |
| Training time | ~13 h (v3 = 1 epoch on rich format, ~6,200 steps · resumed from v2 checkpoint-18351 · v2 base = 3 epochs, ~37 h) |
| Distilled from | Qwen3-VL-8B-AWQ (Apache 2.0, thinking mode, 3 × vLLM workers) |
Distribution artefacts
| Artefact | Size | Purpose | Target runtime |
|---|---|---|---|
| LoRA adapter | 228 MB | Load on top of base Gemma 4 E2B | Unsloth / PEFT / Transformers |
| Merged FP16 | 8.7 GB | Full stand-alone model | Transformers / vLLM / SGLang |
| GGUF Q4_K_M | 3.2 GB | 4-bit quantised weights | Ollama · llama.cpp · LM Studio |
| BF16 mmproj | 942 MB | Vision projector for GGUF runtimes | Ollama · llama.cpp |
All artefacts live on the same HF repo: Laborator/microlens-gemma4-e2b.
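To fetch a single artefact instead of cloning the whole repo, `huggingface_hub` can download individual files. A minimal sketch using the GGUF filenames referenced in the How to Use section:

```python
# Minimal sketch: download individual artefacts with huggingface_hub.
from huggingface_hub import hf_hub_download

repo = "Laborator/microlens-gemma4-e2b"
gguf = hf_hub_download(repo_id=repo, filename="microlens-gemma4-e2b-Q4_K_M.gguf")
mmproj = hf_hub_download(repo_id=repo, filename="mmproj-bf16.gguf")
print(gguf, mmproj)
```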
2. Intended Use
MicroLens is built to lower the cost of scientific observation in places where expert knowledge or network access is scarce.
Primary intended uses
- Citizen science. Volunteers contributing to pollen surveys, pond-water biodiversity counts, or amateur mineralogy can capture a smartphone-microscope image and receive a structured natural-language description of the subject and its key visual features.
- Education. Offline biology / earth-science classes; the model runs on the same Android tablet the students already use, with no cloud call required.
- Research support. Pre-screening for pollen monitoring, zooplankton surveys, and mineral field work, where the model narrows the candidate set before an expert confirms.
- Digital equity. The Q4_K_M GGUF build runs on mid-range Android hardware (~$150 phones with 6 GB RAM) via llama.cpp / MLC. No API key, no telemetry, no internet.
Intended users
- Citizen-science volunteers (amateur botanists, beekeepers, freshwater monitors).
- Teachers and students in biology / earth-science courses, particularly in low-connectivity regions.
- Researchers doing preliminary triage of large microscopy datasets.
- Hackathon / jury members evaluating The Gemma 4 Good Hackathon submission.
Out-of-scope uses
- Medical diagnosis. MicroLens has not been trained on medical imaging (histology, cytology, pathology, radiology). Do not use it to diagnose disease in humans or animals.
- Legally or biologically authoritative species identification. The model returns descriptions, not court-defensible or taxonomically rigorous identifications.
- Materials outside the 9 trained categories. Feeding the model an unrelated image (e.g. a face, a landscape, a screenshot) produces an answer but the answer is not grounded in its training and should be treated as unreliable.
- Forensics, compliance, or regulated decision-making. Do not chain MicroLens into any pipeline where a confident but wrong output can harm a person or violate regulation.
3. Training Data
Category distribution
All samples are microscopy images from 6 open-licensed source datasets (AquaScope · ZooLake · UDE Diatoms · DiatlAS · TgFC · Marine zooplankton dataset). Total: 122,399 image-question-answer triples (99,215 train · 12,331 validation · 12,353 test · 1,500 negative-class · 146 genera across 9 categories).
| # | Category | Typical subjects | Source datasets |
|---|---|---|---|
| 1 | Diatoms | Pennate / centric diatoms · genus-level taxonomy | UDE Diatoms · DiatlAS |
| 2 | Freshwater zooplankton | Cladocerans · copepods · rotifers | AquaScope · ZooLake |
| 3 | Marine zooplankton | Copepods · larvae · medusae · marine crustaceans | Marine zooplankton dataset |
| 4 | Fungal spores | Conidia · ascospores · basidiospores · spore morphology | curated subset |
| 5 | Fish larvae | Early-stage fish larvae and pre-larval forms | TgFC |
| 6 | Pollen | Grass · tree · flower pollen grains · aperture morphology | curated subset |
| 7 | Minerals | Thin-section petrographic slides · crystal habit | curated subset |
| 8 | Plant disease | Leaf lesions · phytopathogen morphology · chlorosis / necrosis | curated subset |
| 9 | Snowflakes | Macro / microphotographed snow crystals · dendrite / plate / column | curated subset |
Total: 99,215 train · 12,331 val · 146 genera (the top 30 genera have hand-curated knowledge-base entries).
Class balancing & negative-class
The 9 categories naturally vary in genus density. Long-tail genera (~100 of 146 have fewer than 100 samples each) get category-generic morphology rather than genus-specific cues — this is the correct conservative behaviour given training coverage. The 30 most-common genera have hand-curated knowledge-base entries (morphology · habitat · ID cues from AlgaeBase, WoRMS, ITIS, Round 1990, Krammer-Lange-Bertalot 1986–1991).
A synthetic negative-class of 1,500 non-microscopy images (faces · landscapes · screenshots) was added so the model learns to refuse out-of-distribution inputs at inference time.
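The refusal behaviour is trained directly from these samples. A hypothetical shape of one such triple (the exact wording in the released dataset may differ):

```python
# Hypothetical shape of one negative-class sample; the exact refusal wording in
# the released dataset may differ, this only illustrates the intended behaviour.
negative_sample = {
    "image": "negatives/landscape_0042.jpg",  # non-microscopy photo
    "question": "Describe what you see in this microscopy image.",
    "answer": (
        "This does not appear to be a microscopy image, so no microscopic subject "
        "can be identified. Please capture the sample through the microscope."
    ),
}
```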
Description generation pipeline (distillation)
Natural-language descriptions (question → answer) were generated from the raw images using Qwen3-VL-8B-AWQ (Apache 2.0) running in thinking mode across 3 × vLLM workers in parallel. For every image:
- The teacher sees the image alongside a structured prompt asking for subject identification and key visual features.
- The teacher produces a detailed chain-of-thought inside `<think>…</think>` tags, then a concise final answer.
- Only the final answer is kept as the training target; the `<think>` trace is discarded.
This is a teacher-student distillation. The student (MicroLens / Gemma 4 E2B) inherits the teacher's descriptive style while being ~1.7× smaller in parameter count and dramatically smaller after Q4 quantisation.
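A condensed sketch of one teacher call, assuming the Qwen3-VL-8B-AWQ workers are exposed through vLLM's OpenAI-compatible endpoint (the server URL and served model id are placeholders, not the repo's actual pipeline):

```python
# Sketch of one teacher call against a vLLM OpenAI-compatible server
# (URL and served model id are placeholders).
import base64
import re

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def teacher_describe(image_path: str, question: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="Qwen3-VL-8B-AWQ",  # placeholder for the served teacher
        messages=[{"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            {"type": "text", "text": question},
        ]}],
        temperature=0.2,
    )
    text = resp.choices[0].message.content
    # Keep only the final answer: the <think>…</think> trace is discarded.
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
```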
Licensing of training data
All upstream datasets were checked for license compatibility. Accepted licenses: Apache 2.0, MIT, CC-BY, CC-BY-SA, CC0. Zero samples were used from unlicensed or research-only datasets. The distilled VQA pairs are released under Apache 2.0 alongside the model.
4. Training Procedure
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Fine-tuning method | Unsloth FastVisionModel + 4-bit QLoRA (NF4 base + LoRA adapter in bf16) |
| LoRA rank (r) | 32 |
| LoRA α | 64 |
| LoRA dropout | 0.05 |
| Trainable parameters | 59.7 M (1.34 % of 4.65 B) |
| Target modules | All linear projections (vision tower + language tower) |
| Optimizer | AdamW (8-bit) |
| Learning rate | 5 × 10⁻⁵ (4× softer than v2's 2 × 10⁻⁴ — gentle re-learning of rich format) |
| LR schedule | Linear warmup (100 steps) → linear decay |
| Batch size (per device) | 2 |
| Gradient accumulation | 8 |
| Effective batch size | 16 |
| Max sequence length | 2048 tokens |
| Epochs (v3) | 1 (rich format, resumed from v2 checkpoint-18351) |
| Steps (v3) | ~6,200 |
| v2 base config | 3 epochs · lr 2 × 10⁻⁴ · 18,351 steps · ~37 h wall-clock |
| Mixed precision | bf16 |
| Gradient checkpointing | enabled during fine-tune (Unsloth's optimised path) |
| Seed | 3407 |
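For orientation, a minimal Unsloth sketch that matches the table above. It is not the exact training script: dataset preparation is omitted, and `train_dataset` is assumed to already contain image + conversation samples in Unsloth's vision chat format.

```python
# Minimal Unsloth sketch matching the hyperparameter table (dataset prep omitted).
from unsloth import FastVisionModel
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTConfig, SFTTrainer

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/gemma-4-E2B-it",
    load_in_4bit=True,                     # NF4 base weights
    use_gradient_checkpointing="unsloth",  # Unsloth's optimised checkpointing path
)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,           # adapt the vision tower as well
    finetune_language_layers=True,
    finetune_attention_modules=True,       # Q/K/V/O projections
    finetune_mlp_modules=True,             # gate/up/down projections
    r=32, lora_alpha=64, lora_dropout=0.05,
    random_state=3407,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=train_dataset,           # assumed: prepared vision chat samples
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,     # effective batch size 16
        learning_rate=5e-5,                # v3's gentle re-learning rate
        warmup_steps=100,
        lr_scheduler_type="linear",
        num_train_epochs=1,
        optim="adamw_8bit",
        bf16=True,
        max_seq_length=2048,
        seed=3407,
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        output_dir="outputs",
    ),
)
# v3 was initialised from the v2 checkpoint weights before this call.
trainer.train()
```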
Hardware & runtime
- 1 × NVIDIA GeForce RTX 3090 Ti (24 GB GDDR6X) — single GPU only (Unsloth currently does not support multi-GPU)
- AMD Ryzen host · 64 GB system RAM
- Ubuntu 24.04 · CUDA 12.8 · PyTorch 2.10
- Wall-clock v3 (1 epoch rich-format resume): ~13 h
- Wall-clock v2 (3 epochs base training that v3 resumed from): ~37 h
- Cumulative wall-clock through full v2 + v3 pipeline: ~50 h
Loss curves
| Stage | Split | Final loss |
|---|---|---|
| v2 (3 epochs base) | Eval (12,331-image holdout) | ~0.21 |
| v3 (1 epoch rich-format resume) | Eval (220-image stratified holdout) | 0.0213 |
The v3 step preserves v2's category/genus accuracy (no drift; ~45 % top-1 genus accuracy on 146 classes vs a random baseline of 0.7 %) while overwriting the response-format prior to produce structured rich answers (genus · morphology · habitat · ID cues). The 4× lower learning rate (5e-5 vs 2e-4) was chosen specifically for this gentle re-learning step; without it, 1 epoch on the rich format would degrade the genus signal v2 had already learned.
Attention backend
Gemma 4's vision encoder uses a head dimension of 512, which exceeds the 256-head-dim limit of current FlashAttention-2 kernels. Fine-tuning and inference therefore use PyTorch SDPA (scaled-dot-product attention, memory-efficient path). On RTX 3090 Ti this is the correct default; SDPA is the only supported backend for Gemma 4 in Unsloth 2026.4.7 at the time of training. Unsloth's FastVisionModel adds custom 4-bit QLoRA kernels and UnslothVisionDataCollator on top of this backend, which together cut peak VRAM from ~38 GB (vanilla HF Transformers) to ~12 GB and roughly halve the per-step time.
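When loading with plain Transformers outside Unsloth, the same backend can be requested explicitly; a minimal sketch:

```python
# Request the SDPA backend explicitly when loading with plain Transformers.
import torch
from transformers import AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained(
    "Laborator/microlens-gemma4-e2b",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",  # FlashAttention-2 rejects the 512 head dim
    device_map="auto",
)
```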
5. Evaluation
Evaluation is qualitative and per-category, reflecting the spirit of the submission (an assistive descriptor, not a classifier). For every category we sampled images from the held-out validation split and compared the MicroLens answer against the original Qwen3-VL-8B teacher answer.
| Category | Observation |
|---|---|
| Pollen | Consistent identification of pollen vs. non-pollen. Species-level guesses degrade gracefully into morphological descriptions (shape, aperture, surface texture). |
| Algae | Separates filamentous vs. unicellular vs. colonial. Genus-level names are best-effort. |
| Yeast | Reliable identification of budding cells; distinguishes yeast from bacteria. |
| Minerals | Good at gross texture (crystalline, granular, foliated) and colour; specific mineral names can be off when the sample lacks diagnostic features visible in brightfield. |
| Plant disease | Strong on lesion descriptions (chlorosis, necrosis, spotting); pathogen identification is probabilistic. |
| PCB | Identifies trace patterns, solder joints, component silhouettes. Not intended for defect triage; it describes rather than grades. |
| Snowflakes | Dendrite / plate / column classification is reliable; novel crystal habits are described morphologically. |
| Zooplankton | Copepods, rotifers, and common cladocerans are consistently named. Rare subclasses degrade gracefully (see limitations). |
| Diatoms | Pennate vs. centric distinction is reliable; genus-level naming on common Naviculales / Cymbellaceae / Aulacoseiraceae is consistent. Long-tail diatoms degrade gracefully into morphological description (raphe / striae / valve outline). |
There is no single accuracy number for MicroLens, because the output space is free-form natural language. The correct axis of evaluation is "does the description help a human in the field decide what to do next?". For the trained categories, it does.
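As a sketch of that side-by-side check (the field names of the held-out split and the `describe_fn` generation wrapper are assumptions, not the repo's actual evaluation script):

```python
# Sketch of the side-by-side check on held-out samples (field names assumed).
import random

def compare(samples, describe_fn, k=5):
    """describe_fn(image, question) -> MicroLens answer string."""
    for s in random.sample(samples, k):
        print("Q:        ", s["question"])
        print("Teacher:  ", s["teacher_answer"])
        print("MicroLens:", describe_fn(s["image"], s["question"]))
        print("-" * 60)
```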
6. Bias, Risks & Limitations
Known failure modes
- Graceful degradation on rare zooplankton subclasses. Specimens from Branchiopoda, Decapoda, and other sparsely represented groups are typically described as "marine zooplankton" or "crustacean-like organism" rather than named at the correct taxonomic level. This is the correct conservative behaviour given training coverage; the model does not fabricate taxonomy it cannot defend.
- Small-model ceiling. MicroLens is built on Gemma 4 E2B (effective 2-billion scale). On edge cases the teacher (Qwen3-VL-8B) was stronger; the student inherits the style but not the full capability. Expect the student to be close to, but not equal to, the teacher on hard examples.
- English-first. Scientific terminology is maximally accurate in English. The Gemma 4 base model is multilingual, so translated output is available, but translations can simplify or partially drop domain terms; always verify critical terms in the English answer.
- Out-of-distribution images. Photographs that are not microscopy (landscapes, faces, screenshots) will still produce text. That text is not grounded in the training distribution and should not be trusted.
Risks
- Over-trust by non-experts. A fluent natural-language description can feel more authoritative than it is. Treat MicroLens as a first-pass field note, not as an oracle. Verify before publishing, diagnosing, or acting on any output.
- Distribution shift. The training data is dominated by lab-quality or curated-quality images. Field images taken through cheap clip-on phone microscopes have more motion blur, chromatic aberration, and inconsistent illumination. Descriptions on those inputs remain helpful but are more generic.
Ethical considerations
- Distillation is explicitly disclosed. Training data was generated from Qwen3-VL-8B-AWQ (Apache 2.0). Qwen's license permits this; Qwen is credited in the Citation section.
- Dataset provenance is audited. Only Apache / MIT / CC-BY / CC-BY-SA / CC0 upstream data was used. Zero non-licensed images were included.
- No faces, no PII. The training pool contains microscopy subjects only: no human faces, no personally identifiable information, no private medical imaging.
Recommended usage pattern
1. Capture image → 2. MicroLens describes it → 3. Human confirms or rejects → 4. Log both.
The model produces a first draft. Final decisions stay with the user.
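A minimal sketch of step 4, keeping the model's draft and the human verdict together in one record (field names are illustrative, not from the released code):

```python
# Sketch of step 4: log the model's draft together with the human verdict.
import json
import time

def log_observation(image_path, model_answer, human_verdict, logfile="field_log.jsonl"):
    record = {
        "timestamp": time.time(),
        "image": image_path,
        "microlens_draft": model_answer,   # step 2: the model's first draft
        "human_verdict": human_verdict,    # step 3: "confirmed" / "rejected" / notes
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(record) + "\n")
```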
7. Environmental Impact
MicroLens v3 was trained on a single workstation GPU for ~13 hours (1-epoch rich-format resume from v2 checkpoint-18351). The v2 base run that v3 resumes from took an additional ~37 hours.
| Factor | Value |
|---|---|
| GPU | RTX 3090 Ti · ~400 W under sustained fine-tune load |
| CPU + chassis + cooling overhead | ~140 W |
| Wall-time v3 (1 epoch rich) | ~13 h |
| Wall-time v2 (3 epochs base) | ~37 h |
| Cumulative wall-time | ~50 h |
| Estimated energy (v3 step) | ~3 kWh |
| Estimated energy (cumulative v2 + v3) | ~12.5 kWh |
At the Austrian 2024 grid carbon intensity (110 g CO₂ / kWh), the v3 training step emits ~0.3 kg CO₂-equivalent and the full v2 + v3 pipeline ~1.4 kg CO₂-equivalent.
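A quick check of those figures from the table's own numbers:

```python
# Emissions follow directly from the table's energy estimates.
intensity = 0.110          # kg CO2 per kWh, Austrian 2024 grid
print(3.0 * intensity)     # v3 step            -> ~0.33 kg CO2e
print(12.5 * intensity)    # cumulative v2 + v3 -> ~1.4 kg CO2e
```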
Inference cost is negligible: the Q4_K_M GGUF build runs on a mid-range Android phone at a few watts, so the energy per on-device query is orders of magnitude smaller than a single cloud-inference call to a frontier model.
8. Technical Specifications
Architecture
- Backbone: Gemma 4 (E2B), sparse-attention transformer decoder with an integrated vision encoder stack.
- Vision encoder: Gemma 4 native vision tower (head dim 512).
- Fusion: multimodal projector that lifts vision tokens into the language-model embedding space (the mmproj is shipped separately for GGUF runtimes).
- Positional encoding: inherited from the Gemma 4 base.
- Attention backend: SDPA (scaled dot-product attention) during both fine-tune and inference. FlashAttention-2 is not usable: Gemma 4's vision-tower head dim (512) exceeds the FA-2 kernel limit (256).
Adapter layout
- LoRA rank: 32
- LoRA α: 64
- LoRA dropout: 0.05
- Target modules: all linear projections across both the language and vision sub-networks (attention Q/K/V/O and MLP gate/up/down), enabling multimodal co-adaptation rather than a language-only adapter.
- Adapter size: 228 MB (bf16).
Quantisations shipped
- Merged FP16: full-precision full-model snapshot (8.7 GB), Transformers-native.
- GGUF Q4_K_M: 4-bit quantised weights via the llama.cpp convert pipeline (3.2 GB). Pairs with the BF16 mmproj (942 MB) for full multimodal inference.
- LoRA-only (bf16): for users who want to re-merge against a different Gemma 4 E2B base or stack additional adapters (see the sketch below).
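A minimal re-merge sketch using PEFT, assuming the adapter weights are loadable directly from the repo root (check the repo layout before running):

```python
# Sketch: re-merge the LoRA adapter onto a Gemma 4 E2B base with PEFT.
import torch
from peft import PeftModel
from transformers import AutoModelForVision2Seq

base = AutoModelForVision2Seq.from_pretrained(
    "unsloth/gemma-4-E2B-it", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "Laborator/microlens-gemma4-e2b")
merged = model.merge_and_unload()        # bake the adapter into the base weights
merged.save_pretrained("microlens-merged-fp16")
```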
Software dependencies at training time
- unsloth == 2026.4.7
- transformers == 5.6.0
- torch == 2.10 (CUDA 12.8)
- peft, bitsandbytes, trl from Unsloth's pinned resolver
9. How to Use
Transformers (merged FP16)
```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image
import torch

model_id = "Laborator/microlens-gemma4-e2b"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("my_microscopy_image.jpg").convert("RGB")
prompt = "Describe what you see in this microscopy image. Identify the subject and key visual features."

messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": prompt},
]}]

# Build the chat-formatted prompt, then process text and image together.
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=220, temperature=0.3, do_sample=True)

# Decode only the newly generated tokens (skip the prompt).
print(processor.tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Unsloth (LoRA on top of base)
```python
from unsloth import FastVisionModel

# Loads the 4-bit base plus the MicroLens LoRA adapter from the same repo.
model, tokenizer = FastVisionModel.from_pretrained(
    "Laborator/microlens-gemma4-e2b",
    load_in_4bit=True,
    use_gradient_checkpointing=False,  # not needed for inference
)
FastVisionModel.for_inference(model)
# … same prompting pattern as above.
```
Ollama / llama.cpp (Q4_K_M)
```bash
# Download microlens-gemma4-e2b-Q4_K_M.gguf and mmproj-bf16.gguf from the HF repo.
ollama create microlens -f Modelfile   # see repo for the Modelfile
# Pass the image by including its path in the prompt.
ollama run microlens "Describe this sample. ./slide_01.jpg"
```
10. Citation
If you use MicroLens in a publication, project, or downstream model, please cite:
```bibtex
@software{brinza_microlens_2026,
  title     = {MicroLens: a microscopy vision-language model fine-tuned from Gemma 4 E2B},
  author    = {Brinza, Serghei},
  year      = {2026},
  month     = may,
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Laborator/microlens-gemma4-e2b},
  note      = {Submission to the Gemma 4 Good Hackathon (Kaggle, May 2026)}
}
```
Upstream works used by MicroLens:
```bibtex
@misc{gemma4_2026,
  title  = {Gemma 4 Technical Report},
  author = {Google DeepMind},
  year   = {2026},
  note   = {Base model: unsloth/gemma-4-E2B-it}
}

@misc{unsloth_2026,
  title  = {Unsloth: 2x faster LLM fine-tuning},
  author = {Daniel Han and Michael Han and Unsloth team},
  year   = {2026},
  url    = {https://github.com/unslothai/unsloth}
}

@misc{qwen3_vl_2025,
  title  = {Qwen3-VL: Vision-Language Models},
  author = {Alibaba Qwen Team},
  year   = {2025},
  note   = {Teacher model for distillation, Apache 2.0}
}
```
11. Model Card Authors
- Serghei Brinza · Vienna, Austria · sole author of the model, the training pipeline, and this card.
12. Model Card Contact
- Hugging Face: Laborator/microlens-gemma4-e2b. Open an issue / discussion on the repo.
- GitHub: SergheiBrinza/microlens. Issues, pull requests, and dataset corrections welcome.
MicroLens · built for the Gemma 4 Good Hackathon.