CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
What this is
TemperCheck: a Gradio app that takes an image or screenshot of a social-media profile and estimates how short-tempered / difficult that person looks to deal with. Built for the Hugging Face Build Small Hackathon (https://huggingface.co/build-small-hackathon/).
This is an image-in → score/verdict-out vision task. The pipeline is: profile image → small Gemma 4 E4B vision model → structured JSON verdict → Gradio UI.
Status: scaffolded; ZeroGPU Space deploy is the next step.
app.py (Gradio UI) → tempercheck/inference.py (swappable backend) → tempercheck/prompt.py (prompt + defensive JSON parsing). Parsing tests pass. ⚠️ The vision path has NOT been validated yet: local Ollama vision is broken (see below), so the real read only works once deployed to the Space.
Hard hackathon constraints (these gate eligibility; do not violate)
- Model ≤ 32B total parameters. Any model used must be ≤32B. For a vision task that means a small VLM (e.g. a Qwen2.5-VL / SmolVLM / Gemma-vision class model), not a frontier API.
- Must be a Gradio app hosted as a Hugging Face Space under the hackathon org. Build the UI in Gradio from the start; don't reach for another web framework.
- Submission needs a demo video + a social-media post. Keep the UI demo-able (clear input, clear output, fast enough to record).
- Deadline: June 15, 2026 (hack window June 5–15). This is a time-boxed hackathon project; prefer the simplest thing that works end-to-end over architectural polish.
Optional bonus "merit badges" (worth steering toward when cheap)
- Off-Grid: no cloud APIs; run the model locally / on-Space. TemperCheck should aim for this (local VLM inference) since it's a strong fit.
- Well-Tuned: fine-tuned model. Custom UI: bespoke Gradio styling. Llama.cpp: GGUF inference path. Agent traces shared. Field notes documentation.
Environment & commands (this machine)
Python is uv-only on this workstation (no system Python; pip/conda not installed). Use:
- uv run app.py: run the Gradio app locally (NOT python app.py). Defaults to the Ollama backend on port 7140.
- TEMPER_BACKEND=transformers uv run app.py: run the same backend the HF Space uses (google/gemma-4-E4B-it via transformers). Needs torch/transformers/spaces installed locally (not part of the default local setup).
- uv run pytest: run the parsing tests. Single test: uv run pytest tests/test_parsing.py::test_clean_json.
- uv add <pkg> / uv add --dev <pkg>: add a runtime / dev dependency.
GPU: local RTX 5090, 32 GB VRAM (sm_120 / Blackwell); the E4B model (~8B) fits trivially. The HF Space runs on smaller hardware; E4B is sized for that. Don't bump to a larger Gemma 4 (12B/26B/31B) without checking the target Space tier.
Backend selection (the key seam)
TEMPER_BACKEND switches the model path; the UI is identical either way. It
defaults to transformers on a Space (detected via the SPACE_ID env) and
ollama locally, so no manual config is needed in either place.
- transformers (the deployed Space; this is where verdicts are real): loads google/gemma-4-E4B-it at module level onto cuda and runs generation inside a @spaces.GPU function (ZeroGPU). requirements.txt carries spaces / torch≥2.8 / transformers; local dev does not install them.
- ollama (local only): POSTs to 127.0.0.1:11434. ⚠️ Local Gemma 4 vision is broken in the current Ollama (0.30.7): the abliterated model hallucinates instead of reading images, and the official gemma4:e4b returns a blank. So the Ollama path is only good for UI/plumbing work; it cannot produce a real temper read. Real verdicts require the Space. (See memory/abliterated-gemma4-vision-broken.md.)
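The selection rule above can be sketched as a small pure function. This is only an illustration, not the code in tempercheck/inference.py: the name resolve_backend, the VALID_BACKENDS tuple, and the choice to raise on an unknown value are all assumptions made for the example. Only the TEMPER_BACKEND and SPACE_ID variables and the two backend names come from this file.

```python
import os
from typing import Mapping, Optional

# Backend names taken from this document; the tuple itself is hypothetical.
VALID_BACKENDS = ("transformers", "ollama")


def resolve_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a backend name from an environment mapping (hypothetical helper)."""
    env = os.environ if env is None else env

    # An explicit TEMPER_BACKEND always wins over auto-detection.
    explicit = env.get("TEMPER_BACKEND", "").strip().lower()
    if explicit:
        if explicit not in VALID_BACKENDS:
            # Assumption: fail loudly rather than silently falling back.
            raise ValueError(f"unknown TEMPER_BACKEND: {explicit!r}")
        return explicit

    # No override: a non-empty SPACE_ID means we are running on a Space.
    return "transformers" if env.get("SPACE_ID") else "ollama"
```

Passing the environment in as a mapping keeps the rule testable without touching the real process environment, for example resolve_backend({"SPACE_ID": "org/space"}).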
ZeroGPU specifics (the Space)
- Hardware ZeroGPU is set in the Space settings (not the README). Default size large = 48 GB, ample for E4B.
- import spaces must precede import torch; the model is placed on cuda at module level (CUDA-emulation makes this work at startup) and inference is decorated @spaces.GPU(duration=90). Lazy-loading the model inside the GPU fn is discouraged.
- google/gemma-4-E4B-it is gated: the Space needs an HF_TOKEN secret whose account has accepted the Gemma license, or model download fails at boot.
Port
This app binds 7140 (Gradio default 7860 is triple-booked on this machine). Override with TEMPER_PORT. Already registered in the global port list.
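A minimal sketch of the port override, assuming the app reads TEMPER_PORT once at startup. The helper name resolve_port, the DEFAULT_PORT constant, and the range check are hypothetical; app.py may do this differently.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 7140  # the port this document says the app binds


def resolve_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Return TEMPER_PORT if set, else the default (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("TEMPER_PORT", "").strip()
    if not raw:
        return DEFAULT_PORT

    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        # Assumption: reject out-of-range ports instead of clamping them.
        raise ValueError(f"TEMPER_PORT out of range: {port}")
    return port
```

The result would then be handed to Gradio's launch call through its server_port argument.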
Architecture notes (load-bearing decisions)
- The VLM lives behind tempercheck/inference.py. The rest of the app only calls score_image(pil_image) -> TemperVerdict and never imports a backend directly. This is what lets the Ollama ↔ transformers swap (and a future llama.cpp/GGUF path for the Llama.cpp badge) happen without touching the UI.
- The output contract is JSON, parsed defensively. prompt.py asks for {score, verdict, rationale, signals}; parse_verdict extracts the first balanced JSON object and clamps/falls back on every field so a malformed small-VLM response never raises. If you change the JSON shape, update parse_verdict, the UI rendering in app.py, and tests/test_parsing.py together; that's the riskiest seam and the reason the tests exist (they run with no model).
- The transformers model loads once at import (module-level, per the ZeroGPU rule), built by _build_transformers_scorer(); the returned @spaces.GPU scorer is reused per request.
- Gemma 4 multimodal expects the image before the text in the message content (see build_messages).
- This judges people from a photo: whimsical/novelty framing (good fit for the "Thousand Token Wood" track), not a real assessment. The system prompt and UI both carry a self-aware disclaimer; keep that tone and don't let outputs read as factual judgments of real individuals.
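The defensive-parsing contract described above could look roughly like the sketch below. It is not the code in tempercheck/prompt.py. The field names come from this file, but the TemperVerdict layout, the 0 to 100 score range, the fallback values, and the helper names other than parse_verdict are assumptions made for the example.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class TemperVerdict:
    # Defaults double as fallbacks; the values are assumptions.
    score: int = 50
    verdict: str = "unclear"
    rationale: str = ""
    signals: List[str] = field(default_factory=list)


def first_json_object(text: str) -> Optional[str]:
    """Return the first brace-balanced {...} span, ignoring braces in strings."""
    start = text.find("{")
    while start != -1:
        depth, in_string, escaped = 0, False, False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return text[start : i + 1]
        # This "{" never closed; retry from the next one.
        start = text.find("{", start + 1)
    return None


def _clamp_score(value: Any, default: int = 50) -> int:
    try:
        number = float(value)
    except (TypeError, ValueError, OverflowError):
        return default
    if number != number:  # NaN is the only value unequal to itself
        return default
    return int(round(max(0.0, min(100.0, number))))  # assumed 0..100 range


def _as_text(value: Any, default: str) -> str:
    return value.strip() if isinstance(value, str) and value.strip() else default


def _as_signals(value: Any) -> List[str]:
    if not isinstance(value, list):
        return []
    return [s.strip() for s in value if isinstance(s, str) and s.strip()]


def parse_verdict(raw: str) -> TemperVerdict:
    """Never raises: any malformed input yields a fallback verdict."""
    span = first_json_object(raw or "")
    if span is None:
        return TemperVerdict()
    try:
        data = json.loads(span)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return TemperVerdict()
    return TemperVerdict(
        score=_clamp_score(data.get("score")),
        verdict=_as_text(data.get("verdict"), "unclear"),
        rationale=_as_text(data.get("rationale"), ""),
        signals=_as_signals(data.get("signals")),
    )
```

Because every field has its own fallback, a response that is valid JSON but has one bad field still keeps the rest of its content, which suits a small VLM that often gets most of the shape right.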
Still to do for submission
- Create the ZeroGPU Space under build-small-hackathon, add the HF_TOKEN secret (Gemma-licensed account), push, and confirm it boots + reads images. This is the first real test of the vision path; local Ollama can't validate it.
- If AutoModelForImageTextToText doesn't resolve for Gemma 4 on the Space, check the model card for the exact class.
- Record the demo video + write the social post (both required submission artifacts).