TemperCheck / CLAUDE.md
Joseph Antolick
Rename Crankycheck -> TemperCheck
aa00509

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

What this is

TemperCheck: a Gradio app that takes an image or screenshot of a social-media profile and estimates how short-tempered / difficult that person looks to deal with. Built for the Hugging Face Build Small Hackathon (https://huggingface.co/build-small-hackathon/).

This is an image-in → score/verdict-out vision task. The pipeline is: profile image → small Gemma 4 E4B vision model → structured JSON verdict → Gradio UI.

Status: scaffolded; ZeroGPU Space deploy is the next step. app.py (Gradio UI) → tempercheck/inference.py (swappable backend) → tempercheck/prompt.py (prompt + defensive JSON parsing). Parsing tests pass. ⚠️ The vision path has NOT been validated yet: local Ollama vision is broken (see below), so the real read only works once deployed to the Space.

Hard hackathon constraints (these gate eligibility; do not violate)

  • Model ≤ 32B total parameters. Any model used must be ≤32B. For a vision task that means a small VLM (e.g. a Qwen2.5-VL / SmolVLM / Gemma-vision class model), not a frontier API.
  • Must be a Gradio app hosted as a Hugging Face Space under the hackathon org. Build the UI in Gradio from the start; don't reach for another web framework.
  • Submission needs a demo video + a social-media post. Keep the UI demo-able (clear input, clear output, fast enough to record).
  • Deadline: June 15, 2026 (hack window June 5–15). This is a time-boxed hackathon project: prefer the simplest thing that works end-to-end over architectural polish.

Optional bonus "merit badges" (worth steering toward when cheap)

  • Off-Grid: no cloud APIs; run the model locally / on-Space. TemperCheck should aim for this (local VLM inference) since it's a strong fit.
  • Well-Tuned: fine-tuned model. Custom UI: bespoke Gradio styling. Llama.cpp: GGUF inference path. Agent traces shared. Field notes documentation.

Environment & commands (this machine)

Python is uv-only on this workstation (no system Python; pip/conda not installed). Use:

  • uv run app.py: run the Gradio app locally (NOT python app.py). Defaults to the Ollama backend on port 7140.
  • TEMPER_BACKEND=transformers uv run app.py: run the same backend the HF Space uses (google/gemma-4-E4B-it via transformers). Needs torch/transformers/spaces installed locally (not part of the default local setup).
  • uv run pytest: run the parsing tests. Single test: uv run pytest tests/test_parsing.py::test_clean_json.
  • uv add <pkg> / uv add --dev <pkg>: add a runtime / dev dependency.

GPU: local RTX 5090, 32 GB VRAM (sm_120 / Blackwell); the E4B model (~8B) fits trivially. The HF Space runs on smaller hardware; E4B is sized for that. Don't bump to a larger Gemma 4 (12B/26B/31B) without checking the target Space tier.

Backend selection (the key seam)

TEMPER_BACKEND switches the model path; the UI is identical either way. It defaults to transformers on a Space (detected via the SPACE_ID env) and ollama locally, so no manual config is needed in either place.

  • transformers (the deployed Space; this is where verdicts are real): loads google/gemma-4-E4B-it at module level onto cuda and runs generation inside a @spaces.GPU function (ZeroGPU). requirements.txt carries spaces/torch≥2.8/transformers; local dev does not install them.
  • ollama (local only): POSTs to 127.0.0.1:11434. ⚠️ Local Gemma 4 vision is broken in the current Ollama (0.30.7): the abliterated model hallucinates instead of reading images, and the official gemma4:e4b returns a blank. So the Ollama path is only good for UI/plumbing work; it cannot produce a real temper read. Real verdicts require the Space. (See memory/abliterated-gemma4-vision-broken.md.)
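
The selection rule described in this section can be sketched as a small resolver. This is a minimal sketch, not the project's actual code: resolve_backend and VALID_BACKENDS are hypothetical names, and raising on an unknown value is an assumption. Only TEMPER_BACKEND, SPACE_ID, and the two backend names come from this file.

```python
import os

# The two backend names used in this file.
VALID_BACKENDS = ("transformers", "ollama")


def resolve_backend(env=None):
    """Pick the backend: explicit TEMPER_BACKEND wins, else infer from SPACE_ID.

    `env` is any mapping of environment variables; it defaults to os.environ
    and is a parameter only so the rule can be tested without touching the
    real environment.
    """
    env = os.environ if env is None else env
    explicit = env.get("TEMPER_BACKEND", "").strip().lower()
    if explicit:
        if explicit not in VALID_BACKENDS:
            # Assumption: fail loudly on a typo rather than silently fall back.
            raise ValueError(f"unknown TEMPER_BACKEND: {explicit!r}")
        return explicit
    # A non-empty SPACE_ID is taken to mean "running on a Hugging Face Space".
    return "transformers" if env.get("SPACE_ID") else "ollama"
```

Passing a plain dict makes the three cases easy to check: an empty environment resolves to ollama, a set SPACE_ID resolves to transformers, and an explicit TEMPER_BACKEND overrides both.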

ZeroGPU specifics (the Space)

  • Hardware ZeroGPU is set in the Space settings (not the README). Default size large = 48 GB, ample for E4B.
  • import spaces must precede import torch; the model is placed on cuda at module level (CUDA-emulation makes this work at startup) and inference is decorated @spaces.GPU(duration=90). Lazy-loading the model inside the GPU fn is discouraged.
  • google/gemma-4-E4B-it is gated: the Space needs an HF_TOKEN secret whose account has accepted the Gemma license, or model download fails at boot.

Port

This app binds 7140 (Gradio default 7860 is triple-booked on this machine). Override with TEMPER_PORT. Already registered in the global port list.
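
A minimal sketch of the port rule, assuming a hypothetical helper named resolve_port; only the 7140 default and the TEMPER_PORT variable come from this file, and the range check is an added assumption.

```python
import os

DEFAULT_PORT = 7140  # default named in this file


def resolve_port(env=None):
    """Return TEMPER_PORT as an int, or 7140 when it is unset or empty."""
    env = os.environ if env is None else env
    raw = env.get("TEMPER_PORT", "").strip()
    if not raw:
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError on a non-numeric override
    if not 1 <= port <= 65535:
        raise ValueError(f"TEMPER_PORT out of range: {port}")
    return port
```

The result can be handed to Gradio through the server_port argument of launch(), e.g. `demo.launch(server_port=resolve_port())`, where demo stands for whatever Blocks or Interface object app.py builds.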

Architecture notes (load-bearing decisions)

  • The VLM lives behind tempercheck/inference.py. The rest of the app only calls score_image(pil_image) -> TemperVerdict and never imports a backend directly. This is what lets the Ollama ↔ transformers swap (and a future llama.cpp/GGUF path for the Llama.cpp badge) happen without touching the UI.
  • The output contract is JSON, parsed defensively. prompt.py asks for {score, verdict, rationale, signals}; parse_verdict extracts the first balanced JSON object and clamps/falls back on every field so a malformed small-VLM response never raises. If you change the JSON shape, update parse_verdict, the UI rendering in app.py, and tests/test_parsing.py together. That is the riskiest seam and the reason the tests exist (they run with no model).
  • The transformers model loads once at import (module-level, per the ZeroGPU rule), built by _build_transformers_scorer(); the returned @spaces.GPU scorer is reused per request.
  • Gemma 4 multimodal expects the image before the text in the message content (see build_messages).
  • This judges people from a photo: whimsical/novelty framing (good fit for the "Thousand Token Wood" track), not a real assessment. The system prompt and UI both carry a self-aware disclaimer; keep that tone and don't let outputs read as factual judgments of real individuals.
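
The defensive-parsing contract described in the notes above can be illustrated with a short sketch. It is not the code in tempercheck/prompt.py: the 0-100 score range, the fallback values, and the helper first_json_object are assumptions made for illustration; only the four field names and the names parse_verdict and TemperVerdict come from this file.

```python
import json
from dataclasses import dataclass, field

# Assumed range; the notes only say the score is clamped.
SCORE_MIN, SCORE_MAX = 0, 100


@dataclass
class TemperVerdict:
    # Defaults double as the (assumed) fallback verdict.
    score: int = 50
    verdict: str = "Unreadable"
    rationale: str = ""
    signals: list = field(default_factory=list)


def first_json_object(text):
    """Return the first balanced {...} substring of text, or None."""
    start = text.find("{")
    while start != -1:
        depth, in_string, escaped = 0, False, False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                # Braces inside string literals must not change the depth.
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return text[start:i + 1]
        # This brace never closed; retry from the next opening brace.
        start = text.find("{", start + 1)
    return None


def parse_verdict(raw):
    """Turn raw model text into a TemperVerdict, falling back field by field."""
    fallback = TemperVerdict()
    blob = first_json_object(raw) if isinstance(raw, str) else None
    if blob is None:
        return fallback
    try:
        data = json.loads(blob)
    except (ValueError, RecursionError):  # JSONDecodeError is a ValueError
        return fallback

    try:
        score = int(float(data.get("score")))
    except (TypeError, ValueError, OverflowError):
        score = fallback.score
    score = max(SCORE_MIN, min(SCORE_MAX, score))

    def text_or(value, default):
        # Accept only non-empty strings; anything else gets the default.
        return value.strip() if isinstance(value, str) and value.strip() else default

    signals = data.get("signals")
    if not isinstance(signals, list):
        signals = []
    return TemperVerdict(
        score=score,
        verdict=text_or(data.get("verdict"), fallback.verdict),
        rationale=text_or(data.get("rationale"), fallback.rationale),
        signals=[s.strip() for s in signals if isinstance(s, str) and s.strip()],
    )
```

Because the parser needs no model, cases such as chatty text around the JSON, an out-of-range score, or a reply with no JSON at all can be covered by plain unit tests, which is the point of keeping this seam under test.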

Still to do for submission

  • Create the ZeroGPU Space under build-small-hackathon, add the HF_TOKEN secret (Gemma-licensed account), push, and confirm it boots + reads images. This is the first real test of the vision path; local Ollama can't validate it.
  • If AutoModelForImageTextToText doesn't resolve for Gemma 4 on the Space, check the model card for the exact class.
  • Record the demo video + write the social post (both required submission artifacts).