Spaces:

build-small-hackathon
/

TemperCheck

Running on Zero

App Files Files Community

TemperCheck / CLAUDE.md

Joseph Antolick

Rename Crankycheck -> TemperCheck

aa00509 19 days ago

preview code

Raw

History Blame Contribute Delete

6.64 kB

	# CLAUDE.md

	This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

	## What this is

	TemperCheck — a Gradio app that takes an image or screenshot of a social-media profile and estimates how short-tempered / difficult that person looks to deal with. Built for the Hugging Face Build Small Hackathon (`https://huggingface.co/build-small-hackathon/`).

	This is an image-in → score/verdict-out vision task. The pipeline is: profile image → small Gemma 4 E4B vision model → structured JSON verdict → Gradio UI.

	> Status: scaffolded; ZeroGPU Space deploy is the next step. `app.py` (Gradio UI) → `tempercheck/inference.py` (swappable backend) → `tempercheck/prompt.py` (prompt + defensive JSON parsing). Parsing tests pass. ⚠️ The vision path has NOT been validated yet — local Ollama vision is broken (see below), so the real read only works once deployed to the Space.

	## Hard hackathon constraints (these gate eligibility — do not violate)

	- Model ≤ 32B total parameters. Any model used must be ≤32B. For a vision task that means a small VLM (e.g. a Qwen2.5-VL / SmolVLM / Gemma-vision class model), not a frontier API.
	- Must be a Gradio app hosted as a Hugging Face Space under the hackathon org. Build the UI in Gradio from the start; don't reach for another web framework.
	- Submission needs a demo video + a social-media post. Keep the UI demo-able (clear input, clear output, fast enough to record).
	- Deadline: June 15, 2026 (hack window June 5–15). This is a time-boxed hackathon project — prefer the simplest thing that works end-to-end over architectural polish.

	### Optional bonus "merit badges" (worth steering toward when cheap)
	- Off-Grid — no cloud APIs; run the model locally / on-Space. TemperCheck should aim for this (local VLM inference) since it's a strong fit.
	- Well-Tuned — fine-tuned model. Custom UI — bespoke Gradio styling. Llama.cpp — GGUF inference path. Agent traces shared. Field notes documentation.

	## Environment & commands (this machine)

	Python is `uv`-only on this workstation (no system Python; `pip`/`conda` not installed). Use:
	- `uv run app.py` — run the Gradio app locally (NOT `python app.py`). Defaults to the Ollama backend on port 7140.
	- `TEMPER_BACKEND=transformers uv run app.py` — run the same backend the HF Space uses (`google/gemma-4-E4B-it` via `transformers`). Needs torch/transformers/spaces installed locally (not part of the default local setup).
	- `uv run pytest` — run the parsing tests. Single test: `uv run pytest tests/test_parsing.py::test_clean_json`.
	- `uv add <pkg>` / `uv add --dev <pkg>` — add a runtime / dev dependency.

	GPU: local RTX 5090, 32 GB VRAM (sm_120 / Blackwell) — the E4B model (~8B) fits trivially. The HF Space runs on smaller hardware; E4B is sized for that. Don't bump to a larger Gemma 4 (12B/26B/31B) without checking the target Space tier.

	### Backend selection (the key seam)
	`TEMPER_BACKEND` switches the model path; the UI is identical either way. It
	defaults to `transformers` on a Space (detected via the `SPACE_ID` env) and
	`ollama` locally — so no manual config is needed in either place.
	- `transformers` (the deployed Space — this is where verdicts are real) — loads `google/gemma-4-E4B-it` at module level onto `cuda` and runs generation inside a `@spaces.GPU` function (ZeroGPU). `requirements.txt` carries `spaces`/torch≥2.8/transformers; local dev does not install them.
	- `ollama` (local only) — POSTs to `127.0.0.1:11434`. ⚠️ Local Gemma 4 vision is broken in the current Ollama (0.30.7): the abliterated model hallucinates instead of reading images, and the official `gemma4:e4b` returns a blank. So the Ollama path is only good for UI/plumbing work — it cannot produce a real temper read. Real verdicts require the Space. (See `memory/abliterated-gemma4-vision-broken.md`.)

	### ZeroGPU specifics (the Space)
	- Hardware ZeroGPU is set in the Space settings (not the README). Default size `large` = 48 GB, ample for E4B.
	- `import spaces` must precede `import torch`; the model is placed on `cuda` at module level (CUDA-emulation makes this work at startup) and inference is decorated `@spaces.GPU(duration=90)`. Lazy-loading the model inside the GPU fn is discouraged.
	- `google/gemma-4-E4B-it` is gated — the Space needs an `HF_TOKEN` secret whose account has accepted the Gemma license, or model download fails at boot.

	### Port
	This app binds 7140 (Gradio default 7860 is triple-booked on this machine). Override with `TEMPER_PORT`. Already registered in the global port list.

	## Architecture notes (load-bearing decisions)

	- The VLM lives behind `tempercheck/inference.py`. The rest of the app only calls `score_image(pil_image) -> TemperVerdict` and never imports a backend directly. This is what lets the Ollama ↔ transformers swap (and a future llama.cpp/GGUF path for the Llama.cpp badge) happen without touching the UI.
	- The output contract is JSON, parsed defensively. `prompt.py` asks for `{score, verdict, rationale, signals}`; `parse_verdict` extracts the first balanced JSON object and clamps/falls back on every field so a malformed small-VLM response never raises. If you change the JSON shape, update `parse_verdict`, the UI rendering in `app.py`, and `tests/test_parsing.py` together — that's the riskiest seam and the reason the tests exist (they run with no model).
	- The transformers model loads once at import (module-level, per the ZeroGPU rule), built by `_build_transformers_scorer()`; the returned `@spaces.GPU` scorer is reused per request.
	- Gemma 4 multimodal expects the image before the text in the message content (see `build_messages`).
	- This judges people from a photo — whimsical/novelty framing (good fit for the "Thousand Token Wood" track), not a real assessment. The system prompt and UI both carry a self-aware disclaimer; keep that tone and don't let outputs read as factual judgments of real individuals.

	## Still to do for submission
	- Create the ZeroGPU Space under `build-small-hackathon`, add the `HF_TOKEN` secret (Gemma-licensed account), push, and confirm it boots + reads images. This is the first real test of the vision path — local Ollama can't validate it.
	- If `AutoModelForImageTextToText` doesn't resolve for Gemma 4 on the Space, check the [model card](https://huggingface.co/google/gemma-4-E4B-it) for the exact class.
	- Record the demo video + write the social post (both required submission artifacts).

	# CLAUDE.md

	This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

	## What this is

	TemperCheck — a Gradio app that takes an image or screenshot of a social-media profile and estimates how short-tempered / difficult that person looks to deal with. Built for the Hugging Face Build Small Hackathon (`https://huggingface.co/build-small-hackathon/`).

	This is an image-in → score/verdict-out vision task. The pipeline is: profile image → small Gemma 4 E4B vision model → structured JSON verdict → Gradio UI.

	> Status: scaffolded; ZeroGPU Space deploy is the next step. `app.py` (Gradio UI) → `tempercheck/inference.py` (swappable backend) → `tempercheck/prompt.py` (prompt + defensive JSON parsing). Parsing tests pass. ⚠️ The vision path has NOT been validated yet — local Ollama vision is broken (see below), so the real read only works once deployed to the Space.

	## Hard hackathon constraints (these gate eligibility — do not violate)

	- Model ≤ 32B total parameters. Any model used must be ≤32B. For a vision task that means a small VLM (e.g. a Qwen2.5-VL / SmolVLM / Gemma-vision class model), not a frontier API.
	- Must be a Gradio app hosted as a Hugging Face Space under the hackathon org. Build the UI in Gradio from the start; don't reach for another web framework.
	- Submission needs a demo video + a social-media post. Keep the UI demo-able (clear input, clear output, fast enough to record).
	- Deadline: June 15, 2026 (hack window June 5–15). This is a time-boxed hackathon project — prefer the simplest thing that works end-to-end over architectural polish.

	### Optional bonus "merit badges" (worth steering toward when cheap)
	- Off-Grid — no cloud APIs; run the model locally / on-Space. TemperCheck should aim for this (local VLM inference) since it's a strong fit.
	- Well-Tuned — fine-tuned model. Custom UI — bespoke Gradio styling. Llama.cpp — GGUF inference path. Agent traces shared. Field notes documentation.

	## Environment & commands (this machine)

	Python is `uv`-only on this workstation (no system Python; `pip`/`conda` not installed). Use:
	- `uv run app.py` — run the Gradio app locally (NOT `python app.py`). Defaults to the Ollama backend on port 7140.
	- `TEMPER_BACKEND=transformers uv run app.py` — run the same backend the HF Space uses (`google/gemma-4-E4B-it` via `transformers`). Needs torch/transformers/spaces installed locally (not part of the default local setup).
	- `uv run pytest` — run the parsing tests. Single test: `uv run pytest tests/test_parsing.py::test_clean_json`.
	- `uv add <pkg>` / `uv add --dev <pkg>` — add a runtime / dev dependency.

	GPU: local RTX 5090, 32 GB VRAM (sm_120 / Blackwell) — the E4B model (~8B) fits trivially. The HF Space runs on smaller hardware; E4B is sized for that. Don't bump to a larger Gemma 4 (12B/26B/31B) without checking the target Space tier.

	### Backend selection (the key seam)
	`TEMPER_BACKEND` switches the model path; the UI is identical either way. It
	defaults to `transformers` on a Space (detected via the `SPACE_ID` env) and
	`ollama` locally — so no manual config is needed in either place.
	- `transformers` (the deployed Space — this is where verdicts are real) — loads `google/gemma-4-E4B-it` at module level onto `cuda` and runs generation inside a `@spaces.GPU` function (ZeroGPU). `requirements.txt` carries `spaces`/torch≥2.8/transformers; local dev does not install them.
	- `ollama` (local only) — POSTs to `127.0.0.1:11434`. ⚠️ Local Gemma 4 vision is broken in the current Ollama (0.30.7): the abliterated model hallucinates instead of reading images, and the official `gemma4:e4b` returns a blank. So the Ollama path is only good for UI/plumbing work — it cannot produce a real temper read. Real verdicts require the Space. (See `memory/abliterated-gemma4-vision-broken.md`.)

	### ZeroGPU specifics (the Space)
	- Hardware ZeroGPU is set in the Space settings (not the README). Default size `large` = 48 GB, ample for E4B.
	- `import spaces` must precede `import torch`; the model is placed on `cuda` at module level (CUDA-emulation makes this work at startup) and inference is decorated `@spaces.GPU(duration=90)`. Lazy-loading the model inside the GPU fn is discouraged.
	- `google/gemma-4-E4B-it` is gated — the Space needs an `HF_TOKEN` secret whose account has accepted the Gemma license, or model download fails at boot.

	### Port
	This app binds 7140 (Gradio default 7860 is triple-booked on this machine). Override with `TEMPER_PORT`. Already registered in the global port list.

	## Architecture notes (load-bearing decisions)

	- The VLM lives behind `tempercheck/inference.py`. The rest of the app only calls `score_image(pil_image) -> TemperVerdict` and never imports a backend directly. This is what lets the Ollama ↔ transformers swap (and a future llama.cpp/GGUF path for the Llama.cpp badge) happen without touching the UI.
	- The output contract is JSON, parsed defensively. `prompt.py` asks for `{score, verdict, rationale, signals}`; `parse_verdict` extracts the first balanced JSON object and clamps/falls back on every field so a malformed small-VLM response never raises. If you change the JSON shape, update `parse_verdict`, the UI rendering in `app.py`, and `tests/test_parsing.py` together — that's the riskiest seam and the reason the tests exist (they run with no model).
	- The transformers model loads once at import (module-level, per the ZeroGPU rule), built by `_build_transformers_scorer()`; the returned `@spaces.GPU` scorer is reused per request.
	- Gemma 4 multimodal expects the image before the text in the message content (see `build_messages`).
	- This judges people from a photo — whimsical/novelty framing (good fit for the "Thousand Token Wood" track), not a real assessment. The system prompt and UI both carry a self-aware disclaimer; keep that tone and don't let outputs read as factual judgments of real individuals.

	## Still to do for submission
	- Create the ZeroGPU Space under `build-small-hackathon`, add the `HF_TOKEN` secret (Gemma-licensed account), push, and confirm it boots + reads images. This is the first real test of the vision path — local Ollama can't validate it.
	- If `AutoModelForImageTextToText` doesn't resolve for Gemma 4 on the Space, check the [model card](https://huggingface.co/google/gemma-4-E4B-it) for the exact class.
	- Record the demo video + write the social post (both required submission artifacts).