AGENTS.md
Operating manual for coding agents working in this repo.
What this is
Hackathon Advisor is a Gradio gradio.Server (FastAPI subclass) Space for the
Build Small Hackathon. It is a small-model originality coach (≤32B total, largest single
model ≤4B): it crawls the public build-small-hackathon org into a live project atlas, then lets a
builder search the field and open The Unwritten Almanac advisor to test an idea against existing work.
The engine in hackathon_advisor/ is UI-agnostic; app.py and static/ are one possible front door.
Model stack (all open-weight, all local):
| Role | Model | Runtime |
|---|---|---|
| Advisor brain (tool planning) | openbmb/MiniCPM5-1B + advisor LoRA | Transformers + PEFT, ZeroGPU |
| Quest classifier | openbmb/MiniCPM5-1B + quest LoRA | Transformers + PEFT, ZeroGPU |
| Retrieval / atlas | ggml-org/embeddinggemma-300m-qat-q8_0-GGUF | llama.cpp (llama-cpp-python) |
| Voice input (ASR) | nvidia/nemotron-speech-streaming-en-0.6b | NVIDIA NeMo |
Setup & commands
- Python >=3.11,<3.13. Dependency manager is uv (uv.lock is the source of truth).
- System packages (packages.txt): ffmpeg, libsndfile1.
uv sync # or: pip install -r requirements.txt
uv run pytest # run the test suite (fast, NO GPU/weights needed; heavy models are mocked)
uvx ruff check . # lint (config: pyproject.toml [tool.ruff], line-length 100, py311; ruff is not a pinned dep)
uvx ruff format . # format
Run the app locally (greedy CPU/MPS path, no ZeroGPU):
mkdir -p .cache/advisor-dashboard
ADVISOR_CACHE_DIR=.cache/advisor-dashboard \
ADVISOR_MODEL_BACKEND=minicpm-transformers \
ADVISOR_QUEST_ANALYZER_BACKEND=minicpm-transformers \
python app.py # → http://127.0.0.1:7860
ADVISOR_MODEL_BACKEND=rules swaps the LLM for a deterministic planner; use it for UI/plumbing work without loading
MiniCPM.
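As a rough illustration of how such a backend switch can be wired, here is a minimal sketch of a factory that reads ADVISOR_MODEL_BACKEND. Only the names create_tool_planner, RuleBasedPlanner, and the two backend values come from this manual; the plan() method, its keyword rules, and the MiniCPMPlanner placeholder are assumptions made for illustration, not the contents of model_runtime.py.

```python
import os
from typing import Optional, Protocol


class ToolPlanner(Protocol):
    """Hypothetical interface: map a user message to one tool name."""

    def plan(self, message: str) -> str: ...


class RuleBasedPlanner:
    """Deterministic stand-in: picks a tool from keywords, loads no model."""

    def plan(self, message: str) -> str:
        text = message.lower()
        if "score" in text:
            return "score"
        if "plan" in text:
            return "plan"
        return "search"


class MiniCPMPlanner:
    """Placeholder for the Transformers + PEFT backend (not implemented here)."""

    def plan(self, message: str) -> str:
        raise NotImplementedError("load MiniCPM5-1B + advisor LoRA here")


def create_tool_planner(backend: Optional[str] = None) -> ToolPlanner:
    # An explicit argument wins; otherwise fall back to the environment variable.
    backend = backend or os.environ.get("ADVISOR_MODEL_BACKEND", "minicpm-transformers")
    if backend == "rules":
        return RuleBasedPlanner()
    if backend == "minicpm-transformers":
        return MiniCPMPlanner()
    raise ValueError(f"unknown ADVISOR_MODEL_BACKEND: {backend!r}")
```

Failing loudly on an unknown value, rather than silently falling back, keeps a typo in the environment from loading the heavy model by accident.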
pytest config lives in pyproject.toml (testpaths=["tests"], pythonpath=["."]). Always run it before
committing: there are 26 test files and they are the contract.
Repo map
app.py                 gr.Server entry: static UI + FastAPI /api/* + @app.api() client endpoints + refresh scheduler
hackathon_advisor/     the engine package (UI-agnostic; keep it that way)
static/                bespoke frontend (index.html / app.js / styles.css): the Off-Brand custom UI
scripts/               offline pipelines (crawl, Modal index/LoRA build, Hub publish); NOT runtime
data/                  checked-in snapshots: projects.json, project_index.json, sample_trace.jsonl, quest dataset
artifacts/quest-lora/  local quest-LoRA training output (gitignored; loaded from the Hub repo at runtime)
docs/                  build reports (e.g. quest-classification-lora.md)
tests/                 pytest suite (mirrors module names: test_<module>.py)
Engine package (hackathon_advisor/)
| Module | Responsibility |
|---|---|
| agent.py | AdvisorEngine.turn() / turn_stream(). One LLM tool-pick per turn, then deterministic Python orchestration (search → whitespace → score → plan). Advisor prose is built from f-string templates here, not by the model. |
| model_runtime.py | ToolPlanner backends. create_tool_planner() selects via ADVISOR_MODEL_BACKEND: minicpm-transformers (MiniCPM5-1B + advisor LoRA, device ladder auto/CUDA → MPS → CPU) or rules (RuleBasedPlanner). |
| tool_contracts.py | TOOL_SPECS typed schema; parse_xml_tool_call(); resolve_tool_call() returns a valid call or a defaulted one (the tool-call degradation ladder). |
| tools.py | Tool implementations over ProjectIndex (search, whitespace, score, plan, profile, …). Heavy logic lives here, not in the model. |
| aliases.py | Jargon normalization (fuzzy-maps "neutron" → Nemotron, "mini cpm" → MiniCPM5, …) applied before tool routing. |
| data.py | ProjectIndex: loads the snapshot + embedding index, _embed_query() via llama.cpp, cosine search. |
| llama_embedding.py | LlamaCppEmbedder: EmbeddingGemma GGUF through llama-cpp-python (the Llama Champion path). |
| dashboard.py / dashboard_storage.py / dashboard_search.py | Atlas payload (t-SNE / KMeans / nearest links), BM25 search, and the refresh lease + heartbeat + atomic latest.json swap. |
| quest_analysis.py / quest_taxonomy.py / quest_cache.py | MiniCPM quest LoRA → strict quest JSON; the taxonomy; per-project cache keyed on prompt/taxonomy/model/adapter hashes. |
| scoring.py | Deterministic idea rubric (the model only triggers + verbalizes it). |
| wood_map.py / png_export.py | PCA projection + Pillow render of the shareable page PNG. |
| field_notes.py / chapter.py / trace_export.py / submission_packet.py / artifact_bundle.py / demo_rehearsal.py | Export surfaces (notes, chapter, agent trace, submission packet, demo bundle). |
| prize_ledger.py | Model stack + parameter budget + badge ledger reported at /api/prize-ledger. |
| zerogpu.py | gpu_task() decorator (no-op unless ADVISOR_ZERO_GPU=1) + GPU-quota error detection for the CPU fallback. |
| runtime_hooks.py / profiling.py | Process/runtime helpers and turn profiling. |
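The tool-call degradation ladder named in the tool_contracts.py row can be pictured with a small sketch. This is not the repo's implementation: the XML shape (a tool_call element with a name attribute and one child per argument), the two-tool TOOL_SPECS, and the choice of search as the default are all assumptions for illustration. Only the function names mirror the ones listed in the table.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: tool name -> allowed argument names.
TOOL_SPECS = {
    "search": {"query"},
    "score": {"idea"},
}
DEFAULT_TOOL = "search"
CLOSE_TAG = "</tool_call>"


def parse_xml_tool_call(text: str):
    """Return (tool, args) from the first <tool_call> element, or None."""
    start = text.find("<tool_call")
    end = text.find(CLOSE_TAG)
    if start == -1 or end == -1:
        return None
    try:
        node = ET.fromstring(text[start : end + len(CLOSE_TAG)])
    except ET.ParseError:
        return None
    args = {child.tag: (child.text or "").strip() for child in node}
    return node.get("name"), args


def resolve_tool_call(text: str, user_message: str):
    """Ladder: valid call -> known tool with repaired args -> default call."""
    parsed = parse_xml_tool_call(text)
    if parsed is None or parsed[0] not in TOOL_SPECS:
        # Bottom rung: unparseable output or unknown tool becomes a plain search.
        return DEFAULT_TOOL, {"query": user_message}
    name, args = parsed
    allowed = TOOL_SPECS[name]
    # Drop unknown or empty arguments, then backfill anything still missing.
    cleaned = {key: value for key, value in args.items() if key in allowed and value}
    for missing in allowed - set(cleaned):
        cleaned[missing] = user_message
    return name, cleaned
```

Every branch returns a runnable call, so the orchestration code downstream never has to handle a failed parse from the 1B model.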
Routes (app.py)
First-party FastAPI routes power the visible app; @app.api() endpoints stay available for Gradio/Python clients.
| Route | Purpose |
|---|---|
| GET / , GET /static/{path} | Serve the bespoke static/ frontend |
| POST /api/agent-turn | The advisor turn as an NDJSON stream; this is the @spaces.GPU boundary |
| POST /api/transcribe | Voice note → transcript (NeMo, see ASR gotcha) |
| GET /api/dashboard · GET /api/dashboard/search | Atlas payload · BM25 search |
| POST/GET /api/dashboard/refresh | Start / poll one background refresh job |
| GET /api/bootstrap · GET /api/runtime · GET /api/prize-ledger · GET /api/tool-contracts | Frontend bootstrap, runtime status, prize ledger, tool schema |
| GET /api/demo-bundle.zip · GET /api/lora-training-kit.zip · POST /api/artifact.png · POST /api/field-notes · POST /api/chapter | Exports |
| GET /health | Liveness |
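Because POST /api/agent-turn streams NDJSON, a client has to reassemble lines that may be split across network chunks. The sketch below shows one way to do that with the standard library only. The event fields (type, name, text) are invented for the example and are not the route's documented payload.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one JSON object per newline-terminated line, across chunk boundaries."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Only complete lines are decoded, so a chunk may end mid-object.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    # Tolerate a final line that arrives without a trailing newline.
    if buffer.strip():
        yield json.loads(buffer)


# Simulated network chunks; the event shapes are made up for illustration.
chunks = [
    b'{"type": "tool", "name": "sea',
    b'rch"}\n{"type": "final",',
    b' "text": "done"}\n',
]
events = list(iter_ndjson(chunks))
```

With a real HTTP client, the same generator can be fed from whatever byte-chunk iterator that client exposes for streamed responses.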
Gotchas (the things that bite agents here)
- The 1B model only emits ONE XML tool call per turn. All user-facing prose is templated Python (agent.py _*_response), and multi-step flows are orchestrated in code, not by a model-driven ReAct loop. Do not "make the model write the response" or add multi-hop tool loops; route through tool_contracts.py instead.
- Off the Grid is a hard constraint. No proprietary cloud inference API may touch the runtime path. All three engines run locally from open weights. Don't add InferenceClient, openai, etc. to runtime code.
- Parameter budget. Total ≤32B, largest single model ≤4B (Tiny Titan). Don't introduce a larger model; prize_ledger.py documents the ~1.98B stack.
- MiniCPM (PyTorch) and llama.cpp clash on OpenMP. Query embedding runs in a worker subprocess on macOS, and dashboard refresh builds the GGUF index in a subprocess before returning to the MiniCPM process. Keep these isolated; don't import both heavy runtimes into the same hot path.
- Decoding is greedy. enable_thinking=False, temperature=0 for tool calls and strict quest JSON. Keep tool schemas small and single-hop (1B discipline).
- Never write latest.json directly. Refreshes write runs/{run_id}/… then do an atomic swap under $ADVISOR_CACHE_DIR/refresh.lock with a heartbeat; a failed run leaves the last validated dashboard in place.
- Tests must stay GPU-free. The suite mocks torch/transformers/llama.cpp, so pytest runs with no GPU and no model weights. Don't add module-top heavy imports that break CPU-only test collection.
- ASR backend. asr_runtime.py requires NVIDIA NeMo ASR for nvidia/nemotron-speech-streaming-en-0.6b; missing NeMo is a hard runtime error, locally and on the deployed Space. status() reports the configured Nemotron backend.
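The "never write latest.json directly" rule boils down to a write-then-rename pattern. The following is a minimal sketch of that pattern, not the code in dashboard_storage.py: the function name publish_latest, the dashboard.json file name, and the emptiness check standing in for validation are all hypothetical, and the refresh.lock lease and heartbeat are left out for brevity.

```python
import json
import os
import tempfile
from pathlib import Path


def publish_latest(cache_dir: Path, run_id: str, payload: dict) -> Path:
    """Write the run under runs/{run_id}/, validate it, then swap latest.json."""
    run_dir = cache_dir / "runs" / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    run_file = run_dir / "dashboard.json"  # hypothetical file name
    run_file.write_text(json.dumps(payload), encoding="utf-8")

    # Validation stand-in: a run with no projects never replaces the last good file.
    if not payload.get("projects"):
        raise ValueError("refusing to publish an empty dashboard")

    # The temp file lives in the target directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, cache_dir / "latest.json")
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return cache_dir / "latest.json"
```

Readers of latest.json see either the old file or the new one, never a partial write, and a run that fails validation leaves the previous dashboard untouched.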
Offline pipelines (scripts/, build-time only)
Runtime never calls these; they keep the Space self-contained.
python scripts/crawl_hf_spaces.py --org build-small-hackathon --out data/projects.json # crawl the field
python scripts/build_project_index.py --projects data/projects.json --out data/project_index.json # local llama.cpp index
python scripts/build_project_index.py --location modal ... # same build, on Modal (one CLI, --location switches where it runs)
modal run scripts/modal_train_quest_lora.py ... # train the quest LoRA on Modal
python scripts/publish_quest_adapter.py ... / publish_quest_dataset.py ... # push adapter / dataset to the Hub
Commits & reviews
- Conventional commits, one concern per commit. Observed history: feat:, fix:, refactor:, chore:, docs:.
- Gate before committing: uv run pytest green, uvx ruff check . clean, and the README updated if behavior changed.
- Keep the engine package UI-agnostic; if you touch a runtime model path, re-check gotchas 2-4 (Off the Grid, param budget, OpenMP isolation).
Key environment variables
| Variable | Default | Use |
|---|---|---|
| ADVISOR_CACHE_DIR | (none) | Artifact store (mounted bucket on Spaces); enables the refresh scheduler when set |
| ADVISOR_MODEL_BACKEND | minicpm-transformers | Advisor planner: minicpm-transformers or rules |
| ADVISOR_MODEL_ID / ADVISOR_ADAPTER_ID / ADVISOR_ADAPTER_REVISION | MiniCPM5-1B + advisor LoRA | Advisor model + pinned LoRA |
| ADVISOR_QUEST_ANALYZER_BACKEND / ADVISOR_QUEST_ADAPTER_ID | minicpm-transformers / build-small-hackathon/hackathon-advisor-quest-minicpm5-lora | Quest classifier |
| ADVISOR_ZERO_GPU / ADVISOR_ZERO_GPU_DURATION | off / 120 | Wrap the engine turn in @spaces.GPU on the deployed Space |
| ADVISOR_ASR_MODEL_ID | Nemotron | Voice ASR model |
| ADVISOR_EMBEDDING_MODEL_REPO / ADVISOR_EMBEDDING_MODEL_FILE | EmbeddingGemma GGUF | llama.cpp retrieval model |
| ADVISOR_REFRESH_COMPUTE / ADVISOR_REFRESH_INTERVAL_SECONDS | cpu / 3600 | Scheduled refresh compute + cadence |
See the Runtime Backend section in README.md for the full deployed configuration.
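To make the ADVISOR_ZERO_GPU row concrete, here is a hedged sketch of a decorator that is a pass-through unless the variable is set to 1. The gpu_task name comes from this manual, but its real signature in zerogpu.py is not shown here, so the duration parameter and the deferred import are assumptions. spaces.GPU(duration=...) is the decorator provided by the Hugging Face spaces package on ZeroGPU hardware.

```python
import os


def gpu_task(duration: int = 120):
    """Return a decorator that leaves fn untouched unless ADVISOR_ZERO_GPU=1."""

    def decorate(fn):
        if os.environ.get("ADVISOR_ZERO_GPU") != "1":
            return fn  # local / CPU path: no wrapping, no extra imports
        # Deferred import (assumption): keeps CPU-only environments and tests
        # from needing the spaces package at module import time.
        import spaces

        return spaces.GPU(duration=duration)(fn)

    return decorate
```

Applied as `@gpu_task()` on the function that runs the engine turn, the same code path works both locally and on the deployed Space; only the environment differs.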