
AGENTS.md

Operating manual for coding agents working in this repo.


What this is

Hackathon Advisor is a Gradio gradio.Server (a FastAPI subclass) Space for the Build Small Hackathon. It is a small-model originality coach (≤32B parameters in total, largest single model ≤4B): it crawls the public build-small-hackathon org into a live project atlas, then lets a builder search the field and open The Unwritten Almanac advisor to test an idea against existing work.

The engine in hackathon_advisor/ is UI-agnostic; app.py and static/ are one possible front door.

Model stack (all open-weight, all local):

| Role | Model | Runtime |
|---|---|---|
| Advisor brain (tool planning) | openbmb/MiniCPM5-1B + advisor LoRA | Transformers + PEFT, ZeroGPU |
| Quest classifier | openbmb/MiniCPM5-1B + quest LoRA | Transformers + PEFT, ZeroGPU |
| Retrieval / atlas | ggml-org/embeddinggemma-300m-qat-q8_0-GGUF | llama.cpp (llama-cpp-python) |
| Voice input (ASR) | nvidia/nemotron-speech-streaming-en-0.6b | NVIDIA NeMo |
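The parameter budget this stack has to respect (gotcha 3 below) is plain arithmetic. The sketch below is a hypothetical helper, not prize_ledger.py: StackModel and budget_violations() are illustrative names, and the parameter counts in the usage lines are rough figures read off the model names rather than measured values.

```python
from dataclasses import dataclass

TOTAL_CAP_B = 32.0        # whole stack, billions of parameters
SINGLE_MODEL_CAP_B = 4.0  # largest single model ("Tiny Titan"), billions


@dataclass(frozen=True)
class StackModel:
    repo_id: str
    params_b: float  # parameter count in billions, supplied by the caller


def budget_violations(stack: list[StackModel]) -> list[str]:
    """Return human-readable violations; an empty list means the stack fits."""
    problems = []
    total = sum(m.params_b for m in stack)
    if total > TOTAL_CAP_B:
        problems.append(f"total {total:.2f}B exceeds {TOTAL_CAP_B}B")
    for m in stack:
        if m.params_b > SINGLE_MODEL_CAP_B:
            problems.append(f"{m.repo_id} at {m.params_b}B exceeds {SINGLE_MODEL_CAP_B}B")
    return problems


# Rough sizes read off the model names (assumption, not measured counts);
# the shared MiniCPM5-1B base is listed once and LoRA adapter weights are ignored.
stack = [
    StackModel("openbmb/MiniCPM5-1B", 1.0),
    StackModel("ggml-org/embeddinggemma-300m-qat-q8_0-GGUF", 0.3),
    StackModel("nvidia/nemotron-speech-streaming-en-0.6b", 0.6),
]
print(budget_violations(stack))  # → []
```

Counting the shared base once and skipping adapters is this sketch's assumption; prize_ledger.py remains the authoritative accounting (it reports ~1.98B).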

Setup & commands

  • Python >=3.11,<3.13. Dependency manager is uv (uv.lock is the source of truth).
  • System packages (packages.txt): ffmpeg, libsndfile1.
```shell
uv sync                       # or: pip install -r requirements.txt
uv run pytest                 # run the test suite (fast; no GPU or weights needed, heavy models are mocked)
uvx ruff check .              # lint (config: pyproject.toml [tool.ruff], line-length 100, py311; ruff is not a pinned dep)
uvx ruff format .             # format
```

Run the app locally (greedy CPU/MPS path, no ZeroGPU):

```shell
mkdir -p .cache/advisor-dashboard
ADVISOR_CACHE_DIR=.cache/advisor-dashboard \
ADVISOR_MODEL_BACKEND=minicpm-transformers \
ADVISOR_QUEST_ANALYZER_BACKEND=minicpm-transformers \
python app.py                 # → http://127.0.0.1:7860
```

ADVISOR_MODEL_BACKEND=rules swaps the LLM for a deterministic planner; use it for UI and plumbing work without loading MiniCPM.

pytest config lives in pyproject.toml (testpaths=["tests"], pythonpath=["."]). Always run it before committing: there are 26 test files, and they are the contract.
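A hedged sketch of the mocking pattern that keeps a suite like this weight-free, assuming the engine imports llama.cpp at call time rather than at module top. embed_query() and the GGUF file name are hypothetical; llama_cpp.Llama(model_path=..., embedding=True) and Llama.embed() are real llama-cpp-python entry points, replaced here by a stub module placed in sys.modules.

```python
import sys
import types
from unittest import mock


def embed_query(text: str) -> list[float]:
    """Hypothetical engine helper: imports llama_cpp when called, not at module top."""
    import llama_cpp

    model = llama_cpp.Llama(model_path="embeddinggemma.gguf", embedding=True, verbose=False)
    return model.embed(text)


def test_embed_query_needs_no_weights() -> None:
    fake = types.ModuleType("llama_cpp")
    fake.Llama = mock.MagicMock()
    fake.Llama.return_value.embed.return_value = [0.1, 0.2, 0.3]
    # patch.dict restores sys.modules on exit, so the stub cannot leak into other tests.
    with mock.patch.dict(sys.modules, {"llama_cpp": fake}):
        assert embed_query("voice agents") == [0.1, 0.2, 0.3]
    fake.Llama.assert_called_once()
```

The stub only takes effect because the import is lazy; a module-top `import llama_cpp` would already have run during collection.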


Repo map

```text
app.py                  gr.Server entry: static UI + FastAPI /api/* + @app.api() client endpoints + refresh scheduler
hackathon_advisor/      the engine package (UI-agnostic; keep it that way)
static/                 bespoke frontend (index.html / app.js / styles.css): the Off-Brand custom UI
scripts/                offline pipelines (crawl, Modal index/LoRA build, Hub publish); NOT runtime
data/                   checked-in snapshots: projects.json, project_index.json, sample_trace.jsonl, quest dataset
artifacts/quest-lora/   local quest-LoRA training output (gitignored; loaded from the Hub repo at runtime)
docs/                   build reports (e.g. quest-classification-lora.md)
tests/                  pytest suite (mirrors module names: test_<module>.py)
```

Engine package (hackathon_advisor/)

| Module | Responsibility |
|---|---|
| agent.py | AdvisorEngine.turn() / turn_stream(). One LLM tool pick per turn, then deterministic Python orchestration (search → whitespace → score → plan). Advisor prose is built from f-string templates here, not by the model. |
| model_runtime.py | ToolPlanner backends. create_tool_planner() selects one via ADVISOR_MODEL_BACKEND: minicpm-transformers (MiniCPM5-1B + advisor LoRA, device ladder auto/CUDA → MPS → CPU) or rules (RuleBasedPlanner). |
| tool_contracts.py | TOOL_SPECS typed schema; parse_xml_tool_call(); resolve_tool_call() returns either the validated call or a defaulted one (the tool-call degradation ladder). |
| tools.py | Tool implementations over ProjectIndex (search, whitespace, score, plan, profile, …). Heavy logic lives here, not in the model. |
| aliases.py | Jargon normalization applied before tool routing (fuzzy-maps "neutron" → Nemotron, "mini cpm" → MiniCPM5, …). |
| data.py | ProjectIndex: loads the snapshot and embedding index, embeds queries with _embed_query() via llama.cpp, runs cosine search. |
| llama_embedding.py | LlamaCppEmbedder: EmbeddingGemma GGUF through llama-cpp-python (the Llama Champion path). |
| dashboard.py / dashboard_storage.py / dashboard_search.py | Atlas payload (t-SNE / KMeans / nearest links), BM25 search, and the refresh lease + heartbeat + atomic latest.json swap. |
| quest_analysis.py / quest_taxonomy.py / quest_cache.py | MiniCPM quest LoRA → strict quest JSON; the taxonomy; a per-project cache keyed on prompt/taxonomy/model/adapter hashes. |
| scoring.py | Deterministic idea rubric (the model only triggers and verbalizes it). |
| wood_map.py / png_export.py | PCA projection + Pillow render of the shareable page PNG. |
| field_notes.py / chapter.py / trace_export.py / submission_packet.py / artifact_bundle.py / demo_rehearsal.py | Export surfaces (notes, chapter, agent trace, submission packet, demo bundle). |
| prize_ledger.py | Model stack, parameter budget, and badge ledger reported at /api/prize-ledger. |
| zerogpu.py | gpu_task() decorator (a no-op unless ADVISOR_ZERO_GPU=1) + GPU-quota error detection for the CPU fallback. |
| runtime_hooks.py / profiling.py | Process/runtime helpers and turn profiling. |

Routes (app.py)

First-party FastAPI routes power the visible app; @app.api() endpoints stay available for Gradio/Python clients.

| Route | Purpose |
|---|---|
| GET / · GET /static/{path} | Serve the bespoke static/ frontend |
| POST /api/agent-turn | The advisor turn, streamed as NDJSON; this is the @spaces.GPU boundary |
| POST /api/transcribe | Voice note → transcript (NeMo; see the ASR gotcha) |
| GET /api/dashboard · GET /api/dashboard/search | Atlas payload · BM25 search |
| POST/GET /api/dashboard/refresh | Start / poll one background refresh job |
| GET /api/bootstrap · GET /api/runtime · GET /api/prize-ledger · GET /api/tool-contracts | Frontend bootstrap, runtime status, prize ledger, tool schema |
| GET /api/demo-bundle.zip · GET /api/lora-training-kit.zip · POST /api/artifact.png · POST /api/field-notes · POST /api/chapter | Exports |
| GET /health | Liveness |
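POST /api/agent-turn streams NDJSON: one JSON object per line. A minimal sketch of both ends follows; the event dictionaries are hypothetical, since this file does not document the stream's schema. The decoder buffers partial lines because an HTTP chunk boundary can fall in the middle of an event.

```python
import json
from collections.abc import Iterable, Iterator


def encode_ndjson(events: Iterable[dict]) -> Iterator[bytes]:
    """Server side: one compact JSON object per line, newline-terminated."""
    for event in events:
        yield (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")


def decode_ndjson(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Client side: rebuild lines across arbitrary chunk boundaries."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        *complete, buffer = buffer.split(b"\n")  # keep the trailing partial line
        for line in complete:
            if line.strip():
                yield json.loads(line)
    if buffer.strip():  # a final event without a trailing newline
        yield json.loads(buffer)


# Hypothetical event shapes; the real stream schema is not documented here.
events = [{"type": "tool", "name": "search"}, {"type": "text", "delta": "no close match yet"}]
wire = b"".join(encode_ndjson(events))
chunks = [wire[i : i + 7] for i in range(0, len(wire), 7)]  # simulate ragged network reads
assert list(decode_ndjson(chunks)) == events
```

Splitting on b"\n" is safe for UTF-8 payloads: the newline byte never occurs inside a multi-byte character, and json.dumps escapes newlines inside strings.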

Gotchas (the things that bite agents here)

  1. The 1B model only emits ONE XML tool call per turn. All user-facing prose is templated Python (agent.py _*_response), and multi-step flows are orchestrated in code β€” not a model-driven ReAct loop. Do not "make the model write the response" or add multi-hop tool loops; route through tool_contracts.py instead.
  2. Off the Grid is a hard constraint. No proprietary cloud inference API may touch the runtime path. All three engines run locally from open weights. Don't add InferenceClient, openai, etc. to runtime code.
  3. Parameter budget. Total ≤32B, largest single model ≤4B (Tiny Titan). Don't introduce a larger model; prize_ledger.py documents the ~1.98B stack.
  4. MiniCPM (PyTorch) and llama.cpp clash on OpenMP. Query embedding runs in a worker subprocess on macOS, and dashboard refresh builds the GGUF index in a subprocess before returning to the MiniCPM process. Keep these isolated; don't import both heavy runtimes into the same hot path.
  5. Decoding is greedy. enable_thinking=False, temperature=0 for tool calls and strict quest JSON. Keep tool schemas small and single-hop (1B discipline).
  6. Never write latest.json directly. Refreshes write runs/{run_id}/…, then do an atomic swap under $ADVISOR_CACHE_DIR/refresh.lock with a heartbeat; a failed run leaves the last validated dashboard in place.
  7. Tests must stay GPU-free. The suite mocks torch/transformers/llama.cpp, so pytest runs with no GPU and no model weights. Don't add module-top heavy imports that break CPU-only test collection.
  8. ASR backend. asr_runtime.py requires NVIDIA NeMo ASR for nvidia/nemotron-speech-streaming-en-0.6b; missing NeMo is a hard runtime error, locally and on the deployed Space. status() reports the configured Nemotron backend.
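Gotcha 6's refresh protocol can be sketched with the standard library alone. This is a hypothetical illustration, not dashboard_storage.py: the lease is an exclusively created lock file whose mtime serves as the heartbeat, the run's file name (dashboard.json) and the 300-second staleness window are invented, and latest.json is assumed to hold a copy of the payload rather than a pointer.

```python
import json
import os
import time
from collections.abc import Callable
from pathlib import Path


def acquire_lease(lock: Path, owner: str, stale_after: float = 300.0) -> bool:
    """Exclusive-create the lock file; reclaim it only if its heartbeat is stale."""
    try:
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        try:
            age = time.time() - lock.stat().st_mtime
        except FileNotFoundError:
            return acquire_lease(lock, owner, stale_after)  # released meanwhile; retry
        if age < stale_after:
            return False  # a live refresh holds the lease
        lock.unlink(missing_ok=True)  # not race-free if two workers reclaim at once
        return acquire_lease(lock, owner, stale_after)
    with os.fdopen(fd, "w") as fh:
        fh.write(owner)
    return True


def heartbeat(lock: Path) -> None:
    os.utime(lock)  # bump mtime to "now" so other workers see the lease as alive


def release_lease(lock: Path) -> None:
    lock.unlink(missing_ok=True)


def publish_run(cache: Path, run_id: str, payload: dict, validate: Callable[[dict], None]) -> Path:
    run_file = cache / "runs" / run_id / "dashboard.json"  # hypothetical file name
    run_file.parent.mkdir(parents=True, exist_ok=True)
    run_file.write_text(json.dumps(payload))
    validate(json.loads(run_file.read_text()))  # raising here leaves latest.json untouched
    tmp = cache / "latest.json.tmp"
    tmp.write_text(run_file.read_text())
    latest = cache / "latest.json"
    os.replace(tmp, latest)  # atomic rename on POSIX when both paths share a filesystem
    return latest
```

Readers only ever see the old or the new latest.json, never a half-written one, because the rename is the single publishing step and it happens after validation.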

Offline pipelines (scripts/, build-time only)

The runtime never calls these, which keeps the Space self-contained.

```shell
python scripts/crawl_hf_spaces.py --org build-small-hackathon --out data/projects.json   # crawl the field
python scripts/build_project_index.py --projects data/projects.json --out data/project_index.json   # local llama.cpp index
python scripts/build_project_index.py --location modal ...   # same build on Modal (one CLI; --location switches where it runs)
modal run scripts/modal_train_quest_lora.py ...           # train the quest LoRA on Modal
python scripts/publish_quest_adapter.py ... / publish_quest_dataset.py ...   # push adapter / dataset to the Hub
```

Commits & reviews

  • Conventional commits, one concern per commit. Observed history: feat:, fix:, refactor:, chore:, docs:.
  • Gate before committing: uv run pytest green, uvx ruff check . clean, and the README updated if behavior changed.
  • Keep the engine package UI-agnostic; if you touch a runtime model path, re-check gotchas 2–4 (Off the Grid, param budget, OpenMP isolation).
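Part of the commit gate can be automated. A minimal sketch, assuming you only want to enforce the prefixes seen in this repo's history (feat, fix, refactor, chore, docs) plus an optional scope and breaking-change `!`; is_conventional() is a hypothetical helper, not an existing repo script.

```python
import re

# Types observed in this repo's history; extend the tuple if the team adopts more.
ALLOWED_TYPES = ("feat", "fix", "refactor", "chore", "docs")
_TYPES = "|".join(ALLOWED_TYPES)
# "<type>(<optional scope>)!: <subject>", where the subject must start with a non-space.
SUBJECT = re.compile(rf"^(?:{_TYPES})(?:\([\w./-]+\))?!?: \S")


def is_conventional(message: str) -> bool:
    """Check only the first line of a commit message."""
    first_line = message.splitlines()[0] if message else ""
    return SUBJECT.match(first_line) is not None
```

Git passes a commit-msg hook the path of the message file as its first argument, so a hook script could read that file and exit non-zero when is_conventional() returns False.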

Key environment variables

| Variable | Default | Use |
|---|---|---|
| ADVISOR_CACHE_DIR | (unset) | Artifact store (a mounted bucket on Spaces); setting it enables the refresh scheduler |
| ADVISOR_MODEL_BACKEND | minicpm-transformers | Advisor planner: minicpm-transformers or rules |
| ADVISOR_MODEL_ID / ADVISOR_ADAPTER_ID / ADVISOR_ADAPTER_REVISION | MiniCPM5-1B + advisor LoRA | Advisor model + pinned LoRA |
| ADVISOR_QUEST_ANALYZER_BACKEND / ADVISOR_QUEST_ADAPTER_ID | minicpm-transformers / build-small-hackathon/hackathon-advisor-quest-minicpm5-lora | Quest classifier |
| ADVISOR_ZERO_GPU / ADVISOR_ZERO_GPU_DURATION | off / 120 | Wrap the engine turn in @spaces.GPU on the deployed Space |
| ADVISOR_ASR_MODEL_ID | Nemotron | Voice ASR model |
| ADVISOR_EMBEDDING_MODEL_REPO / ADVISOR_EMBEDDING_MODEL_FILE | EmbeddingGemma GGUF | llama.cpp retrieval model |
| ADVISOR_REFRESH_COMPUTE / ADVISOR_REFRESH_INTERVAL_SECONDS | cpu / 3600 | Scheduled refresh compute + cadence |

See the "Runtime Backend" section of README.md for the full deployed configuration.