hackathon-advisor / docs /quest-classification-lora.md
JacobLinCool's picture
deploy: sync GitHub main de5dbf9
13fe947 verified
# Quest-classification LoRA
The dashboard refresh asks MiniCPM5-1B to classify every crawled hackathon project
against the Build Small Hackathon judging dimensions and to quote short evidence for
each match. A pure prompt drifts (truncated JSON, renamed quests, runaway evidence),
so we fine-tune a small LoRA that fixes the task to a strict JSON contract. The
backend still validates every refresh and refuses to swap the dashboard on a schema
failure; the checked-in adapter is the product default, and validation is the
correctness gate.
## Label space (`hackathon_advisor/quest_taxonomy.py`)
13 dimensions, each detectable from a README and an app file:
- Six merit badges: Off the Grid, Well-Tuned, Off-Brand, Llama Champion, Sharing is
Caring, Field Notes.
- Two main tracks: Backyard AI, Thousand Token Wood.
- Sponsor / special awards: OpenBMB, Nemotron, Modal, Tiny Titan, Best Agent.
Output schema (one JSON object, nothing else):
```json
{"matches": [{"quest": "...", "confidence": 0.0, "evidence": "...", "source": "readme|app_file"}]}
```
`render_quest_prompt` is the single prompt renderer shared by the dataset and the
live analyzer, so the model sees the same two-segment shape (README + APP_FILE) at
train and inference time, with the same `QUEST_SYSTEM_PROMPT`.
## Dataset pipeline
1. `scripts/build_quest_corpus.py` — download the real README.md and main app-file
source for all 125 crawled projects into `data/quest_corpus.json`.
2. Selection (`data/quest_selection.json`) — drop near-identical template clones
(embedding cosine + identical app hash) and the shortest, signal-free tail;
108 content-rich projects survive (app-only / readme-only / both profiles kept).
3. Teacher labelling — a multi-agent workflow labels each project (one agent) then
adversarially verifies and corrects it (a second agent): drops matches whose
evidence is not in the cited segment, fixes `source`, kills Off-the-Grid on a
cloud-API app, kills Tiny Titan on >4B models. Output: `data/quest_labels/labeled.json`.
4. `scripts/build_quest_sft.py` — one natural example per project plus targeted
augmentations so every case is represented: app-only, readme-only / missing app
file, README↔app contradictions, empty matches, noisy metadata, app-only variants
of the real remote-inference projects, and hand-authored contrastive **hard
negatives** (a remote inference call — `InferenceClient`, HF Inference Endpoints,
replicate, `*.modal.run` — must not earn Off the Grid; OpenBMB belongs only to
`openbmb`/MiniCPM models; Tiny Titan only to ≤4B). `_check_invariants` fails the
build on either crisp violation. Writes `data/quest_sft.jsonl`.
185 chat-JSONL examples (108 natural + 77 augmented), 27 with empty matches, all 13
quests covered. The contrastive negatives are up-weighted in training so they outweigh
the strong Off-the-Grid prior that, untreated, mislabels remote-API chatbots as local.
Published as a Hub dataset:
[`build-small-hackathon/hackathon-advisor-quest-dataset`](https://huggingface.co/datasets/build-small-hackathon/hackathon-advisor-quest-dataset)
(`scripts/publish_quest_dataset.py`). The trained adapter lives at
[`build-small-hackathon/hackathon-advisor-quest-minicpm5-lora`](https://huggingface.co/build-small-hackathon/hackathon-advisor-quest-minicpm5-lora).
## Training (`scripts/modal_train_quest_lora.py`)
```bash
modal run scripts/modal_train_quest_lora.py::smoke # check the GPU
modal run scripts/modal_train_quest_lora.py --dataset data/quest_sft.jsonl --epochs 16
```
LoRA SFT on an **L40S**: rank 64, alpha 128, dropout 0, completion-only loss (the
prompt is masked to -100 so only the strict JSON is supervised), `max_seq_length=3072`,
chat template with `enable_thinking=False` to match inference. The dataset is the spec,
so the container **evaluates on the whole dataset** — quest-set exact match, micro
P/R/F1, and a mismatch list — and returns the adapter as a zip unpacked under
`artifacts/quest-lora/`. The shipped adapter scores quest-set exact match 185/185
(F1 1.0): every dataset project, including the remote-inference ones, is judged correctly.
## Serving
`MiniCPMQuestAnalyzer` loads the checked-in `artifacts/quest-lora` adapter by
default. `ADVISOR_QUEST_ADAPTER_ID` and `ADVISOR_QUEST_ADAPTER_REVISION` may point
to a replacement adapter, while an explicit empty adapter id runs the base model
for controlled experiments. `validate_matches_by_project` enforces the schema
before the dashboard is swapped.