Spaces:

build-small-hackathon
/

hackathon-advisor

Running on Zero

App Files Files Community

hackathon-advisor / docs /quest-classification-lora.md

JacobLinCool

deploy: sync GitHub main de5dbf9

13fe947 verified about 17 hours ago

preview code

raw

history blame contribute delete

4.55 kB

	# Quest-classification LoRA

	The dashboard refresh asks MiniCPM5-1B to classify every crawled hackathon project
	against the Build Small Hackathon judging dimensions and to quote short evidence for
	each match. A pure prompt drifts (truncated JSON, renamed quests, runaway evidence),
	so we fine-tune a small LoRA that fixes the task to a strict JSON contract. The
	backend still validates every refresh and refuses to swap the dashboard on a schema
	failure; the checked-in adapter is the product default, and validation is the
	correctness gate.

	## Label space (`hackathon_advisor/quest_taxonomy.py`)

	13 dimensions, each detectable from a README and an app file:

	- Six merit badges: Off the Grid, Well-Tuned, Off-Brand, Llama Champion, Sharing is
	Caring, Field Notes.
	- Two main tracks: Backyard AI, Thousand Token Wood.
	- Sponsor / special awards: OpenBMB, Nemotron, Modal, Tiny Titan, Best Agent.

	Output schema (one JSON object, nothing else):

	```json
	{"matches": [{"quest": "...", "confidence": 0.0, "evidence": "...", "source": "readme\|app_file"}]}
	```

	`render_quest_prompt` is the single prompt renderer shared by the dataset and the
	live analyzer, so the model sees the same two-segment shape (README + APP_FILE) at
	train and inference time, with the same `QUEST_SYSTEM_PROMPT`.

	## Dataset pipeline

	1. `scripts/build_quest_corpus.py` — download the real README.md and main app-file
	source for all 125 crawled projects into `data/quest_corpus.json`.
	2. Selection (`data/quest_selection.json`) — drop near-identical template clones
	(embedding cosine + identical app hash) and the shortest, signal-free tail;
	108 content-rich projects survive (app-only / readme-only / both profiles kept).
	3. Teacher labelling — a multi-agent workflow labels each project (one agent) then
	adversarially verifies and corrects it (a second agent): drops matches whose
	evidence is not in the cited segment, fixes `source`, kills Off-the-Grid on a
	cloud-API app, kills Tiny Titan on >4B models. Output: `data/quest_labels/labeled.json`.
	4. `scripts/build_quest_sft.py` — one natural example per project plus targeted
	augmentations so every case is represented: app-only, readme-only / missing app
	file, README↔app contradictions, empty matches, noisy metadata, app-only variants
	of the real remote-inference projects, and hand-authored contrastive **hard
	negatives** (a remote inference call — `InferenceClient`, HF Inference Endpoints,
	replicate, `*.modal.run` — must not earn Off the Grid; OpenBMB belongs only to
	`openbmb`/MiniCPM models; Tiny Titan only to ≤4B). `_check_invariants` fails the
	build on either crisp violation. Writes `data/quest_sft.jsonl`.

	185 chat-JSONL examples (108 natural + 77 augmented), 27 with empty matches, all 13
	quests covered. The contrastive negatives are up-weighted in training so they outweigh
	the strong Off-the-Grid prior that, untreated, mislabels remote-API chatbots as local.

	Published as a Hub dataset:
	[`build-small-hackathon/hackathon-advisor-quest-dataset`](https://huggingface.co/datasets/build-small-hackathon/hackathon-advisor-quest-dataset)
	(`scripts/publish_quest_dataset.py`). The trained adapter lives at
	[`build-small-hackathon/hackathon-advisor-quest-minicpm5-lora`](https://huggingface.co/build-small-hackathon/hackathon-advisor-quest-minicpm5-lora).

	## Training (`scripts/modal_train_quest_lora.py`)

	```bash
	modal run scripts/modal_train_quest_lora.py::smoke # check the GPU
	modal run scripts/modal_train_quest_lora.py --dataset data/quest_sft.jsonl --epochs 16
	```

	LoRA SFT on an L40S: rank 64, alpha 128, dropout 0, completion-only loss (the
	prompt is masked to -100 so only the strict JSON is supervised), `max_seq_length=3072`,
	chat template with `enable_thinking=False` to match inference. The dataset is the spec,
	so the container evaluates on the whole dataset — quest-set exact match, micro
	P/R/F1, and a mismatch list — and returns the adapter as a zip unpacked under
	`artifacts/quest-lora/`. The shipped adapter scores quest-set exact match 185/185
	(F1 1.0): every dataset project, including the remote-inference ones, is judged correctly.

	## Serving

	`MiniCPMQuestAnalyzer` loads the checked-in `artifacts/quest-lora` adapter by
	default. `ADVISOR_QUEST_ADAPTER_ID` and `ADVISOR_QUEST_ADAPTER_REVISION` may point
	to a replacement adapter, while an explicit empty adapter id runs the base model
	for controlled experiments. `validate_matches_by_project` enforces the schema
	before the dashboard is swapped.