# Quest-classification LoRA

The dashboard refresh asks MiniCPM5-1B to classify every crawled hackathon project
against the Build Small Hackathon judging dimensions and to quote short evidence for
each match. Prompting alone drifts (truncated JSON, renamed quests, runaway evidence),
so we fine-tune a small LoRA that fixes the task to a strict JSON contract. The
backend still validates every refresh and refuses to swap the dashboard on a schema
failure; the checked-in adapter is the product default, and validation is the
correctness gate.

## Label space (`hackathon_advisor/quest_taxonomy.py`)

13 dimensions, each detectable from a README and an app file:

- Six merit badges: Off the Grid, Well-Tuned, Off-Brand, Llama Champion, Sharing is
  Caring, Field Notes.
- Two main tracks: Backyard AI, Thousand Token Wood.
- Sponsor / special awards: OpenBMB, Nemotron, Modal, Tiny Titan, Best Agent.

Output schema (one JSON object, nothing else):

```json
{"matches": [{"quest": "...", "confidence": 0.0, "evidence": "...", "source": "readme|app_file"}]}
```
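
A minimal sketch of checking one model response against this contract. The quest names are copied from the list above, and the 0 to 1 confidence range and the non-empty evidence rule are assumptions; `parse_quest_output` is a hypothetical helper, not the project's own validator.

```python
import json

# Assumed spellings, taken from the list above; the authoritative label space
# lives in hackathon_advisor/quest_taxonomy.py.
QUESTS = {
    "Off the Grid", "Well-Tuned", "Off-Brand", "Llama Champion",
    "Sharing is Caring", "Field Notes",
    "Backyard AI", "Thousand Token Wood",
    "OpenBMB", "Nemotron", "Modal", "Tiny Titan", "Best Agent",
}
SOURCES = {"readme", "app_file"}
MATCH_KEYS = {"quest", "confidence", "evidence", "source"}


def parse_quest_output(raw: str) -> list:
    """Parse one response; raise ValueError on any contract violation."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:  # e.g. truncated JSON
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(payload, dict) or set(payload) != {"matches"}:
        raise ValueError("expected one object with only a 'matches' key")
    matches = payload["matches"]
    if not isinstance(matches, list):
        raise ValueError("'matches' must be a list")
    for match in matches:
        if not isinstance(match, dict) or set(match) != MATCH_KEYS:
            raise ValueError(f"unexpected match shape: {match!r}")
        quest, conf = match["quest"], match["confidence"]
        if not isinstance(quest, str) or quest not in QUESTS:
            raise ValueError(f"unknown quest: {quest!r}")  # e.g. renamed quest
        if isinstance(conf, bool) or not isinstance(conf, (int, float)):
            raise ValueError(f"confidence must be a number: {conf!r}")
        if not 0.0 <= conf <= 1.0:  # assumed range
            raise ValueError(f"confidence out of range: {conf!r}")
        if not isinstance(match["evidence"], str) or not match["evidence"].strip():
            raise ValueError("evidence must be a non-empty string")
        if not isinstance(match["source"], str) or match["source"] not in SOURCES:
            raise ValueError(f"unknown source: {match['source']!r}")
    return matches
```

Rejecting unknown keys and unknown quest names outright, rather than repairing them, keeps the failure modes named earlier (truncated JSON, renamed quests) visible instead of silently patched.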

`render_quest_prompt` is the single prompt renderer shared by the dataset and the
live analyzer, so the model sees the same two-segment shape (README + APP_FILE) at
train and inference time, with the same `QUEST_SYSTEM_PROMPT`.

## Dataset pipeline

1. `scripts/build_quest_corpus.py` — download the real README.md and main app-file
   source for all 125 crawled projects into `data/quest_corpus.json`.
2. Selection (`data/quest_selection.json`) — drop near-identical template clones
   (embedding cosine + identical app hash) and the shortest, signal-free tail;
   108 content-rich projects survive (app-only / readme-only / both profiles kept).
3. Teacher labelling — a multi-agent workflow labels each project (one agent) then
   adversarially verifies and corrects it (a second agent). The verifier drops matches
   whose evidence does not appear in the cited segment, fixes `source`, removes Off the
   Grid from apps that call a cloud API, and removes Tiny Titan from models above 4B
   parameters. Output: `data/quest_labels/labeled.json`.
4. `scripts/build_quest_sft.py` — one natural example per project plus targeted
   augmentations so every case is represented: app-only, readme-only / missing app
   file, README↔app contradictions, empty matches, noisy metadata, app-only variants
   of the real remote-inference projects, and hand-authored contrastive **hard
   negatives** (a remote inference call — `InferenceClient`, HF Inference Endpoints,
   replicate, `*.modal.run` — must not earn Off the Grid; OpenBMB belongs only to
   `openbmb`/MiniCPM models; Tiny Titan only to ≤4B). `_check_invariants` fails the
   build on either crisp violation. Writes `data/quest_sft.jsonl`.
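
The hard-negative rules in step 4 can be sketched as a text-level check. This is only an illustration: the marker patterns, the `invariant_violations` name, and the choice to check only the Off the Grid and OpenBMB rules are assumptions, and the real `_check_invariants` may work differently (the Tiny Titan rule needs parameter counts that plain text does not reliably provide).

```python
import re

# Assumed textual markers for a remote inference call; the real build script
# may detect these differently.
REMOTE_MARKERS = re.compile(
    r"InferenceClient|endpoints\.huggingface\.cloud|\breplicate\b|\.modal\.run",
    re.IGNORECASE,
)
OPENBMB_MARKERS = re.compile(r"openbmb|minicpm", re.IGNORECASE)


def invariant_violations(text: str, quests: set) -> list:
    """Return readable violations for one example; an empty list means clean."""
    problems = []
    if "Off the Grid" in quests and REMOTE_MARKERS.search(text):
        problems.append("Off the Grid assigned despite a remote inference call")
    if "OpenBMB" in quests and not OPENBMB_MARKERS.search(text):
        problems.append("OpenBMB assigned without an openbmb/MiniCPM model")
    return problems
```

A build script could call this on every example and exit non-zero as soon as any list comes back non-empty.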

185 chat-JSONL examples (108 natural + 77 augmented), 27 with empty matches, all 13
quests covered. The contrastive negatives are up-weighted in training so they outweigh
the strong Off-the-Grid prior that, untreated, mislabels remote-API chatbots as local.
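
The counts above can be re-derived from the JSONL with a short script. This sketch assumes each line is an object with a `messages` list whose last entry is the assistant turn holding the target JSON; that layout and the `summarize_sft` name are assumptions, not taken from `scripts/build_quest_sft.py`.

```python
import json
from collections import Counter


def summarize_sft(lines):
    """Return (total examples, examples with empty matches, per-quest counts)."""
    total = 0
    empty = 0
    per_quest = Counter()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        # Assumption: the final message is the assistant target.
        target = json.loads(record["messages"][-1]["content"])
        total += 1
        if not target["matches"]:
            empty += 1
        per_quest.update(match["quest"] for match in target["matches"])
    return total, empty, per_quest
```

Passing an open handle to `data/quest_sft.jsonl` should give totals comparable to the figures stated above, and `len(per_quest)` gives the number of quests that appear at least once.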

Published as a Hub dataset:
[`build-small-hackathon/hackathon-advisor-quest-dataset`](https://huggingface.co/datasets/build-small-hackathon/hackathon-advisor-quest-dataset)
(`scripts/publish_quest_dataset.py`). The trained adapter lives at
[`build-small-hackathon/hackathon-advisor-quest-minicpm5-lora`](https://huggingface.co/build-small-hackathon/hackathon-advisor-quest-minicpm5-lora).

## Training (`scripts/modal_train_quest_lora.py`)

```bash
modal run scripts/modal_train_quest_lora.py::smoke              # check the GPU
modal run scripts/modal_train_quest_lora.py --dataset data/quest_sft.jsonl --epochs 16
```

LoRA SFT on an **L40S**: rank 64, alpha 128, dropout 0, completion-only loss (prompt
tokens are labelled -100, so only the strict JSON is supervised), `max_seq_length=3072`,
and the chat template with `enable_thinking=False` to match inference. The dataset is the
spec, so the container **evaluates on the whole dataset** (quest-set exact match, micro
P/R/F1, and a mismatch list) and returns the adapter as a zip, unpacked under
`artifacts/quest-lora/`. The shipped adapter scores quest-set exact match 185/185
(F1 1.0) on that dataset: every project in it, including the remote-inference ones, is
labelled correctly. Because the evaluation data is also the training data, this
measures fit to the spec rather than generalization to unseen projects.
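
For reference, quest-set exact match and micro P/R/F1 can be computed as below. This is a sketch of the standard definitions, not the training script's code; `quest_set_metrics` is a hypothetical name, and returning 1.0 for precision or recall when there is nothing to count is an assumed convention.

```python
def quest_set_metrics(gold, pred):
    """gold and pred are parallel lists of quest-name sets, one per example."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    exact = tp = fp = fn = 0
    for g, p in zip(gold, pred):
        exact += int(g == p)   # whole quest set must agree
        tp += len(g & p)       # quests predicted and correct
        fp += len(p - g)       # quests predicted but wrong
        fn += len(g - p)       # quests missed
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "exact_match": exact,
        "total": len(gold),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Exact match is the stricter number: one spurious or missing quest fails the whole example, while the micro scores only lose a single count.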

## Serving

`MiniCPMQuestAnalyzer` loads the checked-in `artifacts/quest-lora` adapter by
default. `ADVISOR_QUEST_ADAPTER_ID` and `ADVISOR_QUEST_ADAPTER_REVISION` may point
to a replacement adapter, while an explicit empty adapter id runs the base model
for controlled experiments. `validate_matches_by_project` enforces the schema
before the dashboard is swapped.
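
The three adapter modes depend on distinguishing an unset variable from an explicitly empty one. A minimal sketch of that resolution follows; `resolve_adapter` is a hypothetical helper and its return shape is an assumption, not how `MiniCPMQuestAnalyzer` is actually written.

```python
import os

DEFAULT_ADAPTER = "artifacts/quest-lora"  # the checked-in adapter


def resolve_adapter(env=None):
    """Return (adapter_id, revision); adapter_id None means base model only."""
    env = os.environ if env is None else env
    adapter_id = env.get("ADVISOR_QUEST_ADAPTER_ID")
    revision = env.get("ADVISOR_QUEST_ADAPTER_REVISION") or None
    if adapter_id is None:
        # Variable not set at all: use the product default.
        return DEFAULT_ADAPTER, None
    if adapter_id.strip() == "":
        # Set but empty: run the base model for controlled experiments.
        return None, None
    return adapter_id, revision
```

Using `env.get(...)` without a fallback value is what keeps "unset" and "empty" separate; a truthiness check such as `env.get(...) or DEFAULT_ADAPTER` would collapse the two cases.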