---
license: apache-2.0
task_categories:
- text-classification
- text-generation
language:
- en
tags:
- hackathon-advisor
- quest-classification
- lora-sft
- minicpm5
pretty_name: Hackathon Advisor Quest Classification SFT
size_categories:
- n<1K
---

# Hackathon Advisor: Quest Classification SFT Dataset

Supervised fine-tuning data that teaches MiniCPM5-1B to classify a Build Small Hackathon project against 13 judging dimensions from a two-segment README + app-file prompt, emitting strict JSON with short, source-attributed evidence.

Trains the LoRA at [`build-small-hackathon/hackathon-advisor-quest-minicpm5-lora`](https://huggingface.co/build-small-hackathon/hackathon-advisor-quest-minicpm5-lora).

## Format (`quest_sft.jsonl`)

Chat-JSONL. The **first line** is a `lora_sft_manifest`; every following line is a `lora_sft_example` with a `messages` list (system / user / assistant). The assistant turn is exactly one JSON object:

```json
{"matches":[{"quest":"...","confidence":0.0,"evidence":"...","source":"readme|app_file"}]}
```

No markdown, no prose, no renamed quests; an empty `matches` list when no dimension has clear evidence.

The user turn splits the project into a `[README]` segment and an `[APP_FILE]` segment so the model judges product description and implementation evidence separately and attributes each match to its source.

## Quest dimensions (13)

- Six merit badges: Off the Grid, Well-Tuned, Off-Brand, Llama Champion, Sharing is Caring, Field Notes
- Two tracks: Backyard AI, Thousand Token Wood
- Five sponsor / special awards: OpenBMB, Nemotron, Modal, Tiny Titan, Best Agent

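
As a rough illustration of the format and quest list described above, the sketch below checks one assistant turn against the stated contract and splits a JSONL file into manifest and examples by position. The helper names `validate_assistant_turn` and `split_records` are hypothetical and not part of this dataset's tooling. The `[0, 1]` range for `confidence` and the requirement that `evidence` be non-empty are assumptions, not documented rules.

```python
import json

# The 13 quest names listed in this card.
QUESTS = {
    "Off the Grid", "Well-Tuned", "Off-Brand", "Llama Champion",
    "Sharing is Caring", "Field Notes",
    "Backyard AI", "Thousand Token Wood",
    "OpenBMB", "Nemotron", "Modal", "Tiny Titan", "Best Agent",
}
SOURCES = {"readme", "app_file"}


def validate_assistant_turn(text):
    """Return a list of problems found in one assistant turn (empty if none)."""
    try:
        obj = json.loads(text)  # rejects markdown fences and trailing prose
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(obj, dict) or not isinstance(obj.get("matches"), list):
        return ["top level must be an object with a 'matches' list"]

    problems = []
    for i, match in enumerate(obj["matches"]):
        if not isinstance(match, dict):
            problems.append(f"match {i}: not an object")
            continue
        quest = match.get("quest")
        if not isinstance(quest, str) or quest not in QUESTS:
            problems.append(f"match {i}: unknown quest {quest!r}")
        conf = match.get("confidence")
        # Assumption: confidence is a number in [0, 1].
        is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
        if not is_number or not 0 <= conf <= 1:
            problems.append(f"match {i}: bad confidence {conf!r}")
        source = match.get("source")
        if not isinstance(source, str) or source not in SOURCES:
            problems.append(f"match {i}: bad source {source!r}")
        evidence = match.get("evidence")
        # Assumption: evidence must be a non-empty string.
        if not isinstance(evidence, str) or not evidence.strip():
            problems.append(f"match {i}: missing evidence")
    return problems


def split_records(lines):
    """Split JSONL lines into (manifest, examples): first record, then the rest."""
    records = [json.loads(line) for line in lines if line.strip()]
    return records[0], records[1:]
```

To use it on the real file, open `quest_sft.jsonl`, pass the file object to `split_records`, and run `validate_assistant_turn` on the assistant message of each example. How the assistant message is keyed inside `messages` is not specified in this card, so that lookup is left out of the sketch.
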
## Examples: 156 (14 with empty matches)

| variant | count |
| --- | --- |
| natural | 108 |
| app_only | 16 |
| missing_app_file | 16 |
| noisy_metadata | 8 |
| contradiction | 6 |
| empty | 2 |

Positive examples per quest:

| quest | examples |
| --- | --- |
| Off the Grid | 87 |
| Off-Brand | 59 |
| Tiny Titan | 58 |
| Thousand Token Wood | 49 |
| Llama Champion | 35 |
| Backyard AI | 35 |
| Well-Tuned | 31 |
| OpenBMB | 26 |
| Sharing is Caring | 19 |
| Nemotron | 18 |
| Field Notes | 15 |
| Modal | 14 |
| Best Agent | 14 |

## Provenance

Built from the real public Spaces of the `build-small-hackathon` org: 125 crawled projects → deduped + length-filtered to 108 content-rich ones → labelled by a teacher-then-adversarial-verifier multi-agent workflow → plus targeted augmentations (app-only, readme-only / missing app file, README↔app contradictions, empty matches, noisy metadata). `labeled.json` holds the per-project verified labels.

Examples are derived from public hackathon submissions for research and hackathon use; each project remains under its own Space license.