license: apache-2.0
task_categories:
- text-classification
- text-generation
language:
- en
tags:
- hackathon-advisor
- quest-classification
- lora-sft
- minicpm5
pretty_name: Hackathon Advisor Quest Classification SFT
size_categories:
- n<1K
Hackathon Advisor — Quest Classification SFT Dataset
Supervised fine-tuning data that teaches MiniCPM5-1B to classify a Build Small
Hackathon project against 13 judging dimensions. The input is a two-segment prompt
(README plus app file); the output is strict JSON with short, source-attributed
evidence. The data trains the LoRA adapter at
build-small-hackathon/hackathon-advisor-quest-minicpm5-lora.
Format (quest_sft.jsonl)
Chat-JSONL. The first line is a lora_sft_manifest; every following line is a
lora_sft_example with a messages list (system / user / assistant). The assistant
turn is exactly one JSON object:
{"matches":[{"quest":"...","confidence":0.0,"evidence":"...","source":"readme|app_file"}]}
The output contains no markdown, no prose, and no renamed quests, and the matches list
is empty when no dimension has clear evidence. The user turn splits the project into a
[README] segment and an [APP_FILE] segment so that the model judges the product
description and the implementation evidence separately and attributes each match to
its source.
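The output contract above can be checked mechanically. The following is a minimal sketch, not the project's own validator: the function name `validate_assistant_turn` is hypothetical, the quest names are taken from the list in this card, and the 0 to 1 range for `confidence` is an assumption (the card only shows `0.0` as a placeholder).

```python
import json

# Quest names as listed in this card (13 dimensions).
QUESTS = {
    "Off the Grid", "Well-Tuned", "Off-Brand", "Llama Champion",
    "Sharing is Caring", "Field Notes",                           # merit badges
    "Backyard AI", "Thousand Token Wood",                         # tracks
    "OpenBMB", "Nemotron", "Modal", "Tiny Titan", "Best Agent",   # sponsor / special
}
SOURCES = {"readme", "app_file"}


def validate_assistant_turn(text):
    """Return a list of problems; an empty list means the turn looks valid."""
    try:
        obj = json.loads(text)  # rejects markdown fences and surrounding prose
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(obj, dict) or set(obj) != {"matches"}:
        return ["top level must be an object with the single key 'matches'"]
    if not isinstance(obj["matches"], list):
        return ["'matches' must be a list"]

    problems = []
    for i, match in enumerate(obj["matches"]):
        if not isinstance(match, dict):
            problems.append(f"match {i}: not an object")
            continue
        quest = match.get("quest")
        if not isinstance(quest, str) or quest not in QUESTS:
            problems.append(f"match {i}: unknown or renamed quest {quest!r}")
        conf = match.get("confidence")
        # Assumption: confidence is a number in [0, 1]; bool is excluded on purpose.
        if (isinstance(conf, bool) or not isinstance(conf, (int, float))
                or not 0.0 <= conf <= 1.0):
            problems.append(f"match {i}: confidence must be a number in [0, 1]")
        evidence = match.get("evidence")
        if not isinstance(evidence, str) or not evidence.strip():
            problems.append(f"match {i}: evidence must be a non-empty string")
        source = match.get("source")
        if not isinstance(source, str) or source not in SOURCES:
            problems.append(f"match {i}: source must be 'readme' or 'app_file'")
    return problems
```

An empty `matches` list passes, which mirrors the rule for projects with no clear evidence; a reply wrapped in prose or a markdown fence fails at the JSON parsing step.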
Quest dimensions (13)
Six merit badges (Off the Grid, Well-Tuned, Off-Brand, Llama Champion, Sharing is Caring, Field Notes), two tracks (Backyard AI, Thousand Token Wood), and five sponsor / special awards (OpenBMB, Nemotron, Modal, Tiny Titan, Best Agent).
Examples: 156 (14 with empty matches)
| variant | count |
|---|---|
| natural | 108 |
| app_only | 16 |
| missing_app_file | 16 |
| noisy_metadata | 8 |
| contradiction | 6 |
| empty | 2 |
Positive examples per quest:
| quest | examples |
|---|---|
| Off the Grid | 87 |
| Off-Brand | 59 |
| Tiny Titan | 58 |
| Thousand Token Wood | 49 |
| Llama Champion | 35 |
| Backyard AI | 35 |
| Well-Tuned | 31 |
| OpenBMB | 26 |
| Sharing is Caring | 19 |
| Nemotron | 18 |
| Field Notes | 15 |
| Modal | 14 |
| Best Agent | 14 |
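A per-quest tally like the one above could be recomputed from the JSONL file along these lines. This is a sketch under stated assumptions: the helper name `count_positive_examples` is hypothetical, each message is assumed to be a `{"role": ..., "content": ...}` object (the card only says the list has system / user / assistant turns), and an example is counted once per quest even if that quest appears in several matches.

```python
import json
from collections import Counter


def count_positive_examples(lines):
    """Count, per quest, how many examples list that quest at least once."""
    counts = Counter()
    rows = iter(lines)
    next(rows, None)  # the first line is the manifest, per the card
    for line in rows:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        # Assumption: messages use "role" / "content" keys.
        assistant = [m for m in record["messages"] if m["role"] == "assistant"]
        matches = json.loads(assistant[-1]["content"])["matches"]
        # A set, so a quest repeated within one example counts once.
        counts.update({m["quest"] for m in matches})
    return counts
```

With the real file this would be called as `count_positive_examples(open("quest_sft.jsonl", encoding="utf-8"))`; examples with an empty `matches` list contribute nothing to the tally.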
Provenance
Built from the real public Spaces of the build-small-hackathon org. The 125 crawled
projects were deduplicated and length-filtered to 108 content-rich ones, then labelled
by a teacher-then-adversarial-verifier multi-agent workflow. Targeted augmentations
were added on top: app-only, readme-only (missing app file), README↔app
contradictions, empty matches, and noisy metadata. labeled.json holds the per-project
verified labels. Examples are derived from public hackathon submissions for research
and hackathon use; each project remains under its own Space license.