	---
	license: apache-2.0
	task_categories:
	- text-classification
	- text-generation
	language:
	- en
	tags:
	- hackathon-advisor
	- quest-classification
	- lora-sft
	- minicpm5
	pretty_name: Hackathon Advisor Quest Classification SFT
	size_categories:
	- n<1K
	---

	# Hackathon Advisor — Quest Classification SFT Dataset

	Supervised fine-tuning data that teaches MiniCPM5-1B to classify a Build Small
	Hackathon project against 13 judging dimensions. The prompt has two segments (README
	and app file), and the model emits strict JSON with short, source-attributed evidence.
	This dataset trains the LoRA adapter published at
	[`build-small-hackathon/hackathon-advisor-quest-minicpm5-lora`](https://huggingface.co/build-small-hackathon/hackathon-advisor-quest-minicpm5-lora).

	## Format (`quest_sft.jsonl`)

	Chat-JSONL. The first line is a `lora_sft_manifest`; every following line is a
	`lora_sft_example` with a `messages` list (system / user / assistant). The assistant
	turn is exactly one JSON object:

	```json
	{"matches":[{"quest":"...","confidence":0.0,"evidence":"...","source":"readme|app_file"}]}
	```

	The output contains no markdown, no prose, and no renamed quests; `matches` is an
	empty list when no dimension has clear evidence. The user turn splits the project into
	a `[README]` segment and an `[APP_FILE]` segment so the model judges product
	description and implementation evidence separately and attributes each match to its
	source.
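
The format rules above can be checked mechanically. The following is a minimal sketch, not the project's own loader: it assumes each message carries the usual `role` and `content` keys, and it identifies the manifest purely by position because the card does not say which field holds the record type. The function names are hypothetical.

```python
import json

ALLOWED_SOURCES = {"readme", "app_file"}
MATCH_KEYS = {"quest", "confidence", "evidence", "source"}


def split_records(lines):
    """Split JSONL lines into (manifest, examples); the manifest is line one."""
    records = [json.loads(line) for line in lines if line.strip()]
    if not records:
        raise ValueError("no records found")
    return records[0], records[1:]


def assistant_content(example):
    """Return the text of the last assistant message (assumed role/content keys)."""
    for message in reversed(example["messages"]):
        if message["role"] == "assistant":
            return message["content"]
    raise ValueError("example has no assistant turn")


def parse_assistant_turn(content):
    """Parse an assistant turn and return its validated `matches` list."""
    # json.loads rejects code fences, leading prose, and trailing text,
    # which covers the "no markdown, no prose" rule.
    payload = json.loads(content)
    if not isinstance(payload, dict) or "matches" not in payload:
        raise ValueError("expected one JSON object with a 'matches' key")
    matches = payload["matches"]
    if not isinstance(matches, list):
        raise ValueError("'matches' must be a list")
    for match in matches:
        if not isinstance(match, dict) or not MATCH_KEYS <= set(match):
            raise ValueError(f"match is missing required keys: {match!r}")
        if match["source"] not in ALLOWED_SOURCES:
            raise ValueError(f"unknown source: {match['source']!r}")
        confidence = match["confidence"]
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            raise ValueError(f"confidence must be a number: {confidence!r}")
    return matches
```

To run it on the real file, pass an open handle: `manifest, examples = split_records(open("quest_sft.jsonl", encoding="utf-8"))`, then call `parse_assistant_turn(assistant_content(example))` for each example.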

	## Quest dimensions (13)

	Six merit badges (Off the Grid, Well-Tuned, Off-Brand, Llama Champion, Sharing is
	Caring, Field Notes), two tracks (Backyard AI, Thousand Token Wood), and five
	sponsor / special awards (OpenBMB, Nemotron, Modal, Tiny Titan, Best Agent).
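
As a small illustration, the 13 names can be held in one lookup to catch renamed quests in model output. This sketch assumes the quest strings in the data are spelled exactly as in this card; the group keys and the `unknown_quests` helper are hypothetical.

```python
# Quest names as listed in this card; the group keys are illustrative labels.
QUEST_GROUPS = {
    "merit_badge": [
        "Off the Grid",
        "Well-Tuned",
        "Off-Brand",
        "Llama Champion",
        "Sharing is Caring",
        "Field Notes",
    ],
    "track": ["Backyard AI", "Thousand Token Wood"],
    "sponsor_or_special": ["OpenBMB", "Nemotron", "Modal", "Tiny Titan", "Best Agent"],
}

QUEST_NAMES = frozenset(
    name for names in QUEST_GROUPS.values() for name in names
)


def unknown_quests(matches):
    """Return the sorted quest names in `matches` that are not on the list."""
    return sorted({match["quest"] for match in matches} - QUEST_NAMES)
```

An empty return value means every match uses a listed quest name.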

	## Examples: 156 (14 with empty matches)

	| variant | count |
	| --- | --- |
	| natural | 108 |
	| app_only | 16 |
	| missing_app_file | 16 |
	| noisy_metadata | 8 |
	| contradiction | 6 |
	| empty | 2 |

	Positive examples per quest:

	| quest | examples |
	| --- | --- |
	| Off the Grid | 87 |
	| Off-Brand | 59 |
	| Tiny Titan | 58 |
	| Thousand Token Wood | 49 |
	| Llama Champion | 35 |
	| Backyard AI | 35 |
	| Well-Tuned | 31 |
	| OpenBMB | 26 |
	| Sharing is Caring | 19 |
	| Nemotron | 18 |
	| Field Notes | 15 |
	| Modal | 14 |
	| Best Agent | 14 |
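
The per-quest counts above can be recomputed from the assistant turns alone. This is a sketch under the assumption that each message has `role` and `content` keys and that an example counts once per quest it matches; `quest_counts` is a hypothetical helper, and `examples` is the list of parsed JSONL records after the manifest line.

```python
import json
from collections import Counter


def quest_counts(examples):
    """Return (positive examples per quest, number of examples with no matches)."""
    per_quest = Counter()
    empty = 0
    for example in examples:
        # Take the last assistant message as the label for this example.
        assistant = next(
            m for m in reversed(example["messages"]) if m["role"] == "assistant"
        )
        matches = json.loads(assistant["content"])["matches"]
        if not matches:
            empty += 1
        # A set ensures a quest is counted once per example.
        per_quest.update({match["quest"] for match in matches})
    return per_quest, empty
```

If the file agrees with this card, `per_quest.most_common()` should reproduce the table ordering and `empty` should equal the 14 stated in the section heading.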

	## Provenance

	Built from the real public Spaces of the `build-small-hackathon` org: 125 crawled
	projects → deduped + length-filtered to 108 content-rich ones → labelled by a
	teacher-then-adversarial-verifier multi-agent workflow → plus targeted augmentations
	(app-only, readme-only / missing app file, README↔app contradictions, empty matches,
	noisy metadata). `labeled.json` holds the per-project verified labels. Examples are
	derived from public hackathon submissions for research and hackathon use; each project
	remains under its own Space license.
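
To make the app-only and missing-app-file augmentations concrete, here is one plausible way such variants could be derived from a labelled project. It is an illustration only, not the pipeline used to build this dataset: the segment layout (each tag on its own line), the rule of keeping only matches attributed to the surviving segment, and both function names are assumptions.

```python
def build_user_turn(readme, app_file):
    """Assemble a two-segment user turn (assumed layout: tag, newline, text)."""
    return f"[README]\n{readme}\n\n[APP_FILE]\n{app_file}"


def make_variant(readme, app_file, matches, variant):
    """Blank one segment and keep only matches attributed to the other one."""
    if variant == "app_only":
        readme, keep = "", "app_file"
    elif variant == "missing_app_file":
        app_file, keep = "", "readme"
    else:
        raise ValueError(f"unsupported variant: {variant!r}")
    kept = [match for match in matches if match["source"] == keep]
    return {"user": build_user_turn(readme, app_file), "matches": kept}
```

Filtering the labels along with the text keeps every remaining match attributable to a segment that is still present in the prompt.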