Commit 4791c0a by JacobLinCool: feat: add live project atlas (verified 1 day ago)

preview code

raw

history blame contribute delete

2.72 kB

A newer version of the Gradio SDK is available: 6.17.3

Upgrade

metadata

license: apache-2.0
task_categories:
  - text-classification
  - text-generation
language:
  - en
tags:
  - hackathon-advisor
  - quest-classification
  - lora-sft
  - minicpm5
pretty_name: Hackathon Advisor Quest Classification SFT
size_categories:
  - n<1K

Hackathon Advisor — Quest Classification SFT Dataset

Supervised fine-tuning data that teaches MiniCPM5-1B to classify a Build Small Hackathon project against 13 judging dimensions (quests). Each prompt has two segments, the project README and its app file, and the model must emit strict JSON with short, source-attributed evidence. The data trains the LoRA adapter at build-small-hackathon/hackathon-advisor-quest-minicpm5-lora.

Format (`quest_sft.jsonl`)

Chat-format JSONL. The first line is a `lora_sft_manifest` record; every following line is a `lora_sft_example` record with a `messages` list (system / user / assistant turns). The assistant turn is exactly one JSON object:

{"matches":[{"quest":"...","confidence":0.0,"evidence":"...","source":"readme|app_file"}]}

The assistant output contains no markdown, no prose, and no renamed quests; when no dimension has clear evidence, `matches` is an empty list. The user turn splits the project into a [README] segment and an [APP_FILE] segment, so the model judges the product description and the implementation evidence separately and attributes each match to its source.
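A minimal Python sketch for reading the file and checking an assistant turn against this contract. It assumes each message is a `{"role": ..., "content": ...}` dict (the usual chat layout; the card does not spell out the message fields) and treats two rules as assumptions rather than documented requirements: the object carries only the `matches` key, and `confidence` lies in `[0, 1]`. All function names are hypothetical.

```python
import json
from typing import Any

# Allowed values for the "source" field, taken from the schema above.
ALLOWED_SOURCES = {"readme", "app_file"}
# Keys every match object carries in the schema above.
MATCH_KEYS = {"quest", "confidence", "evidence", "source"}


def load_quest_sft(path: str) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split the file into the manifest (first line) and the examples (all later lines)."""
    with open(path, encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    return rows[0], rows[1:]


def assistant_text(example: dict[str, Any]) -> str:
    """Return the assistant message content (assumes role/content message dicts)."""
    return next(m["content"] for m in example["messages"] if m["role"] == "assistant")


def parse_assistant_turn(text: str) -> list[dict[str, Any]]:
    """Parse one assistant turn and check it against the strict-JSON contract."""
    obj = json.loads(text)  # raises ValueError on markdown fences or extra prose
    # Assumption: the object holds exactly one key, "matches".
    if not isinstance(obj, dict) or set(obj) != {"matches"}:
        raise ValueError("expected exactly one object with a single 'matches' key")
    matches = obj["matches"]
    if not isinstance(matches, list):
        raise ValueError("'matches' must be a list (possibly empty)")
    for m in matches:
        if not isinstance(m, dict) or set(m) != MATCH_KEYS:
            raise ValueError(f"bad match keys: {m!r}")
        if m["source"] not in ALLOWED_SOURCES:
            raise ValueError(f"unknown source: {m['source']!r}")
        # Assumption: confidence is a probability-like score in [0, 1].
        if not 0.0 <= float(m["confidence"]) <= 1.0:
            raise ValueError(f"confidence out of range: {m['confidence']!r}")
    return matches
```

An empty `{"matches":[]}` parses to `[]`, while a reply wrapped in a markdown code fence raises `ValueError`, which is the behaviour a strict-JSON check wants.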

Quest dimensions (13)

Six merit badges (Off the Grid, Well-Tuned, Off-Brand, Llama Champion, Sharing is Caring, Field Notes), two tracks (Backyard AI, Thousand Token Wood), and five sponsor / special awards (OpenBMB, Nemotron, Modal, Tiny Titan, Best Agent).
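Because renamed quests are not allowed, a consumer can keep the 13 names as a closed set and flag anything outside it. A small sketch, assuming exact, case-sensitive string matching; the group keys and the `unknown_quests` helper are hypothetical labels for readability, not fields in the data.

```python
# The 13 quest dimensions, grouped as in the list above.
# The model emits only the bare quest name, never the group.
QUESTS_BY_GROUP = {
    "merit_badge": ["Off the Grid", "Well-Tuned", "Off-Brand", "Llama Champion",
                    "Sharing is Caring", "Field Notes"],
    "track": ["Backyard AI", "Thousand Token Wood"],
    "sponsor_or_special": ["OpenBMB", "Nemotron", "Modal", "Tiny Titan", "Best Agent"],
}
QUESTS = frozenset(q for group in QUESTS_BY_GROUP.values() for q in group)
assert len(QUESTS) == 13  # 6 + 2 + 5, with no duplicates


def unknown_quests(matches: list[dict]) -> list[str]:
    """Return quest names that are not exact members of the 13-quest set."""
    return [m["quest"] for m in matches if m["quest"] not in QUESTS]
```

For example, `unknown_quests([{"quest": "Off-grid"}])` returns `["Off-grid"]`, since only the exact spelling "Off the Grid" is accepted.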

Examples: 156 (14 with empty matches)

| Variant | Count |
| --- | --- |
| natural | 108 |
| app_only | 16 |
| missing_app_file | 16 |
| noisy_metadata | 8 |
| contradiction | 6 |
| empty | 2 |
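As a quick consistency check, the variant counts from the table above add up to the 156 examples, 108 natural and 48 augmented. The sketch also shows how one might tally variants from loaded examples, assuming a hypothetical top-level `variant` field on each example; the card does not say where (or whether) the variant is stored.

```python
from collections import Counter

# Variant counts copied from the table above.
VARIANT_COUNTS = {
    "natural": 108,
    "app_only": 16,
    "missing_app_file": 16,
    "noisy_metadata": 8,
    "contradiction": 6,
    "empty": 2,
}

total = sum(VARIANT_COUNTS.values())
augmented = total - VARIANT_COUNTS["natural"]
print(total, augmented)  # 156 48


def tally_variants(examples: list[dict], key: str = "variant") -> Counter:
    """Count examples per variant. ASSUMPTION: `key` is a hypothetical field name."""
    return Counter(ex.get(key, "unknown") for ex in examples)
```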

Positive examples per quest:

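| Quest | Positive examples |
| --- | --- |
| Off the Grid | 87 |
| Off-Brand | 59 |
| Tiny Titan | 58 |
| Thousand Token Wood | 49 |
| Llama Champion | 35 |
| Backyard AI | 35 |
| Well-Tuned | 31 |
| OpenBMB | 26 |
| Sharing is Caring | 19 |
| Nemotron | 18 |
| Field Notes | 15 |
| Modal | 14 |
| Best Agent | 14 |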
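The per-quest figures can be recomputed from the assistant turns, counting each quest at most once per example. Summing the table gives 460 positive labels over the 142 examples with a non-empty `matches` list (156 minus 14), roughly 3.24 quests per such example; the most frequent quest (87) appears about six times as often as the rarest (14). A sketch; `positives_per_quest` is a hypothetical helper that takes raw assistant-turn strings.

```python
import json
from collections import Counter


def positives_per_quest(assistant_turns: list[str]) -> Counter:
    """Count, per quest, how many examples list it among their matches."""
    counts: Counter = Counter()
    for text in assistant_turns:
        # A set, so a quest repeated within one example still counts once.
        counts.update({m["quest"] for m in json.loads(text)["matches"]})
    return counts


# Figures copied from the table above.
TABLE = {
    "Off the Grid": 87, "Off-Brand": 59, "Tiny Titan": 58, "Thousand Token Wood": 49,
    "Llama Champion": 35, "Backyard AI": 35, "Well-Tuned": 31, "OpenBMB": 26,
    "Sharing is Caring": 19, "Nemotron": 18, "Field Notes": 15, "Modal": 14,
    "Best Agent": 14,
}
total_positives = sum(TABLE.values())
non_empty = 156 - 14  # examples with at least one match
print(total_positives, round(total_positives / non_empty, 2))  # 460 3.24
```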

Provenance

Built from the real public Spaces of the build-small-hackathon org. 125 projects were crawled, then deduplicated and length-filtered down to 108 content-rich ones. These were labelled by a multi-agent workflow in which a teacher proposes labels and an adversarial verifier checks them, and the set was then extended with targeted augmentations (app-only, README-only / missing app file, README vs. app-file contradictions, empty matches, noisy metadata). `labeled.json` holds the verified per-project labels. Examples are derived from public hackathon submissions for research and hackathon use; each project remains under its own Space license.
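A sketch of two pipeline steps under stated assumptions: exact-content deduplication plus a minimum-length filter, and a readme-only ("missing app file") variant. The card does not give the real dedup criterion, the length threshold, or how labels are adjusted for augmented variants, so `Project`, `min_chars=500`, and the rule "keep only README-sourced matches" are all hypothetical illustrations.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Project:
    """Hypothetical record for one crawled Space (field names are illustrative)."""
    space_id: str
    readme: str
    app_file: str


def dedupe_and_filter(projects: list[Project], min_chars: int = 500) -> list[Project]:
    """Drop exact duplicates of README+app content, then drop short projects.

    ASSUMPTION: exact hashing and `min_chars` are illustrative, not the real criteria.
    """
    seen: set[str] = set()
    kept = []
    for p in projects:
        digest = hashlib.sha256((p.readme + "\0" + p.app_file).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # identical content already seen
        seen.add(digest)
        if len(p.readme) + len(p.app_file) >= min_chars:
            kept.append(p)
    return kept


def missing_app_file_variant(project: Project, matches: list[dict]) -> tuple[Project, list[dict]]:
    """Blank the app file and keep only matches whose evidence came from the README.

    ASSUMPTION: this label-adjustment rule is a guess; the card only names the variant.
    """
    return replace(project, app_file=""), [m for m in matches if m["source"] == "readme"]
```

Keeping only README-sourced matches is one plausible design: once the app file is gone, evidence attributed to it can no longer be justified from the prompt.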
