Spaces:

build-small-hackathon
/

case-forge

Running on Zero

App Files Files Community

case-forge / README.md

nextmarte

App back to ZeroGPU (merged in-Space); Modal offline-only

59a4de2 verified 19 days ago

preview code

Raw

History Blame Contribute Delete

4.69 kB

	---
	title: Case Forge
	emoji: 📓
	colorFrom: indigo
	colorTo: purple
	sdk: gradio
	sdk_version: 6.17.3
	app_file: app.py
	pinned: true
	license: apache-2.0
	short_description: Harvard-style teaching cases + notes, forged from a topic.
	tags:
	- backyard-ai
	- tiny-titan
	- well-tuned
	- off-grid
	- best-agent
	---

	# 📓 Case Forge — teaching cases on demand

	> Built for the Build Small Hackathon · Backyard AI track · for me — an MBA instructor — and fellow professors who author their own teaching cases.

	It's an authoring tool: generate a draft, then edit the Markdown live, regenerate,
	and export. The real-user test is an instructor forging a case they'll actually teach.

	Demo video: _‹add link›_ · Social post: _‹add link›_ · Model: _‹add HF model link›_

	Case Forge turns a one-line request — domain + topic + level + language — into a
	complete Harvard-style teaching case and its teaching note, structured and
	ready to take into a classroom. The case opens on a protagonist facing a real
	dilemma, lays out sourced data and the paths to weigh, and **stops at the decision
	point** (the outcome lives in the note's epilogue, where the instructor wants it).

	The hero isn't generic prose — it's structure that holds: the dilemma never
	leaks its answer, learning objectives stay ≤4 and measurable, and discussion
	questions line up with the objectives. The app shows these as live quality badges.

	## How it works

	A short request expands into the full output contract because **the format is in the
	weights, not the prompt**. We fine-tuned a small student model on a synthetic corpus
	of cases generated by a large teacher — so the ≤4B model reliably emits the whole
	schema from a minimal ask.

	- Student (in the app): [`Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) + a published Case Forge LoRA adapter.
	- Teacher (offline only): a large dense model on Modal that generated the
	synthetic training corpus. It never ships in the app and is not counted below.
	- Runtime: the student runs in-Space on ZeroGPU (free) — base Qwen3-4B + the
	published LoRA, merged at the first GPU call, generating via transformers. Modal
	is used only for the offline pipeline (corpus generation, fine-tune, LoRA merge),
	which is pay-per-use; keeping a GPU warm for serving was not cost-viable.
	- Quality: an Opus-4.8 content audit found the first model fabricated sources and
	made numeric errors; the corpus was regenerated with a numeric-auditor pass and the
	model retrained (v3) — fabricated sources and severe math errors went to 0/6
	on a number-heavy re-audit. Figures are illustrative; the UI says "verify before class".

	### Parameter budget (hackathon: ≤32B total)

	\| Model \| Where \| Params \|
	\|---\|---\|---\|
	\| Qwen3-4B-Instruct-2507 + LoRA \| in the app (ZeroGPU) \| 4.0B \|
	\| Total loaded by the Space \| \| 4.0B \|

	The teacher model used to create the training data runs offline on Modal and is
	not loaded by the app, so it does not count toward the budget. 4.0B ≤ 4B
	keeps the Tiny Titan badge.

	## Pipeline (reproducible)

	1. Seeds — structural skeletons from open Brazilian cases (ENAP Casoteca via
	OAI-PMH) + hand-authored ones. Real case text is training input only, never
	exposed or published.
	2. Synthetic generation — the teacher writes ~610 original case+note pairs from
	the seeds (99.7% schema-valid) on Modal.
	3. Fine-tune — LoRA on the ≤4B student (Modal, H100). The student learns to map
	a short request → the full JSON contract.
	4. Eval — short-prompt schema validity: base 0% → fine-tuned 100% (20 held-out
	seeds), 100% with sourced data.

	The output contract lives in [`data/schema.py`](data/schema.py) and is enforced both
	when generating the corpus and when validating in the app.

	## Awards in play

	Backyard AI (a real instructor authoring cases he'll teach) · Well-Tuned
	(published fine-tune) · Tiny Titan (≤4B) · Best Agent (multi-stage
	teacher→audit→student pipeline) · Modal (corpus generation, fine-tune and LoRA
	merge run on Modal credits) · Off-Grid (app serves its own weights in-Space on
	ZeroGPU — no third-party model API).

	## Run locally

	```bash
	pip install -r requirements.txt
	python app.py # http://localhost:7860
	# No GPU? The UI runs on a real sample case from the corpus:
	CASE_FORGE_DEMO=1 python app.py
	```

	Config (env): `CASE_FORGE_ADAPTER` (HF repo id of the LoRA), `CASE_FORGE_BASE`
	(base model), `CASE_FORGE_DEMO=1` (force the sample), `CASE_FORGE_MAX_TOKENS`.