nextmarte

App back to ZeroGPU (merged in-Space); Modal offline-only

59a4de2 verified 19 days ago

metadata

title: Case Forge
emoji: 📓
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.17.3
app_file: app.py
pinned: true
license: apache-2.0
short_description: Harvard-style teaching cases + notes, forged from a topic.
tags:
  - backyard-ai
  - tiny-titan
  - well-tuned
  - off-grid
  - best-agent

📓 Case Forge — teaching cases on demand

Built for the Build Small Hackathon, Backyard AI track. It is for me (an MBA instructor) and for fellow professors who author their own teaching cases.

It's an authoring tool: generate a draft, then edit the Markdown live, regenerate, and export. The real-user test is an instructor forging a case they'll actually teach.

Demo video: ‹add link› · Social post: ‹add link› · Model: ‹add HF model link›

Case Forge turns a one-line request — domain + topic + level + language — into a complete Harvard-style teaching case and its teaching note, structured and ready to take into a classroom. The case opens on a protagonist facing a real dilemma, lays out sourced data and the paths to weigh, and stops at the decision point (the outcome lives in the note's epilogue, where the instructor wants it).

The selling point is not generic prose but structure that holds: the dilemma never leaks its answer, learning objectives stay ≤4 and measurable, and discussion questions line up with the objectives. The app shows these checks as live quality badges.
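The structural rules above could be checked roughly as follows. This is a minimal sketch, not the app's actual badge logic: `CaseDraft`, its field names, and the verbatim-substring leak test are hypothetical stand-ins, and "measurable" is not checked here because it needs a judgment the sketch does not attempt.

```python
from dataclasses import dataclass, field

MAX_OBJECTIVES = 4  # from the README: learning objectives stay <= 4


@dataclass
class CaseDraft:
    # Hypothetical field names; the real contract lives in data/schema.py.
    case_text: str
    epilogue: str
    objectives: list[str] = field(default_factory=list)
    # Each question carries the 0-based indices of the objectives it addresses.
    questions: list[tuple[str, list[int]]] = field(default_factory=list)


def quality_badges(draft: CaseDraft) -> dict[str, bool]:
    """Return one pass/fail flag per structural rule."""
    n = len(draft.objectives)
    covered = {i for _, idxs in draft.questions for i in idxs}
    # Crude leak check (an assumption): the epilogue must not appear
    # verbatim inside the case text.
    epilogue = draft.epilogue.strip()
    leak = bool(epilogue) and epilogue in draft.case_text
    return {
        "objectives_within_limit": 1 <= n <= MAX_OBJECTIVES,
        "questions_cover_objectives": n > 0 and covered == set(range(n)),
        "dilemma_does_not_leak": not leak,
    }
```

A UI could render each key as a green or red badge and recompute the flags whenever the instructor edits the Markdown.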

How it works

A short request expands into the full output contract because the format is in the weights, not the prompt. We fine-tuned a small student model on a synthetic corpus of cases generated by a large teacher — so the ≤4B model reliably emits the whole schema from a minimal ask.

Student (in the app): Qwen3-4B-Instruct-2507 + a published Case Forge LoRA adapter.
Teacher (offline only): a large dense model on Modal that generated the synthetic training corpus. It never ships in the app and is not counted below.
Runtime: the student runs in-Space on ZeroGPU (free) — base Qwen3-4B + the published LoRA, merged at the first GPU call, generating via transformers. Modal is used only for the offline pipeline (corpus generation, fine-tune, LoRA merge), which is pay-per-use; keeping a GPU warm for serving was not cost-viable.
Quality: an Opus-4.8 content audit found that the first model fabricated sources and made numeric errors. The corpus was regenerated with a numeric-auditor pass and the model retrained (v3); on a number-heavy re-audit, fabricated sources and severe math errors dropped to 0/6. Figures remain illustrative, and the UI says "verify before class".
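The "merged at the first GPU call" behaviour in the Runtime item can be sketched as a lazy loader. This is an illustration under assumptions rather than the code in app.py: `LazyModel` and `load_merged_student` are hypothetical names, the default base repo id is assumed, and the adapter id is read from the `CASE_FORGE_ADAPTER` variable described under Config. The `peft` and `transformers` imports sit inside the loader so the module still imports on a machine without the GPU stack.

```python
import os
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Build an expensive object on first use, then reuse it."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        with self._lock:  # prevents two concurrent requests from loading twice
            if self._model is None:
                self._model = self._loader()
            return self._model


def load_merged_student() -> Any:
    """Load the base model, attach the LoRA, and fold it into the weights."""
    # Deferred imports: only needed when a model is actually loaded.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    # Assumed default repo id; override with CASE_FORGE_BASE.
    base_id = os.environ.get("CASE_FORGE_BASE", "Qwen/Qwen3-4B-Instruct-2507")
    adapter_id = os.environ["CASE_FORGE_ADAPTER"]  # HF repo id of the LoRA
    base = AutoModelForCausalLM.from_pretrained(base_id)
    return PeftModel.from_pretrained(base, adapter_id).merge_and_unload()


# Nothing is downloaded or merged until the first student.get() call,
# which would sit inside the GPU-bound request handler.
student = LazyModel(load_merged_student)
```

Deferring the merge keeps Space startup cheap; the first request pays the load cost once and later requests reuse the merged model.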

Parameter budget (hackathon: ≤32B total)

Model	Where	Params
Qwen3-4B-Instruct-2507 + LoRA	in the app (ZeroGPU)	4.0B
Total loaded by the Space		4.0B

The teacher model used to create the training data runs offline on Modal and is not loaded by the app, so it does not count toward the budget. At 4.0B, the Space stays within the ≤4B limit for the Tiny Titan badge.
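The budget arithmetic is small enough to state as a check. This is a sketch with hypothetical constant names; the limits and the 4.0B figure come from the table and text above.

```python
HACKATHON_LIMIT = 32e9   # <= 32B total, per the hackathon rule
TINY_TITAN_LIMIT = 4e9   # <= 4B, per the Tiny Titan badge

# Models loaded by the Space. Merging a LoRA folds the low-rank update into
# the existing weight matrices, so the merged model keeps the base shapes.
loaded = {"Qwen3-4B-Instruct-2507 + LoRA": 4.0e9}

total = sum(loaded.values())
within_hackathon = total <= HACKATHON_LIMIT
within_tiny_titan = total <= TINY_TITAN_LIMIT
```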

Pipeline (reproducible)

Seeds: structural skeletons from open Brazilian cases (ENAP Casoteca via OAI-PMH) plus hand-authored ones. Real case text is training input only and is never exposed or published.
Synthetic generation: on Modal, the teacher writes ~610 original case+note pairs from the seeds (99.7% schema-valid).
Fine-tune: LoRA on the ≤4B student (Modal, H100). The student learns to map a short request → the full JSON contract.
Eval: short-prompt schema validity goes from 0% (base) to 100% (fine-tuned) on 20 held-out seeds, with 100% of outputs including sourced data.

The output contract lives in data/schema.py and is enforced both when generating the corpus and when validating in the app.
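A schema-validity check like the one used in the Eval step might look like the following. It is a minimal sketch under stated assumptions: `REQUIRED_KEYS` and both function names are hypothetical, and the real contract in data/schema.py is certainly richer than two top-level keys.

```python
import json

# Hypothetical required keys; the real contract is defined in data/schema.py.
REQUIRED_KEYS = {"case", "teaching_note"}


def is_schema_valid(raw: str) -> bool:
    """True if raw is a JSON object with a non-empty value for each required key."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    return all(obj.get(key) for key in REQUIRED_KEYS)


def validity_rate(outputs: list[str]) -> float:
    """Fraction of model outputs that satisfy the contract."""
    if not outputs:
        raise ValueError("no outputs to score")
    return sum(is_schema_valid(o) for o in outputs) / len(outputs)
```

Running `validity_rate` over the outputs for the held-out seeds, once for the base model and once for the fine-tuned one, would yield the kind of before/after comparison reported in the Eval step.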

Awards in play

Backyard AI (a real instructor authoring cases he'll teach) · Well-Tuned (published fine-tune) · Tiny Titan (≤4B) · Best Agent (multi-stage teacher→audit→student pipeline) · Modal (corpus generation, fine-tune and LoRA merge run on Modal credits) · Off-Grid (app serves its own weights in-Space on ZeroGPU — no third-party model API).

Run locally

pip install -r requirements.txt
python app.py            # http://localhost:7860
# No GPU? The UI runs on a real sample case from the corpus:
CASE_FORGE_DEMO=1 python app.py

Config (env): CASE_FORGE_ADAPTER (HF repo id of the LoRA), CASE_FORGE_BASE (base model), CASE_FORGE_DEMO=1 (force the sample), CASE_FORGE_MAX_TOKENS.
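One way to read these variables is sketched below. The `Config` class, `load_config`, and both default values are assumptions for illustration; only the four variable names come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    adapter: Optional[str]  # HF repo id of the LoRA, if set
    base: str               # base model repo id
    demo: bool              # force the sample case
    max_tokens: int         # generation length cap


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Read Case Forge settings from an environment-like mapping."""
    return Config(
        adapter=env.get("CASE_FORGE_ADAPTER") or None,
        # Assumed default; the README names the model but not the repo id.
        base=env.get("CASE_FORGE_BASE", "Qwen/Qwen3-4B-Instruct-2507"),
        demo=env.get("CASE_FORGE_DEMO") == "1",
        # Assumed default; the README does not state one.
        max_tokens=int(env.get("CASE_FORGE_MAX_TOKENS", "4096")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests, for example `load_config({"CASE_FORGE_DEMO": "1"})`.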