--- title: Case Forge emoji: 📓 colorFrom: indigo colorTo: purple sdk: gradio sdk_version: 6.17.3 app_file: app.py pinned: true license: apache-2.0 short_description: Harvard-style teaching cases + notes, forged from a topic. tags: - backyard-ai - tiny-titan - well-tuned - off-grid - best-agent --- # 📓 Case Forge — teaching cases on demand > Built for the **Build Small Hackathon** · **Backyard AI** track · for me — an MBA instructor — and fellow professors who **author** their own teaching cases. It's an **authoring tool**: generate a draft, then edit the Markdown live, regenerate, and export. The real-user test is an instructor forging a case they'll actually teach. **Demo video:** _‹add link›_ · **Social post:** _‹add link›_ · **Model:** _‹add HF model link›_ Case Forge turns a one-line request — **domain + topic + level + language** — into a complete **Harvard-style teaching case** *and* its **teaching note**, structured and ready to take into a classroom. The case opens on a protagonist facing a real dilemma, lays out sourced data and the paths to weigh, and **stops at the decision point** (the outcome lives in the note's epilogue, where the instructor wants it). The hero isn't generic prose — it's **structure that holds**: the dilemma never leaks its answer, learning objectives stay ≤4 and measurable, and discussion questions line up with the objectives. The app shows these as live quality badges. ## How it works A short request expands into the full output contract because **the format is in the weights, not the prompt**. We fine-tuned a small student model on a synthetic corpus of cases generated by a large teacher — so the ≤4B model reliably emits the whole schema from a minimal ask. - **Student (in the app):** [`Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) + a published **Case Forge LoRA** adapter. - **Teacher (offline only):** a large dense model on Modal that generated the synthetic training corpus. It never ships in the app and is not counted below. - **Runtime:** the student runs **in-Space on ZeroGPU** (free) — base Qwen3-4B + the published LoRA, merged at the first GPU call, generating via transformers. **Modal** is used only for the **offline** pipeline (corpus generation, fine-tune, LoRA merge), which is pay-per-use; keeping a GPU warm for serving was not cost-viable. - **Quality:** an Opus-4.8 content audit found the first model fabricated sources and made numeric errors; the corpus was regenerated with a numeric-auditor pass and the model retrained (**v3**) — fabricated sources and severe math errors went to **0/6** on a number-heavy re-audit. Figures are illustrative; the UI says "verify before class". ### Parameter budget (hackathon: ≤32B total) | Model | Where | Params | |---|---|---| | Qwen3-4B-Instruct-2507 + LoRA | in the app (ZeroGPU) | **4.0B** | | **Total loaded by the Space** | | **4.0B** | The teacher model used to *create the training data* runs offline on Modal and is **not loaded by the app**, so it does not count toward the budget. **4.0B ≤ 4B** keeps the **Tiny Titan** badge. ## Pipeline (reproducible) 1. **Seeds** — structural skeletons from open Brazilian cases (ENAP Casoteca via OAI-PMH) + hand-authored ones. Real case *text* is training input only, never exposed or published. 2. **Synthetic generation** — the teacher writes ~610 original case+note pairs from the seeds (99.7% schema-valid) on Modal. 3. **Fine-tune** — LoRA on the ≤4B student (Modal, H100). The student learns to map a short request → the full JSON contract. 4. **Eval** — short-prompt schema validity: **base 0% → fine-tuned 100%** (20 held-out seeds), 100% with sourced data. The output contract lives in [`data/schema.py`](data/schema.py) and is enforced both when generating the corpus and when validating in the app. ## Awards in play **Backyard AI** (a real instructor authoring cases he'll teach) · **Well-Tuned** (published fine-tune) · **Tiny Titan** (≤4B) · **Best Agent** (multi-stage teacher→audit→student pipeline) · **Modal** (corpus generation, fine-tune and LoRA merge run on Modal credits) · **Off-Grid** (app serves its own weights in-Space on ZeroGPU — no third-party model API). ## Run locally ```bash pip install -r requirements.txt python app.py # http://localhost:7860 # No GPU? The UI runs on a real sample case from the corpus: CASE_FORGE_DEMO=1 python app.py ``` Config (env): `CASE_FORGE_ADAPTER` (HF repo id of the LoRA), `CASE_FORGE_BASE` (base model), `CASE_FORGE_DEMO=1` (force the sample), `CASE_FORGE_MAX_TOKENS`.