Spaces:
Running on Zero
Running on Zero
| title: Case Forge | |
| emoji: 📓 | |
| colorFrom: indigo | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 6.17.3 | |
| app_file: app.py | |
| pinned: true | |
| license: apache-2.0 | |
| short_description: Harvard-style teaching cases + notes, forged from a topic. | |
| tags: | |
| - backyard-ai | |
| - tiny-titan | |
| - well-tuned | |
| - off-grid | |
| - best-agent | |
| # 📓 Case Forge — teaching cases on demand | |
| > Built for the **Build Small Hackathon** · **Backyard AI** track · for me — an MBA instructor — and fellow professors who **author** their own teaching cases. | |
| It's an **authoring tool**: generate a draft, then edit the Markdown live, regenerate, | |
| and export. The real-user test is an instructor forging a case they'll actually teach. | |
| **Demo video:** _‹add link›_ · **Social post:** _‹add link›_ · **Model:** _‹add HF model link›_ | |
| Case Forge turns a one-line request — **domain + topic + level + language** — into a | |
| complete **Harvard-style teaching case** *and* its **teaching note**, structured and | |
| ready to take into a classroom. The case opens on a protagonist facing a real | |
| dilemma, lays out sourced data and the paths to weigh, and **stops at the decision | |
| point** (the outcome lives in the note's epilogue, where the instructor wants it). | |
| The hero isn't generic prose — it's **structure that holds**: the dilemma never | |
| leaks its answer, learning objectives stay ≤4 and measurable, and discussion | |
| questions line up with the objectives. The app shows these as live quality badges. | |
| ## How it works | |
| A short request expands into the full output contract because **the format is in the | |
| weights, not the prompt**. We fine-tuned a small student model on a synthetic corpus | |
| of cases generated by a large teacher — so the ≤4B model reliably emits the whole | |
| schema from a minimal ask. | |
| - **Student (in the app):** [`Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) + a published **Case Forge LoRA** adapter. | |
| - **Teacher (offline only):** a large dense model on Modal that generated the | |
| synthetic training corpus. It never ships in the app and is not counted below. | |
| - **Runtime:** the student runs **in-Space on ZeroGPU** (free) — base Qwen3-4B + the | |
| published LoRA, merged at the first GPU call, generating via transformers. **Modal** | |
| is used only for the **offline** pipeline (corpus generation, fine-tune, LoRA merge), | |
| which is pay-per-use; keeping a GPU warm for serving was not cost-viable. | |
| - **Quality:** an Opus-4.8 content audit found the first model fabricated sources and | |
| made numeric errors; the corpus was regenerated with a numeric-auditor pass and the | |
| model retrained (**v3**) — fabricated sources and severe math errors went to **0/6** | |
| on a number-heavy re-audit. Figures are illustrative; the UI says "verify before class". | |
| ### Parameter budget (hackathon: ≤32B total) | |
| | Model | Where | Params | | |
| |---|---|---| | |
| | Qwen3-4B-Instruct-2507 + LoRA | in the app (ZeroGPU) | **4.0B** | | |
| | **Total loaded by the Space** | | **4.0B** | | |
| The teacher model used to *create the training data* runs offline on Modal and is | |
| **not loaded by the app**, so it does not count toward the budget. **4.0B ≤ 4B** | |
| keeps the **Tiny Titan** badge. | |
| ## Pipeline (reproducible) | |
| 1. **Seeds** — structural skeletons from open Brazilian cases (ENAP Casoteca via | |
| OAI-PMH) + hand-authored ones. Real case *text* is training input only, never | |
| exposed or published. | |
| 2. **Synthetic generation** — the teacher writes ~610 original case+note pairs from | |
| the seeds (99.7% schema-valid) on Modal. | |
| 3. **Fine-tune** — LoRA on the ≤4B student (Modal, H100). The student learns to map | |
| a short request → the full JSON contract. | |
| 4. **Eval** — short-prompt schema validity: **base 0% → fine-tuned 100%** (20 held-out | |
| seeds), 100% with sourced data. | |
| The output contract lives in [`data/schema.py`](data/schema.py) and is enforced both | |
| when generating the corpus and when validating in the app. | |
| ## Awards in play | |
| **Backyard AI** (a real instructor authoring cases he'll teach) · **Well-Tuned** | |
| (published fine-tune) · **Tiny Titan** (≤4B) · **Best Agent** (multi-stage | |
| teacher→audit→student pipeline) · **Modal** (corpus generation, fine-tune and LoRA | |
| merge run on Modal credits) · **Off-Grid** (app serves its own weights in-Space on | |
| ZeroGPU — no third-party model API). | |
| ## Run locally | |
| ```bash | |
| pip install -r requirements.txt | |
| python app.py # http://localhost:7860 | |
| # No GPU? The UI runs on a real sample case from the corpus: | |
| CASE_FORGE_DEMO=1 python app.py | |
| ``` | |
| Config (env): `CASE_FORGE_ADAPTER` (HF repo id of the LoRA), `CASE_FORGE_BASE` | |
| (base model), `CASE_FORGE_DEMO=1` (force the sample), `CASE_FORGE_MAX_TOKENS`. | |