case-forge / README.md
nextmarte's picture
App back to ZeroGPU (merged in-Space); Modal offline-only
59a4de2 verified
|
Raw
History Blame Contribute Delete
4.69 kB
---
title: Case Forge
emoji: 📓
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.17.3
app_file: app.py
pinned: true
license: apache-2.0
short_description: Harvard-style teaching cases + notes, forged from a topic.
tags:
- backyard-ai
- tiny-titan
- well-tuned
- off-grid
- best-agent
---
# 📓 Case Forge — teaching cases on demand
> Built for the **Build Small Hackathon** · **Backyard AI** track · for me — an MBA instructor — and fellow professors who **author** their own teaching cases.
It's an **authoring tool**: generate a draft, then edit the Markdown live, regenerate,
and export. The real-user test is an instructor forging a case they'll actually teach.
**Demo video:** _‹add link›_ · **Social post:** _‹add link›_ · **Model:** _‹add HF model link›_
Case Forge turns a one-line request — **domain + topic + level + language** — into a
complete **Harvard-style teaching case** *and* its **teaching note**, structured and
ready to take into a classroom. The case opens on a protagonist facing a real
dilemma, lays out sourced data and the paths to weigh, and **stops at the decision
point** (the outcome lives in the note's epilogue, where the instructor wants it).
The hero isn't generic prose — it's **structure that holds**: the dilemma never
leaks its answer, learning objectives stay ≤4 and measurable, and discussion
questions line up with the objectives. The app shows these as live quality badges.
## How it works
A short request expands into the full output contract because **the format is in the
weights, not the prompt**. We fine-tuned a small student model on a synthetic corpus
of cases generated by a large teacher — so the ≤4B model reliably emits the whole
schema from a minimal ask.
- **Student (in the app):** [`Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) + a published **Case Forge LoRA** adapter.
- **Teacher (offline only):** a large dense model on Modal that generated the
synthetic training corpus. It never ships in the app and is not counted below.
- **Runtime:** the student runs **in-Space on ZeroGPU** (free) — base Qwen3-4B + the
published LoRA, merged at the first GPU call, generating via transformers. **Modal**
is used only for the **offline** pipeline (corpus generation, fine-tune, LoRA merge),
which is pay-per-use; keeping a GPU warm for serving was not cost-viable.
- **Quality:** an Opus-4.8 content audit found the first model fabricated sources and
made numeric errors; the corpus was regenerated with a numeric-auditor pass and the
model retrained (**v3**) — fabricated sources and severe math errors went to **0/6**
on a number-heavy re-audit. Figures are illustrative; the UI says "verify before class".
### Parameter budget (hackathon: ≤32B total)
| Model | Where | Params |
|---|---|---|
| Qwen3-4B-Instruct-2507 + LoRA | in the app (ZeroGPU) | **4.0B** |
| **Total loaded by the Space** | | **4.0B** |
The teacher model used to *create the training data* runs offline on Modal and is
**not loaded by the app**, so it does not count toward the budget. **4.0B ≤ 4B**
keeps the **Tiny Titan** badge.
## Pipeline (reproducible)
1. **Seeds** — structural skeletons from open Brazilian cases (ENAP Casoteca via
OAI-PMH) + hand-authored ones. Real case *text* is training input only, never
exposed or published.
2. **Synthetic generation** — the teacher writes ~610 original case+note pairs from
the seeds (99.7% schema-valid) on Modal.
3. **Fine-tune** — LoRA on the ≤4B student (Modal, H100). The student learns to map
a short request → the full JSON contract.
4. **Eval** — short-prompt schema validity: **base 0% → fine-tuned 100%** (20 held-out
seeds), 100% with sourced data.
The output contract lives in [`data/schema.py`](data/schema.py) and is enforced both
when generating the corpus and when validating in the app.
## Awards in play
**Backyard AI** (a real instructor authoring cases he'll teach) · **Well-Tuned**
(published fine-tune) · **Tiny Titan** (≤4B) · **Best Agent** (multi-stage
teacher→audit→student pipeline) · **Modal** (corpus generation, fine-tune and LoRA
merge run on Modal credits) · **Off-Grid** (app serves its own weights in-Space on
ZeroGPU — no third-party model API).
## Run locally
```bash
pip install -r requirements.txt
python app.py # http://localhost:7860
# No GPU? The UI runs on a real sample case from the corpus:
CASE_FORGE_DEMO=1 python app.py
```
Config (env): `CASE_FORGE_ADAPTER` (HF repo id of the LoRA), `CASE_FORGE_BASE`
(base model), `CASE_FORGE_DEMO=1` (force the sample), `CASE_FORGE_MAX_TOKENS`.