Research
Experimental code for fine-tuning and agentic benchmarks. Nothing here is wired into the Gradio Lesson Agent by default; use it to train models and score checkpoints against public benchmarks.
| Path | Purpose |
|---|---|
| `finetune.py` | LoRA / QLoRA / full fine-tune on chat or instruction data |
| `evals/` | SLM agentic benchmark suite: BFCL, τ-bench, GAIA, SWE-bench (uv package `slm-evals`) |
| `data/` | Shared JSONL datasets for finetune and evals |
Quick links
- USAGE.md: install groups, commands, and typical workflows
- docs/overview.md: how the pieces fit together
- evals/USAGE.md: benchmark CLI, configs, and results
- evals/docs/benchmarks.md: what each benchmark measures
Install (from repo root)
# All research tooling
uv sync --group finetune --group evals --group lm-eval
Individual groups:
| Group | Command | Enables |
|---|---|---|
| `finetune` | `uv sync --group finetune` | `research/finetune.py` (LoRA, QLoRA, merge) |
| `evals` | `uv sync --group evals` | `research/evals/` package (`slm-benchmark`) |
| `lm-eval` | `uv sync --group lm-eval` | `slm-lm-eval` CLI (GSM8K, ARC, HellaSwag, ...) |
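The group flags compose: the all-in-one command is simply each group's `--group` flag repeated. As a minimal sketch, a setup script could build that command from a list of group names. The helper name `uv_sync_command` is hypothetical and not part of this repo; only the three group names and the `uv sync --group` form come from the table above.

```python
import shlex

# Group names copied from the table above; nothing else about the project is assumed.
KNOWN_GROUPS = ("finetune", "evals", "lm-eval")


def uv_sync_command(groups):
    """Return the `uv sync` argv list that installs the given dependency groups.

    Hypothetical helper for illustration only; it is not shipped with the repo.
    """
    unknown = [g for g in groups if g not in KNOWN_GROUPS]
    if unknown:
        raise ValueError(f"unknown group(s): {', '.join(unknown)}")
    argv = ["uv", "sync"]
    for group in groups:
        # One `--group NAME` pair per requested group.
        argv += ["--group", group]
    return argv


if __name__ == "__main__":
    # Print the shell form of the command for two of the groups.
    print(shlex.join(uv_sync_command(["finetune", "evals"])))
```

Returning an argv list rather than a string means the result can be passed directly to `subprocess.run` without shell quoting concerns.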
Typical workflow
research/data/education-lesson-chat.jsonl
        │
        ▼
research/finetune.py ──► models/finetuned/<preset>-lora/
        │
        └──► research/evals/ (BFCL, τ-bench, GAIA, SWE-bench, lm-eval)
See USAGE.md for copy-paste commands.
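Since the workflow starts from a JSONL file, it may be worth confirming that the file parses before starting a long fine-tune. The sketch below is a generic check and not part of this repo: the function name `check_jsonl` is hypothetical, and the record schema is not documented here, so required keys are an optional argument rather than something the code assumes.

```python
import json
from pathlib import Path


def check_jsonl(path, required_keys=()):
    """Parse a JSONL file and return (valid_record_count, problems).

    `problems` is a list of (line_number, message) tuples. Hypothetical helper;
    the dataset schema is an assumption supplied by the caller via `required_keys`.
    """
    count = 0
    problems = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines between records
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_no, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_no, "record is not a JSON object"))
                continue
            missing = [key for key in required_keys if key not in record]
            if missing:
                problems.append((line_no, f"missing keys: {missing}"))
                continue
            count += 1
    return count, problems
```

A call might look like `check_jsonl("research/data/education-lesson-chat.jsonl", required_keys=("messages",))`, where the `messages` key is only a guess at a chat-style schema; check the actual file before relying on it.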