# Research
Experimental code for **fine-tuning** and **agentic benchmarks**. Nothing here is wired into the Gradio Lesson Agent by default; use it to train models and score checkpoints against public benchmarks.
| Path | Purpose |
| ---- | ------- |
| [`finetune.py`](finetune.py) | LoRA / QLoRA / full fine-tune on chat or instruction data |
| [`evals/`](evals/) | SLM agentic benchmark suite: BFCL, τ-bench, GAIA, SWE-bench (uv package `slm-evals`) |
| [`data/`](data/) | Shared JSONL datasets for finetune and evals |
## Quick links
- **[USAGE.md](USAGE.md)**: install groups, commands, and typical workflows
- **[docs/overview.md](docs/overview.md)**: how the pieces fit together
- **[evals/USAGE.md](evals/USAGE.md)**: benchmark CLI, configs, and results
- **[evals/docs/benchmarks.md](evals/docs/benchmarks.md)**: what each benchmark measures
## Install (from repo root)
```bash
# All research tooling
uv sync --group finetune --group evals --group lm-eval
```
Individual groups:
| Group | Command | Enables |
| ----- | ------- | ------- |
| `finetune` | `uv sync --group finetune` | `research/finetune.py` (LoRA, QLoRA, merge) |
| `evals` | `uv sync --group evals` | `research/evals/` package (`slm-benchmark`) |
| `lm-eval` | `uv sync --group lm-eval` | `slm-lm-eval` CLI (GSM8K, ARC, HellaSwag, …) |
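After syncing, a quick check that the expected commands are reachable can save a confusing failure later. The sketch below is illustrative only: the command names come from the table above, but treating both `slm-benchmark` and `slm-lm-eval` as console scripts on `PATH` is an assumption, and `missing_commands` is a hypothetical helper, not part of this repo.

```python
import shutil

# Command names copied from the install table; that both are installed as
# console scripts on PATH is an assumption, not something this README states.
EXPECTED_COMMANDS = {
    "evals": "slm-benchmark",
    "lm-eval": "slm-lm-eval",
}


def missing_commands(expected=EXPECTED_COMMANDS, which=shutil.which):
    """Return {group: command} for every command that is not found on PATH."""
    return {
        group: command
        for group, command in expected.items()
        if which(command) is None
    }
```

Run it inside the project environment and inspect the returned mapping: an empty dict means every listed command was found, and any remaining key names the group that may still need `uv sync --group <name>`.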
## Typical workflow
```text
research/data/education-lesson-chat.jsonl
        │
        ▼
research/finetune.py ──► models/finetuned/<preset>-lora/
        │
        └──► research/evals/  (BFCL, τ-bench, GAIA, SWE-bench, lm-eval)
```
See [USAGE.md](USAGE.md) for copy-paste commands.
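Because the workflow starts from a JSONL file, it can be worth confirming that the file is well-formed before launching a training run. The following is a minimal sketch, not part of the repo: `check_jsonl` is a hypothetical helper, and it only assumes that each non-blank line should be a JSON object. It does not check the chat or instruction schema that `finetune.py` expects, since that schema is not described here.

```python
import json
from pathlib import Path


def check_jsonl(path):
    """Return (record_count, errors) for a JSONL file.

    Every non-blank line is expected to parse as a JSON object (an assumption
    about the dataset layout). Errors are (line_number, message) tuples.
    """
    count = 0
    errors = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines between records
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                errors.append((line_number, "expected a JSON object"))
                continue
            count += 1
    return count, errors
```

For example, `check_jsonl("research/data/education-lesson-chat.jsonl")` returns the number of usable records and a list of problem lines, so a malformed row surfaces before fine-tuning rather than partway through it.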