# Research

Experimental code for **fine-tuning** and **agentic benchmarks**. Nothing here is wired into the Gradio Lesson Agent by default; use it to train models and score checkpoints against public benchmarks.

| Path | Purpose |
| ---- | ------- |
| [`finetune.py`](finetune.py) | LoRA / QLoRA / full fine-tune on chat or instruction data |
| [`evals/`](evals/) | SLM agentic benchmark suite: BFCL, τ-bench, GAIA, SWE-bench (uv package `slm-evals`) |
| [`data/`](data/) | Shared JSONL datasets for finetune and evals |

## Quick links

- **[USAGE.md](USAGE.md)**: install groups, commands, and typical workflows
- **[docs/overview.md](docs/overview.md)**: how the pieces fit together
- **[evals/USAGE.md](evals/USAGE.md)**: benchmark CLI, configs, and results
- **[evals/docs/benchmarks.md](evals/docs/benchmarks.md)**: what each benchmark measures

## Install (from repo root)

```bash
# All research tooling
uv sync --group finetune --group evals --group lm-eval
```

Individual groups:

| Group | Command | Enables |
| ----- | ------- | ------- |
| `finetune` | `uv sync --group finetune` | `research/finetune.py` (LoRA, QLoRA, merge) |
| `evals` | `uv sync --group evals` | `research/evals/` package (`slm-benchmark`) |
| `lm-eval` | `uv sync --group lm-eval` | `slm-lm-eval` CLI (GSM8K, ARC, HellaSwag, …) |

## Typical workflow

```text
research/data/education-lesson-chat.jsonl
        │
        ▼
  research/finetune.py  ──►  models/finetuned/<preset>-lora/
        │
        └──► research/evals/  (BFCL, τ-bench, GAIA, SWE-bench, lm-eval)
```
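
Before starting a run, it can help to confirm that the dataset at the top of this flow parses cleanly. The following is a minimal sketch, assuming only that each non-empty line of the file is a standalone JSON value. The helper name `load_jsonl` is hypothetical and not part of this repo, and no record schema is assumed.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Parse a JSONL file into a list of records.

    Hypothetical helper for illustration; it is not part of this repo.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so the dataset can be fixed by hand.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc
    return records
```

From the repo root, `len(load_jsonl("research/data/education-lesson-chat.jsonl"))` would give the record count, and a malformed line raises a `ValueError` naming its line number.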

See [USAGE.md](USAGE.md) for copy-paste commands.