Spaces:

MSGEncrypted
/

lesson-agent-dev

Sleeping

App Files Files Community

lesson-agent-dev / research /README.md

MSG

Feat/last sprint (#12)

871f869 16 days ago

preview code

Raw

History Blame Contribute Delete

1.65 kB

	# Research

	Experimental code for fine-tuning and agentic benchmarks. Nothing here is wired into the Gradio Lesson Agent by default — use it to train models and score checkpoints against public benchmarks.

	\| Path \| Purpose \|
	\| ---- \| ------- \|
	\| [`finetune.py`](finetune.py) \| LoRA / QLoRA / full fine-tune on chat or instruction data \|
	\| [`evals/`](evals/) \| SLM agentic benchmark suite — BFCL, τ-bench, GAIA, SWE-bench (uv package `slm-evals`) \|
	\| [`data/`](data/) \| Shared JSONL datasets for finetune and evals \|

	## Quick links

	- [USAGE.md](USAGE.md) — install groups, commands, and typical workflows
	- [docs/overview.md](docs/overview.md) — how the pieces fit together
	- [evals/USAGE.md](evals/USAGE.md) — benchmark CLI, configs, and results
	- [evals/docs/benchmarks.md](evals/docs/benchmarks.md) — what each benchmark measures

	## Install (from repo root)

	```bash
	# All research tooling
	uv sync --group finetune --group evals --group lm-eval
	```

	Individual groups:

	\| Group \| Command \| Enables \|
	\| ----- \| ------- \| ------- \|
	\| `finetune` \| `uv sync --group finetune` \| `research/finetune.py` (LoRA, QLoRA, merge) \|
	\| `evals` \| `uv sync --group evals` \| `research/evals/` package (`slm-benchmark`) \|
	\| `lm-eval` \| `uv sync --group lm-eval` \| `slm-lm-eval` CLI (GSM8K, ARC, HellaSwag, …) \|

	## Typical workflow

	```text
	research/data/education-lesson-chat.jsonl
	│
	▼
	research/finetune.py ──► models/finetuned/<preset>-lora/
	│
	└──► research/evals/ (BFCL, τ-bench, GAIA, SWE-bench, lm-eval)
	```

	See [USAGE.md](USAGE.md) for copy-paste commands.