Spaces:

MSGEncrypted
/

lesson-agent-dev

Sleeping

App Files Files Community

lesson-agent-dev / research /README.md

MSG

Feat/last sprint (#12)

871f869 15 days ago

preview code

Raw

History Blame Contribute Delete

1.65 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

Research

Experimental code for fine-tuning and agentic benchmarks. Nothing here is wired into the Gradio Lesson Agent by default — use it to train models and score checkpoints against public benchmarks.

Path	Purpose
`finetune.py`	LoRA / QLoRA / full fine-tune on chat or instruction data
`evals/`	SLM agentic benchmark suite — BFCL, τ-bench, GAIA, SWE-bench (uv package `slm-evals`)
`data/`	Shared JSONL datasets for finetune and evals

Quick links

USAGE.md — install groups, commands, and typical workflows
docs/overview.md — how the pieces fit together
evals/USAGE.md — benchmark CLI, configs, and results
evals/docs/benchmarks.md — what each benchmark measures

Install (from repo root)

# All research tooling
uv sync --group finetune --group evals --group lm-eval

Individual groups:

Group	Command	Enables
`finetune`	`uv sync --group finetune`	`research/finetune.py` (LoRA, QLoRA, merge)
`evals`	`uv sync --group evals`	`research/evals/` package (`slm-benchmark`)
`lm-eval`	`uv sync --group lm-eval`	`slm-lm-eval` CLI (GSM8K, ARC, HellaSwag, …)

Typical workflow

research/data/education-lesson-chat.jsonl
        │
        ▼
  research/finetune.py  ──►  models/finetuned/<preset>-lora/
        │
        └──► research/evals/  (BFCL, τ-bench, GAIA, SWE-bench, lm-eval)

See USAGE.md for copy-paste commands.