
	# Research overview

	How `research/` relates to the main hackathon repo and what each component does.

	## Position in the repo

	```text
	small-model-hackathon/
	├── apps/gradio-space/ ← shipped Lesson Agent UI
	├── libs/agent/ ← skill loop, tools, traces
	├── libs/inference/ ← transformers + llama.cpp backends
	├── models.yaml ← model presets (shared with finetune)
	└── research/ ← experiments (this tree)
	    ├── finetune.py
	    ├── data/
	    └── evals/ ← uv workspace package
	```

	Research code is a uv workspace sibling of `apps/` and `libs/`. The root `pyproject.toml` declares optional dependency groups (`finetune`, `evals`, `lm-eval`), so the Docker Space image skips the torch-heavy extras and you install them only when you opt in locally.

	## Two tracks

	### Fine-tuning

	`research/finetune.py` adapts a small HF causal LM on instruction or chat data. It reuses root `models.yaml` presets and the shared inference config loader, so the same `minicpm5-1b` preset used in the Gradio app can be fine-tuned without duplicating model metadata.

	Outputs land in `models/finetuned/`. For the Well-Tuned hackathon badge, register a new preset in `models.yaml` that points at the merged weights.
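
A minimal sketch of what registering a fine-tuned preset could look like, assuming presets are simple mappings keyed by name. The field names (`repo_id`, `backend`, `path`, `base_preset`), the `minicpm5-1b-lesson` preset name, and the `lesson-merged` run directory are hypothetical, since the actual `models.yaml` schema is not shown here. Only `minicpm5-1b` and `models/finetuned/` come from the text above.

```python
from copy import deepcopy
from pathlib import PurePosixPath

# Hypothetical in-memory view of models.yaml. The real schema is not shown
# in this overview, so every field name below is an assumption.
PRESETS = {
    "minicpm5-1b": {
        "repo_id": "example-org/minicpm5-1b",  # placeholder, not a real repo id
        "backend": "transformers",
    },
}


def register_finetuned(presets: dict, base: str, new_name: str, run_dir: str) -> dict:
    """Return a copy of `presets` with a new entry that reuses the base
    preset's metadata but points at merged weights on local disk."""
    if new_name in presets:
        raise ValueError(f"preset {new_name!r} already exists")
    updated = deepcopy(presets)
    entry = deepcopy(presets[base])
    entry["path"] = str(PurePosixPath("models/finetuned") / run_dir)
    entry["base_preset"] = base
    updated[new_name] = entry
    return updated


presets = register_finetuned(PRESETS, "minicpm5-1b", "minicpm5-1b-lesson", "lesson-merged")
```

Copying the base entry rather than retyping it keeps the model metadata in one place, which is the same reason the fine-tuning script reuses the shared presets.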

	### Agentic and academic evals

	`research/evals/` (`slm-evals` package) scores whole models on:

	- Agentic benchmarks — BFCL, τ-bench, GAIA, SWE-bench (`slm-benchmark`)
	- Academic benchmarks — GSM8K, ARC, HellaSwag, etc. via lm-evaluation-harness (`slm-lm-eval`)

	## Data flow

	```mermaid
	flowchart LR
	subgraph data [research/data]
	lesson[education-lesson-chat.jsonl]
	qa[benchmark-qa.jsonl]
	kb[benchmark-kb.jsonl]
	end

	subgraph train [finetune.py]
	ckpt[models/finetuned/]
	end

	subgraph evals [slm-evals]
	bfcl[BFCL]
	tau[tau-bench]
	gaia[GAIA]
	swe[SWE-bench]
	lmeval[lm-eval tasks]
	end

	lesson --> train
	train --> ckpt
	ckpt --> evals
	```

	## When to use which tool

	| Goal | Tool |
	| ---- | ---- |
	| Improve lesson slide quality on your data | `finetune.py` + optional eval before/after |
	| Compare base vs LoRA on public agent tasks | `slm-benchmark` |
	| Compare base vs LoRA on academic tasks | `slm-lm-eval` |
	| Ship in Gradio Space | `apps/gradio-space` only; wire new weights via `models.yaml` |

	## Workspace package

	`research/evals` is listed under `members` in the root `[tool.uv.workspace]` table. The package is `slm-evals`, its import name is `slm_evals`, and it provides two CLIs: `slm-benchmark` and `slm-lm-eval`.

	Run with `uv run --package slm-evals ...` from the repo root so uv resolves workspace paths and shared lockfile versions.
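
For scripted runs, the invocation above can be wrapped in a small helper. This is only a sketch: `build_command` and `run_from_root` are hypothetical helper names, extra CLI arguments are passed through untouched because the flags of `slm-benchmark` and `slm-lm-eval` are not documented here, and `repo_root` is assumed to be the directory holding the root `pyproject.toml`.

```python
import subprocess
from pathlib import Path

PACKAGE = "slm-evals"
KNOWN_CLIS = ("slm-benchmark", "slm-lm-eval")


def build_command(cli: str, *args: str) -> list[str]:
    """Build the `uv run --package slm-evals <cli> ...` argument list."""
    if cli not in KNOWN_CLIS:
        raise ValueError(f"unknown CLI {cli!r}; expected one of {KNOWN_CLIS}")
    return ["uv", "run", "--package", PACKAGE, cli, *args]


def run_from_root(repo_root: Path, cli: str, *args: str) -> int:
    """Run the CLI with the repo root as working directory so uv can
    resolve the workspace; returns the process exit code."""
    completed = subprocess.run(build_command(cli, *args), cwd=repo_root, check=False)
    return completed.returncode


# Building the command has no side effects; nothing is executed here.
command = build_command("slm-lm-eval")
```

Setting `cwd` to the repo root keeps the behavior the same no matter where the calling script lives.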