Spaces:

MSGEncrypted
/

lesson-agent-dev

Sleeping

App Files Files Community

lesson-agent-dev / research /docs /overview.md

MSG

Feat/last sprint (#12)

871f869 13 days ago

preview code

Raw

History Blame Contribute Delete

2.66 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

Research overview

How research/ relates to the main hackathon repo and what each component does.

Position in the repo

small-model-hackathon/
├── apps/gradio-space/     ← shipped Lesson Agent UI
├── libs/agent/            ← skill loop, tools, traces
├── libs/inference/        ← transformers + llama.cpp backends
├── models.yaml            ← model presets (shared with finetune)
└── research/              ← experiments (this tree)
    ├── finetune.py
    ├── data/
    └── evals/             ← uv workspace package

Research code is a uv workspace sibling of apps/* and libs/*. Root pyproject.toml declares optional dependency groups (finetune, evals, lm-eval) so the Docker Space image does not need to install torch-heavy extras unless you opt in locally.

Two tracks

Fine-tuning

research/finetune.py adapts a small HF causal LM on instruction or chat data. It reuses root models.yaml presets and the shared inference config loader, so the same minicpm5-1b preset used in the Gradio app can be fine-tuned without duplicating model metadata.

Outputs land in models/finetuned/ — you can register a new preset in models.yaml pointing at merged weights for the Well-Tuned hackathon badge.

Agentic and academic evals

research/evals/ (slm-evals package) scores whole models on:

Agentic benchmarks — BFCL, τ-bench, GAIA, SWE-bench (slm-benchmark)
Academic benchmarks — GSM8K, ARC, HellaSwag, etc. via lm-evaluation-harness (slm-lm-eval)

Data flow

flowchart LR
  subgraph data [research/data]
    lesson[education-lesson-chat.jsonl]
    qa[benchmark-qa.jsonl]
    kb[benchmark-kb.jsonl]
  end

  subgraph train [finetune.py]
    ckpt[models/finetuned/]
  end

  subgraph evals [slm-evals]
    bfcl[BFCL]
    tau[tau-bench]
    gaia[GAIA]
    swe[SWE-bench]
    lmeval[lm-eval tasks]
  end

  lesson --> train
  train --> ckpt
  ckpt --> evals

When to use which tool

Goal	Tool
Improve lesson slide quality on your data	`finetune.py` + optional eval before/after
Compare base vs LoRA on public agent tasks	`slm-benchmark`
Compare base vs LoRA on academic tasks	`slm-lm-eval`
Ship in Gradio Space	`apps/gradio-space` only — wire new weights via `models.yaml`

Workspace package

research/evals is listed in root [tool.uv.workspace] members as import name slm_evals, CLI slm-benchmark and slm-lm-eval.

Run with uv run --package slm-evals ... from the repo root so uv resolves workspace paths and shared lockfile versions.