Research overview

How research/ relates to the main hackathon repo and what each component does.

Position in the repo

small-model-hackathon/
├── apps/gradio-space/     ← shipped Lesson Agent UI
├── libs/agent/            ← skill loop, tools, traces
├── libs/inference/        ← transformers + llama.cpp backends
├── models.yaml            ← model presets (shared with finetune)
└── research/              ← experiments (this tree)
    ├── finetune.py
    ├── data/
    └── evals/             ← uv workspace package

Research code is a uv workspace sibling of apps/* and libs/*. Root pyproject.toml declares optional dependency groups (finetune, evals, lm-eval) so the Docker Space image does not need to install torch-heavy extras unless you opt in locally.
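Because the heavy extras are optional, research scripts may find them missing at import time. A minimal sketch of a guard that fails early with a readable message is below. The helper names and the group-to-module mapping are assumptions for illustration; the real group contents are whatever the root pyproject.toml declares.

```python
from importlib.util import find_spec

# Hypothetical mapping from dependency group to top-level modules it provides.
# The authoritative list lives in the root pyproject.toml, not here.
GROUP_MODULES = {
    "finetune": ["torch"],
    "lm-eval": ["lm_eval"],
}


def missing_modules(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


def require_group(group):
    """Raise a readable error if any module of an optional group is absent."""
    missing = missing_modules(GROUP_MODULES[group])
    if missing:
        raise RuntimeError(
            f"dependency group '{group}' is not installed "
            f"(missing: {', '.join(missing)})"
        )
```

Calling require_group("finetune") at the top of a script would surface a missing install before any model loading starts, while the Space image never needs to run the check.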

Two tracks

Fine-tuning

research/finetune.py adapts a small HF causal LM on instruction or chat data. It reuses root models.yaml presets and the shared inference config loader, so the same minicpm5-1b preset used in the Gradio app can be fine-tuned without duplicating model metadata.

Outputs land in models/finetuned/. You can register a new preset in models.yaml pointing at the merged weights for the Well-Tuned hackathon badge.
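Registering a fine-tuned preset amounts to copying the base preset and pointing it at the new weights. The sketch below works on an already-parsed presets mapping. The schema of models.yaml is not shown in this overview, so the keys model_path and backend, the new preset name, and the helper itself are all hypothetical.

```python
from copy import deepcopy


def register_finetuned_preset(presets, base_name, new_name, weights_dir):
    """Return a new presets mapping with new_name derived from base_name.

    The input mapping is left untouched so the caller decides when to write
    the result back to models.yaml.
    """
    if base_name not in presets:
        raise KeyError(f"unknown base preset: {base_name}")
    if new_name in presets:
        raise ValueError(f"preset already exists: {new_name}")
    updated = deepcopy(presets)
    entry = deepcopy(presets[base_name])
    entry["model_path"] = weights_dir  # hypothetical key; the real schema may differ
    updated[new_name] = entry
    return updated


# Stand-in for the parsed contents of models.yaml (illustrative fields only).
presets = {
    "minicpm5-1b": {"model_path": "<base model id>", "backend": "transformers"},
}
updated = register_finetuned_preset(
    presets, "minicpm5-1b", "minicpm5-1b-lesson", "models/finetuned/"
)
```

Deriving the entry from the base preset keeps any other settings in sync, which matches the goal of not duplicating model metadata.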

Agentic and academic evals

research/evals/ (slm-evals package) scores whole models on:

  • Agentic benchmarks: BFCL, τ-bench, GAIA, SWE-bench (slm-benchmark)
  • Academic benchmarks: GSM8K, ARC, HellaSwag, etc. via lm-evaluation-harness (slm-lm-eval)

Data flow

flowchart LR
  subgraph data [research/data]
    lesson[education-lesson-chat.jsonl]
    qa[benchmark-qa.jsonl]
    kb[benchmark-kb.jsonl]
  end

  subgraph train [finetune.py]
    ckpt[models/finetuned/]
  end

  subgraph evals [slm-evals]
    bfcl[BFCL]
    tau[tau-bench]
    gaia[GAIA]
    swe[SWE-bench]
    lmeval[lm-eval tasks]
  end

  lesson --> train
  train --> ckpt
  ckpt --> evals
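
The files under research/data are JSONL, meaning one JSON object per line. A minimal reader for that format might look like the sketch below. The record schema is not described in this overview, so the helper (a hypothetical name) makes no assumption about field names and only reports which line failed to parse.

```python
import json


def parse_jsonl(lines):
    """Yield one parsed JSON value per non-empty line of an iterable of strings."""
    for line_number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc
        yield record
```

An open text file is itself an iterable of lines, so passing the handle of education-lesson-chat.jsonl straight to parse_jsonl streams records without loading the whole file into memory.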

When to use which tool

| Goal | Tool |
| --- | --- |
| Improve lesson slide quality on your data | finetune.py + optional eval before/after |
| Compare base vs LoRA on public agent tasks | slm-benchmark |
| Compare base vs LoRA on academic tasks | slm-lm-eval |
| Ship in Gradio Space | apps/gradio-space only; wire new weights via models.yaml |
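
For the base vs LoRA comparisons, the result that usually matters is the per-task difference. The sketch below assumes each run has already been reduced to a mapping from task name to a single score. That shape is an assumption, not the output format of slm-benchmark or slm-lm-eval, and the numbers are made-up placeholders rather than measurements.

```python
def score_deltas(base, tuned):
    """Return tuned-minus-base for every task present in both result sets."""
    shared = sorted(set(base) & set(tuned))
    return {task: tuned[task] - base[task] for task in shared}


# Placeholder scores for illustration only; not real benchmark results.
base_scores = {"gsm8k": 0.25, "arc": 0.5}
tuned_scores = {"gsm8k": 0.5, "arc": 0.5, "hellaswag": 0.75}
deltas = score_deltas(base_scores, tuned_scores)
```

Tasks that appear in only one run are dropped so a missing baseline never looks like a gain.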

Workspace package

research/evals is listed under [tool.uv.workspace] members in the root pyproject.toml. Its import name is slm_evals, and it provides two CLIs: slm-benchmark and slm-lm-eval.

Run with uv run --package slm-evals ... from the repo root so uv resolves workspace paths and shared lockfile versions.
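
If the evals are driven from a script rather than a shell, the same invocation can be assembled in Python. This is a sketch: the helper names are hypothetical, and the arguments accepted by each CLI are not documented here, so they are passed through untouched.

```python
import subprocess
from pathlib import Path


def uv_run_command(cli, *cli_args, package="slm-evals"):
    """Build the argv for running a workspace CLI through uv."""
    return ["uv", "run", "--package", package, cli, *cli_args]


def run_from_repo_root(repo_root, cli, *cli_args):
    """Run the CLI with the repo root as the working directory.

    check=True raises CalledProcessError if the command exits non-zero.
    """
    return subprocess.run(
        uv_run_command(cli, *cli_args),
        cwd=Path(repo_root),
        check=True,
    )
```

Setting cwd to the repo root mirrors the advice above, so uv resolves the workspace and the shared lockfile no matter where the calling script lives.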