| # Research overview | |
| How `research/` relates to the main hackathon repo and what each component does. | |
| ## Position in the repo | |
| ```text | |
| small-model-hackathon/ | |
| ├── apps/gradio-space/    ← shipped Lesson Agent UI | |
| ├── libs/agent/           ← skill loop, tools, traces | |
| ├── libs/inference/       ← transformers + llama.cpp backends | |
| ├── models.yaml           ← model presets (shared with finetune) | |
| └── research/             ← experiments (this tree) | |
|     ├── finetune.py | |
|     ├── data/ | |
|     └── evals/            ← uv workspace package | |
| ``` | |
| Research code is a **uv workspace sibling** of `apps/*` and `libs/*`. Root `pyproject.toml` declares optional dependency groups (`finetune`, `evals`, `lm-eval`) so the Docker Space image does not need to install torch-heavy extras unless you opt in locally. | |
| ## Two tracks | |
| ### Fine-tuning | |
| `research/finetune.py` adapts a small HF causal LM on instruction or chat data. It reuses root `models.yaml` presets and the shared inference config loader, so the same `minicpm5-1b` preset used in the Gradio app can be fine-tuned without duplicating model metadata. | |
| Outputs land in `models/finetuned/`; you can register a new preset in `models.yaml` pointing at the merged weights for the **Well-Tuned** hackathon badge. | |
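Registering a fine-tuned preset could look roughly like the sketch below, which copies an existing preset and overrides the one field that locates the weights. The schema of `models.yaml` is not shown in this document, so the sketch assumes presets are a mapping from preset name to a mapping of fields, and takes the field name as a parameter. `derive_preset`, the `model_id` key, and the new preset name in the usage note are all hypothetical.

```python
import copy


def derive_preset(
    presets: dict,
    base_name: str,
    new_name: str,
    weights_key: str,
    weights_path: str,
) -> dict:
    """Return a copy of `presets` with `new_name` added.

    The new entry copies every field of `base_name` and overrides only
    `weights_key`, so model metadata is not duplicated by hand.
    """
    if base_name not in presets:
        raise KeyError(f"unknown base preset: {base_name}")
    if new_name in presets:
        raise ValueError(f"preset already exists: {new_name}")
    updated = copy.deepcopy(presets)  # leave the caller's mapping untouched
    entry = copy.deepcopy(presets[base_name])
    entry[weights_key] = weights_path
    updated[new_name] = entry
    return updated
```

For example, `derive_preset(presets, "minicpm5-1b", "minicpm5-1b-lesson", "model_id", "models/finetuned/merged")` would add a second preset next to the base one. The resulting mapping still has to be written back to `models.yaml` with whatever YAML tooling the repo already uses.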
| ### Agentic and academic evals | |
| `research/evals/` (`slm-evals` package) scores **whole models** on: | |
| - **Agentic benchmarks**: BFCL, τ-bench, GAIA, SWE-bench (`slm-benchmark`) | |
| - **Academic benchmarks**: GSM8K, ARC, HellaSwag, etc. via lm-evaluation-harness (`slm-lm-eval`) | |
| ## Data flow | |
| ```mermaid | |
| flowchart LR | |
| subgraph data [research/data] | |
| lesson[education-lesson-chat.jsonl] | |
| qa[benchmark-qa.jsonl] | |
| kb[benchmark-kb.jsonl] | |
| end | |
| subgraph train [finetune.py] | |
| ckpt[models/finetuned/] | |
| end | |
| subgraph evals [slm-evals] | |
| bfcl[BFCL] | |
| tau[tau-bench] | |
| gaia[GAIA] | |
| swe[SWE-bench] | |
| lmeval[lm-eval tasks] | |
| end | |
| lesson --> train | |
| train --> ckpt | |
| ckpt --> evals | |
| ``` | |
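Before feeding a file such as `education-lesson-chat.jsonl` into the training step, it can help to check that every line parses. The sketch below is one way to do that; the record layout is an assumption (one JSON object per line with a non-empty `messages` list), since the actual schema is not described here, and `iter_chat_records` is a hypothetical helper rather than part of `finetune.py`.

```python
import json
from typing import Iterable, Iterator


def iter_chat_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-blank JSONL line.

    Assumed layout: each record is a JSON object with a non-empty
    "messages" list. Raises ValueError naming the offending line otherwise.
    """
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        messages = record.get("messages")
        if not isinstance(messages, list) or not messages:
            raise ValueError(f"line {line_number}: expected a non-empty 'messages' list")
        yield record
```

Because the function accepts any iterable of strings, it works directly on an open file handle: `with open(path, encoding="utf-8") as f: records = list(iter_chat_records(f))`.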
| ## When to use which tool | |
| | Goal | Tool | | |
| | ---- | ---- | | |
| | Improve lesson slide quality on your data | `finetune.py` + optional eval before/after | | |
| | Compare base vs LoRA on public agent tasks | `slm-benchmark` | | |
| | Compare base vs LoRA on academic tasks | `slm-lm-eval` | | |
| | Ship in Gradio Space | `apps/gradio-space` only; wire new weights via `models.yaml` | |
| ## Workspace package | |
| `research/evals` is listed under `members` in the root `[tool.uv.workspace]` table. It provides the import name `slm_evals` and two CLIs, `slm-benchmark` and `slm-lm-eval`. | |
| Run with `uv run --package slm-evals ...` from the repo root so uv resolves workspace paths and shared lockfile versions. | |
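When scripting several runs (for example base vs LoRA), the invocation above can be assembled programmatically. The sketch below only builds the argument list; `uv_run_command` is a hypothetical helper, and any arguments after the CLI name are passed through untouched because the CLIs' own flags are not documented here.

```python
import shlex

# CLI names taken from this document; the helper itself is illustrative.
EVAL_CLIS = ("slm-benchmark", "slm-lm-eval")


def uv_run_command(cli: str, *cli_args: str) -> list[str]:
    """Build the `uv run --package slm-evals <cli> ...` argument list."""
    if cli not in EVAL_CLIS:
        raise ValueError(f"unknown eval CLI: {cli}")
    return ["uv", "run", "--package", "slm-evals", cli, *cli_args]


def as_shell_string(command: list[str]) -> str:
    """Render the argument list as a copy-pasteable shell line."""
    return shlex.join(command)
```

To execute a command, pass the list to `subprocess.run(command, check=True, cwd=repo_root)`, where `repo_root` is the checkout directory, so uv resolves the workspace from the repo root as described above.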