# Research overview
How `research/` relates to the main hackathon repo and what each component does.
## Position in the repo
```text
small-model-hackathon/
├── apps/gradio-space/   # shipped Lesson Agent UI
├── libs/agent/          # skill loop, tools, traces
├── libs/inference/      # transformers + llama.cpp backends
├── models.yaml          # model presets (shared with finetune)
└── research/            # experiments (this tree)
    ├── finetune.py
    ├── data/
    └── evals/           # uv workspace package
```
Research code is a **uv workspace sibling** of `apps/*` and `libs/*`. Root `pyproject.toml` declares optional dependency groups (`finetune`, `evals`, `lm-eval`) so the Docker Space image does not need to install torch-heavy extras unless you opt in locally.
## Two tracks
### Fine-tuning
`research/finetune.py` adapts a small HF causal LM on instruction or chat data. It reuses root `models.yaml` presets and the shared inference config loader, so the same `minicpm5-1b` preset used in the Gradio app can be fine-tuned without duplicating model metadata.
Outputs land in `models/finetuned/`; you can register a new preset in `models.yaml` that points at the merged weights for the **Well-Tuned** hackathon badge.
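As a sketch of the preset step, assuming `models.yaml` parses into a mapping from preset name to settings, the helper below derives a fine-tuned preset from an existing one. The field names (`hf_id`, `backend`, `base`), the new preset name, and the helper itself are hypothetical; the repo's shared config loader and real schema may differ.

```python
from pathlib import Path

# Hypothetical in-memory view of models.yaml after parsing. Field names are
# assumptions; check the real schema before reusing this.
PRESETS: dict[str, dict[str, str]] = {
    "minicpm5-1b": {"hf_id": "<hub id from models.yaml>", "backend": "transformers"},
}


def register_finetuned_preset(
    presets: dict[str, dict[str, str]],
    base_name: str,
    new_name: str,
    merged_dir: Path,
) -> dict[str, dict[str, str]]:
    """Return a copy of presets with a new entry pointing at merged weights."""
    if base_name not in presets:
        raise KeyError(f"unknown base preset: {base_name}")
    if new_name in presets:
        raise ValueError(f"preset already exists: {new_name}")
    entry = dict(presets[base_name])        # inherit backend and other settings
    entry["hf_id"] = merged_dir.as_posix()  # local weights dir instead of a Hub id
    entry["base"] = base_name               # record which preset it came from
    return {**presets, new_name: entry}


if __name__ == "__main__":
    updated = register_finetuned_preset(
        PRESETS,
        "minicpm5-1b",
        "minicpm5-1b-lesson",  # hypothetical preset name
        Path("models/finetuned/minicpm5-1b-lesson"),
    )
    print(updated["minicpm5-1b-lesson"])
```

Copying the base entry keeps the inference settings aligned with the preset the app already uses, so only the weight location changes. Writing the result back to `models.yaml` (for example with PyYAML's `yaml.safe_dump`) is left out here.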
### Agentic and academic evals
`research/evals/` (`slm-evals` package) scores **whole models** on:
- **Agentic benchmarks**: BFCL, τ-bench, GAIA, SWE-bench (`slm-benchmark`)
- **Academic benchmarks**: GSM8K, ARC, HellaSwag, etc. via lm-evaluation-harness (`slm-lm-eval`)
## Data flow
```mermaid
flowchart LR
    subgraph data ["research/data"]
        lesson[education-lesson-chat.jsonl]
        qa[benchmark-qa.jsonl]
        kb[benchmark-kb.jsonl]
    end
    train[finetune.py]
    ckpt["models/finetuned/"]
    subgraph evals [slm-evals]
        bfcl[BFCL]
        tau[tau-bench]
        gaia[GAIA]
        swe[SWE-bench]
        lmeval[lm-eval tasks]
    end
    lesson --> train
    train --> ckpt
    ckpt --> evals
```
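The files in `research/data` are JSONL (one JSON object per line). A minimal loader for sanity-checking them before training might look like this. It assumes each record carries a chat-style `messages` list of `{"role", "content"}` objects, a common convention for chat fine-tuning data that this page does not confirm; both helper names are hypothetical.

```python
import json
from pathlib import Path


def load_jsonl(path: Path) -> list[dict]:
    """Read one JSON object per non-blank line; report the line on failure."""
    records = []
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank or trailing empty lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{lineno}: {exc.msg}") from exc
    return records


def count_chat_turns(records: list[dict]) -> dict[str, int]:
    """Count messages per role, assuming a chat-style "messages" list."""
    counts: dict[str, int] = {}
    for rec in records:
        for msg in rec.get("messages", []):
            role = msg.get("role", "unknown")
            counts[role] = counts.get(role, 0) + 1
    return counts
```

For example, `count_chat_turns(load_jsonl(Path("research/data/education-lesson-chat.jsonl")))` gives a quick per-role tally, and a malformed line fails with its file and line number instead of surfacing mid-training.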
## When to use which tool
| Goal | Tool |
| ---- | ---- |
| Improve lesson slide quality on your data | `finetune.py` + optional eval before/after |
| Compare base vs LoRA on public agent tasks | `slm-benchmark` |
| Compare base vs LoRA on academic tasks | `slm-lm-eval` |
| Ship in Gradio Space | `apps/gradio-space` only; wire new weights via `models.yaml` |
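For the before/after comparisons, a small helper can line up two runs once per-task scores have been pulled into dictionaries. This sketch assumes higher is better for every metric and does not parse the output of `slm-benchmark` or `slm-lm-eval`; the function name is hypothetical.

```python
def compare_runs(
    base: dict[str, float], tuned: dict[str, float]
) -> tuple[dict[str, float], list[str]]:
    """Return (tuned minus base per shared task, tasks scored by only one run)."""
    shared = sorted(base.keys() & tuned.keys())
    deltas = {task: tuned[task] - base[task] for task in shared}
    # Tasks missing from either run are reported rather than silently dropped.
    unmatched = sorted(base.keys() ^ tuned.keys())
    return deltas, unmatched
```

Reporting unmatched tasks separately keeps a partially failed eval run from masquerading as a clean comparison.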
## Workspace package
`research/evals` is listed under `members` in the root `[tool.uv.workspace]` table. The package is `slm-evals`, its import name is `slm_evals`, and it provides the CLIs `slm-benchmark` and `slm-lm-eval`.
Run with `uv run --package slm-evals ...` from the repo root so uv resolves workspace paths and shared lockfile versions.
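A small wrapper makes the `uv run --package slm-evals ...` invocation explicit. The `agentic`/`academic` keys and the helper are hypothetical; only `uv run --package` and the two CLI names come from this page, and `--help` is used on the assumption that the CLIs accept it.

```python
import shlex

# Hypothetical labels mapped to the two CLIs named on this page.
EVAL_CLIS = {"agentic": "slm-benchmark", "academic": "slm-lm-eval"}


def build_eval_command(kind: str, *cli_args: str) -> list[str]:
    """Build the argv for running an slm-evals CLI through uv."""
    try:
        cli = EVAL_CLIS[kind]
    except KeyError:
        raise ValueError(f"kind must be one of {sorted(EVAL_CLIS)}") from None
    # --package selects the workspace member; run from the repo root.
    return ["uv", "run", "--package", "slm-evals", cli, *cli_args]


if __name__ == "__main__":
    print(shlex.join(build_eval_command("agentic", "--help")))
```

To execute it, pass the list to `subprocess.run(cmd, cwd=repo_root, check=True)`, where `repo_root` is your checkout's root directory, so uv resolves the workspace and shared lockfile. Building an argv list rather than a shell string avoids quoting problems with task arguments.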