Spaces:
Sleeping
A newer version of the Gradio SDK is available: 6.19.0
Research overview
How research/ relates to the main hackathon repo and what each component does.
Position in the repo
small-model-hackathon/
βββ apps/gradio-space/ β shipped Lesson Agent UI
βββ libs/agent/ β skill loop, tools, traces
βββ libs/inference/ β transformers + llama.cpp backends
βββ models.yaml β model presets (shared with finetune)
βββ research/ β experiments (this tree)
βββ finetune.py
βββ data/
βββ evals/ β uv workspace package
Research code is a uv workspace sibling of apps/* and libs/*. Root pyproject.toml declares optional dependency groups (finetune, evals, lm-eval) so the Docker Space image does not need to install torch-heavy extras unless you opt in locally.
Two tracks
Fine-tuning
research/finetune.py adapts a small HF causal LM on instruction or chat data. It reuses root models.yaml presets and the shared inference config loader, so the same minicpm5-1b preset used in the Gradio app can be fine-tuned without duplicating model metadata.
Outputs land in models/finetuned/ β you can register a new preset in models.yaml pointing at merged weights for the Well-Tuned hackathon badge.
Agentic and academic evals
research/evals/ (slm-evals package) scores whole models on:
- Agentic benchmarks β BFCL, Ο-bench, GAIA, SWE-bench (
slm-benchmark) - Academic benchmarks β GSM8K, ARC, HellaSwag, etc. via lm-evaluation-harness (
slm-lm-eval)
Data flow
flowchart LR
subgraph data [research/data]
lesson[education-lesson-chat.jsonl]
qa[benchmark-qa.jsonl]
kb[benchmark-kb.jsonl]
end
subgraph train [finetune.py]
ckpt[models/finetuned/]
end
subgraph evals [slm-evals]
bfcl[BFCL]
tau[tau-bench]
gaia[GAIA]
swe[SWE-bench]
lmeval[lm-eval tasks]
end
lesson --> train
train --> ckpt
ckpt --> evals
When to use which tool
| Goal | Tool |
|---|---|
| Improve lesson slide quality on your data | finetune.py + optional eval before/after |
| Compare base vs LoRA on public agent tasks | slm-benchmark |
| Compare base vs LoRA on academic tasks | slm-lm-eval |
| Ship in Gradio Space | apps/gradio-space only β wire new weights via models.yaml |
Workspace package
research/evals is listed in root [tool.uv.workspace] members as import name slm_evals, CLI slm-benchmark and slm-lm-eval.
Run with uv run --package slm-evals ... from the repo root so uv resolves workspace paths and shared lockfile versions.