Spaces:
Running on Zero
title: Linalg-Zero
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
python_version: 3.12.12
app_file: linalg_zero/demo/app.py
pinned: false
Linalg-Zero
Overview
Table of Contents
This repository offers tools for generating a linear algebra problem dataset and training an open-source base model (i.e. Qwen2.5-3B), aiming to explore planning and tool use using SFT and RL, distinct from Deepseek-R1's primary emphasis on reasoning.
The project is simple by design and mostly consists of:
linalg_zero/: contains the scripts to train models as well as generate synthetic data:generate.py: generates the linear algebra dataset and splits.distillation.py: runs the distillation pipeline to create multi-turn tool-use data.sft_train.py: performs a simple SFT of a model on a dataset.grpo_train.py: trains a model with GRPO on a given dataset.
Makefile: contains easy-to-run commands for the dataset and training workflows using previous scripts.
Main Phases
We use the DeepSeek-R1 tech report as a loose guide, but the project phases are:
- Step 1: generate a linear algebra dataset with controlled difficulty and tool-call metadata.
- Step 2: distill multi-turn tool-use data from a teacher model.
- Step 3: SFT the base model on the dataset to teach the tool-calling format.
- Step 4: GRPO fine-tune on the tool-use tasks, using a curriculum.
Installation
We use uv as the dependency management tool.
First, to install uv, follow the UV Installation Guide.
To run the experiments install the dependencies using:
- For generation/distillation:
make install-data-gen - For SFT:
make install-sft - For RL:
make install-grpo
Next, log into your Hugging Face and Weights and Biases accounts as follows:
huggingface-cli login
wandb login
Quickstart
After installing dependencies above, run the commands below. For modifications, see the config files.
# Phase 1: Generate dataset
uv run python linalg_zero/generate.py --dataset_name atomwalk12/linalgzero --push_dataset
# Phase 2: Distillation (setup once)
cp linalg_zero/config/distillation/env.example.sh env.sh
# Edit env.sh to set HF_TOKEN and ARGILLA_API_KEY.
source env.sh
# Terminal A
uv run python linalg_zero/distillation/launch_server.py --config linalg_zero/config/distillation/vllm_qwen3_32b.yaml
# Terminal B (new terminal; source env.sh again)
source env.sh
uv run python linalg_zero/distillation.py --config linalg_zero/config/distillation/vllm_qwen3_32b.yaml
# Phase 3: SFT
uv run python linalg_zero/sft_train.py --config linalg_zero/config/sft/qwen2.5-3B/lora.yaml
# Phase 4: GRPO
uv run python linalg_zero/grpo_train.py --config-name runpod.yaml
Training requires the dataset to follow the strict OpenAI tool-calling format (see this link). We provide scripts to prepare and validate the data accordingly:
linalg_zero/sft/scripts/prepare_dataset.py: prepares the SFT dataset.grpo/scripts/prepare_dataset.py: prepares and validates the GRPO dataset.
Results
We provide a recipe to encourage planning and tool-use capabilities in the Qwen2.5-3B model, starting from a pre-trained (not instruction-tuned) base model.
This yields models like Linalg-Zero-SFT and Linalg-Zero-GRPO, with the following downstream performance on the test set:
| Metric | LinAlgZero-SFT | LinAlgZero-GRPO |
|---|---|---|
| Optimal Trajectory | 89.87% | 90.26% |
| Correctness | 91.86% | 92.63% |
| Format Validity | 96.15% | 96.66% |
| Tool Success | 100.00% | 100.00% |
Artifacts
| Artifact | Link |
|---|---|
| SFT checkpoint | atomwalk12/LinalgZero-SFT |
| GRPO checkpoint | atomwalk12/LinAlgZero-GRPO |
| Base dataset | atomwalk12/linalgzero |
| Distilled dataset (clean) | atomwalk12/linalgzero-distilled-clean |
| SFT dataset | atomwalk12/linalgzero-sft |
| GRPO dataset | atomwalk12/linalgzero-grpo |
Reproducibility
- Distillation: H100 80GB on Runpod with Qwen/Qwen3-32B-FP8; 15 hours at $2.39/hr (~$25).
- SFT: Local 24GB RTX 4090 with Qwen/Qwen2.5-3B.
- GRPO: RTX 6000 Ada on Runpod, improving on the SFT baseline; 57 hours at $0.77/hr (~$50).
- Total: ~$75 using a mix of cloud GPUs and local training.
Acknowledgements
- We base our distillation pipeline on distilabel.
- We base the RL experiment on ART.
- We use Qwen2.5 series base model Qwen2.5.
Citation
If you find this project is useful in your own work, please consider citing as follows:
@misc{linalg-zero,
title = {Linalg-Zero: Distilling Neurosymbolic Reasoning for Linear Algebra in Small Language Models},
url = {https://github.com/atomwalk12/linalg-zero},
author = {{Razvan F. Vasile}},
month = {March},
year = {2026}
}