linalg-zero / README.md
atomwalk12's picture
initial commit
0dd6c2f
---
title: Linalg-Zero
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "6.9.0"
python_version: "3.12.12"
app_file: linalg_zero/demo/app.py
pinned: false
---
[![Release](https://img.shields.io/github/v/release/atomwalk12/linalg-zero)](https://img.shields.io/github/v/release/atomwalk12/linalg-zero)
[![Build status](https://img.shields.io/github/actions/workflow/status/atomwalk12/linalg-zero/main.yml?branch=main)](https://github.com/atomwalk12/linalg-zero/actions/workflows/main.yml?query=branch%3Amain)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
# Linalg-Zero
## Overview
<details>
<summary>Table of Contents</summary>
<ol>
<li><a href="#overview">Overview</a></li>
<li><a href="#main-phases">Main Phases</a></li>
<li><a href="#installation">Installation</a></li>
<li><a href="#quickstart">Quickstart</a></li>
<li><a href="#results">Results</a></li>
<li><a href="#artifacts">Artifacts</a></li>
<li><a href="#reproducibility">Reproducibility</a></li>
<li><a href="#acknowledgements">Acknowledgements</a></li>
</ol>
</details>
This repository offers tools for generating a linear algebra problem dataset and training an open-source base model (i.e. Qwen2.5-3B), aiming to explore planning and tool use using SFT and RL, distinct from Deepseek-R1's primary emphasis on reasoning.
The project is simple by design and mostly consists of:
- `linalg_zero/`: contains the scripts to train models as well as generate synthetic data:
- `generate.py`: generates the linear algebra dataset and splits.
- `distillation.py`: runs the distillation pipeline to create multi-turn tool-use data.
- `sft_train.py`: performs a simple SFT of a model on a dataset.
- `grpo_train.py`: trains a model with GRPO on a given dataset.
- `Makefile`: contains easy-to-run commands for the dataset and training workflows using previous scripts.
## Main Phases
We use the DeepSeek-R1 [tech report](https://github.com/deepseek-ai/DeepSeek-R1) as a loose guide, but the project phases are:
* Step 1: generate a linear algebra dataset with controlled difficulty and tool-call metadata.
* Step 2: distill multi-turn tool-use data from a teacher model.
* Step 3: SFT the base model on the dataset to teach the tool-calling format.
* Step 4: GRPO fine-tune on the tool-use tasks, using a curriculum.
## Installation
We use `uv` as the dependency management tool.
First, to install `uv`, follow the [UV Installation Guide](https://docs.astral.sh/uv/getting-started/installation/).
To run the experiments install the dependencies using:
* For generation/distillation: `make install-data-gen`
* For SFT: `make install-sft`
* For RL: `make install-grpo`
Next, log into your Hugging Face and Weights and Biases accounts as follows:
```shell
huggingface-cli login
wandb login
```
## Quickstart
After installing dependencies above, run the commands below. For modifications, see the config files.
```shell
# Phase 1: Generate dataset
uv run python linalg_zero/generate.py --dataset_name atomwalk12/linalgzero --push_dataset
# Phase 2: Distillation (setup once)
cp linalg_zero/config/distillation/env.example.sh env.sh
# Edit env.sh to set HF_TOKEN and ARGILLA_API_KEY.
source env.sh
# Terminal A
uv run python linalg_zero/distillation/launch_server.py --config linalg_zero/config/distillation/vllm_qwen3_32b.yaml
# Terminal B (new terminal; source env.sh again)
source env.sh
uv run python linalg_zero/distillation.py --config linalg_zero/config/distillation/vllm_qwen3_32b.yaml
# Phase 3: SFT
uv run python linalg_zero/sft_train.py --config linalg_zero/config/sft/qwen2.5-3B/lora.yaml
# Phase 4: GRPO
uv run python linalg_zero/grpo_train.py --config-name runpod.yaml
```
Training requires the dataset to follow the strict OpenAI tool-calling format (see [this link](https://huggingface.co/docs/trl/en/dataset_formats#tool-calling)). We provide scripts to prepare and validate the data accordingly:
- `linalg_zero/`
- `sft/scripts/prepare_dataset.py`: prepares the SFT dataset.
- `grpo/scripts/prepare_dataset.py`: prepares and validates the GRPO dataset.
## Results
We provide a recipe to encourage planning and tool-use capabilities in the [Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B) model, starting from a pre-trained (not instruction-tuned) base model.
This yields models like [Linalg-Zero-SFT](https://huggingface.co/atomwalk12/LinalgZero-SFT) and [Linalg-Zero-GRPO](https://huggingface.co/atomwalk12/LinalgZero-GRPO), with the following downstream performance on the test set:
| Metric | LinAlgZero-SFT | LinAlgZero-GRPO |
|--------------------|----------------|-----------------|
| Optimal Trajectory | 89.87% | 90.26% |
| Correctness | 91.86% | 92.63% |
| Format Validity | 96.15% | 96.66% |
| Tool Success | 100.00% | 100.00% |
### Artifacts
| Artifact | Link |
|---|---|
| SFT checkpoint | [atomwalk12/LinalgZero-SFT](https://huggingface.co/atomwalk12/LinalgZero-SFT) |
| GRPO checkpoint | [atomwalk12/LinAlgZero-GRPO](https://huggingface.co/atomwalk12/LinAlgZero-GRPO) |
| Base dataset | [atomwalk12/linalgzero](https://huggingface.co/datasets/atomwalk12/linalgzero) |
| Distilled dataset (clean) | [atomwalk12/linalgzero-distilled-clean](https://huggingface.co/datasets/atomwalk12/linalgzero-distilled-clean) |
| SFT dataset | [atomwalk12/linalgzero-sft](https://huggingface.co/datasets/atomwalk12/linalgzero-sft) |
| GRPO dataset | [atomwalk12/linalgzero-grpo](https://huggingface.co/datasets/atomwalk12/linalgzero-grpo) |
## Reproducibility
- **Distillation:** H100 80GB on [Runpod](https://www.runpod.io/) with [Qwen/Qwen3-32B-FP8](https://huggingface.co/Qwen/Qwen3-32B-FP8); 15 hours at $2.39/hr (~$25).
- **SFT:** Local 24GB RTX 4090 with [Qwen/Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B).
- **GRPO:** RTX 6000 Ada on [Runpod](https://www.runpod.io/), improving on the SFT baseline; 57 hours at $0.77/hr (~$50).
- **Total:** ~$75 using a mix of cloud GPUs and local training.
## Acknowledgements
- We base our distillation pipeline on [distilabel](https://github.com/argilla-io/distilabel).
- We base the RL experiment on [ART](https://deepwiki.com/OpenPipe/ART).
- We use Qwen2.5 series base model [Qwen2.5](https://github.com/QwenLM/Qwen2.5).
## Citation
If you find this project is useful in your own work, please consider citing as follows:
```bibtex
@misc{linalg-zero,
title = {Linalg-Zero: Distilling Neurosymbolic Reasoning for Linear Algebra in Small Language Models},
url = {https://github.com/atomwalk12/linalg-zero},
author = {{Razvan F. Vasile}},
month = {March},
year = {2026}
}
```