---
title: Linalg-Zero
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
python_version: 3.12.12
app_file: linalg_zero/demo/app.py
pinned: false
---

License: MIT

# Linalg-Zero

## Overview

**Table of Contents**

1. Overview
2. Main Phases
3. Installation
4. Quickstart
5. Results
6. Artifacts
7. Reproducibility
8. Acknowledgements

This repository provides tools for generating a linear algebra problem dataset and for training an open-source base model (Qwen2.5-3B) on it. The goal is to explore planning and tool use via SFT and RL, in contrast to DeepSeek-R1's primary emphasis on reasoning.

The project is simple by design and mostly consists of:

- `linalg_zero/`: contains the scripts to train models as well as generate synthetic data:
  - `generate.py`: generates the linear algebra dataset and splits.
  - `distillation.py`: runs the distillation pipeline to create multi-turn tool-use data.
  - `sft_train.py`: performs a simple SFT of a model on a dataset.
  - `grpo_train.py`: trains a model with GRPO on a given dataset.
- `Makefile`: provides easy-to-run commands for the dataset and training workflows built on the scripts above.

## Main Phases

We use the DeepSeek-R1 tech report as a loose guide. The project phases are:

- Step 1: generate a linear algebra dataset with controlled difficulty and tool-call metadata.
- Step 2: distill multi-turn tool-use data from a teacher model.
- Step 3: SFT the base model on the dataset to teach the tool-calling format.
- Step 4: GRPO fine-tune on the tool-use tasks, using a curriculum.
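To make Step 1 concrete, a generated record might look like the sketch below. The field names here are illustrative assumptions, not the project's actual schema:

```python
# Hypothetical example of a Step 1 dataset record (field names are
# illustrative assumptions, not the project's actual schema).
record = {
    "question": "Compute the determinant of [[2, 1], [1, 3]].",
    "difficulty": 1,               # controlled difficulty level
    "expected_answer": "5",
    "tool_calls": [                # metadata describing the required tool use
        {"name": "determinant", "arguments": {"matrix": [[2, 1], [1, 3]]}}
    ],
}

# Sanity-check: the recorded answer matches the 2x2 determinant formula
# ad - bc applied to the matrix in the tool-call metadata.
m = record["tool_calls"][0]["arguments"]["matrix"]
det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
assert str(det) == record["expected_answer"]
```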

## Installation

We use `uv` as the dependency management tool. First, install `uv` by following the UV Installation Guide.

To run the experiments, install the dependencies using:

- For generation/distillation: `make install-data-gen`
- For SFT: `make install-sft`
- For RL: `make install-grpo`

Next, log into your Hugging Face and Weights and Biases accounts as follows:

```shell
huggingface-cli login
wandb login
```

## Quickstart

After installing the dependencies above, run the commands below. To customize any step, see the corresponding config files.

```shell
# Phase 1: Generate dataset
uv run python linalg_zero/generate.py --dataset_name atomwalk12/linalgzero --push_dataset

# Phase 2: Distillation (setup once)
cp linalg_zero/config/distillation/env.example.sh env.sh
# Edit env.sh to set HF_TOKEN and ARGILLA_API_KEY.
source env.sh

# Terminal A
uv run python linalg_zero/distillation/launch_server.py --config linalg_zero/config/distillation/vllm_qwen3_32b.yaml

# Terminal B (new terminal; source env.sh again)
source env.sh
uv run python linalg_zero/distillation.py --config linalg_zero/config/distillation/vllm_qwen3_32b.yaml

# Phase 3: SFT
uv run python linalg_zero/sft_train.py --config linalg_zero/config/sft/qwen2.5-3B/lora.yaml

# Phase 4: GRPO
uv run python linalg_zero/grpo_train.py --config-name runpod.yaml
```

Training requires the dataset to follow the strict OpenAI tool-calling format (see the OpenAI function-calling documentation). We provide scripts to prepare and validate the data accordingly:

- `linalg_zero/`
  - `sft/scripts/prepare_dataset.py`: prepares the SFT dataset.
  - `grpo/scripts/prepare_dataset.py`: prepares and validates the GRPO dataset.
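As a minimal sketch of what "strict OpenAI tool-calling format" means for an assistant turn, the check below validates the core shape (`type: "function"`, a named function, JSON-encoded arguments). This is a simplified illustration, not the project's actual validation script:

```python
import json

# Simplified check of one assistant message against the OpenAI tool-calling
# shape. The real prepare/validate scripts may enforce more than this.
def validate_tool_call_message(msg: dict) -> bool:
    if msg.get("role") != "assistant":
        return False
    for call in msg.get("tool_calls", []):
        if call.get("type") != "function":
            return False
        fn = call.get("function", {})
        if "name" not in fn:
            return False
        try:
            # In the OpenAI format, arguments are a JSON-encoded string.
            json.loads(fn.get("arguments", ""))
        except (TypeError, json.JSONDecodeError):
            return False
    return True

# Example assistant turn requesting a (hypothetical) matrix-multiply tool.
msg = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {
            "name": "matrix_multiply",
            "arguments": json.dumps({"a": [[1, 0]], "b": [[2], [3]]}),
        },
    }],
}
assert validate_tool_call_message(msg)
```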

## Results

We provide a recipe to encourage planning and tool-use capabilities in the Qwen2.5-3B model, starting from a pre-trained (not instruction-tuned) base model.

This yields models like Linalg-Zero-SFT and Linalg-Zero-GRPO, with the following downstream performance on the test set:

| Metric             | LinAlgZero-SFT | LinAlgZero-GRPO |
| ------------------ | -------------- | --------------- |
| Optimal Trajectory | 89.87%         | 90.26%          |
| Correctness        | 91.86%         | 92.63%          |
| Format Validity    | 96.15%         | 96.66%          |
| Tool Success       | 100.00%        | 100.00%         |
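The percentages above are aggregates over per-example evaluation outcomes. A hedged sketch of how such rates could be computed (the flag names are assumptions, not the project's actual evaluation schema):

```python
# Hypothetical per-example evaluation flags; the real evaluation records
# and metric definitions may differ.
results = [
    {"optimal": True,  "correct": True,  "valid_format": True,  "tools_ok": True},
    {"optimal": False, "correct": True,  "valid_format": True,  "tools_ok": True},
    {"optimal": True,  "correct": False, "valid_format": False, "tools_ok": True},
]

def rate(key: str) -> float:
    """Percentage of examples where the given flag is True."""
    return 100.0 * sum(r[key] for r in results) / len(results)

for key in ("optimal", "correct", "valid_format", "tools_ok"):
    print(f"{key}: {rate(key):.2f}%")
```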

## Artifacts

| Artifact                  | Link                                   |
| ------------------------- | -------------------------------------- |
| SFT checkpoint            | atomwalk12/LinalgZero-SFT              |
| GRPO checkpoint           | atomwalk12/LinAlgZero-GRPO             |
| Base dataset              | atomwalk12/linalgzero                  |
| Distilled dataset (clean) | atomwalk12/linalgzero-distilled-clean  |
| SFT dataset               | atomwalk12/linalgzero-sft              |
| GRPO dataset              | atomwalk12/linalgzero-grpo             |

## Reproducibility

- Distillation: H100 80GB on Runpod with `Qwen/Qwen3-32B-FP8`; 15 hours at $2.39/hr (~$25).
- SFT: local 24GB RTX 4090 with `Qwen/Qwen2.5-3B`.
- GRPO: RTX 6000 Ada on Runpod, improving on the SFT baseline; 57 hours at $0.77/hr (~$50).
- Total: ~$75 using a mix of cloud GPUs and local training.

## Acknowledgements

- We base our distillation pipeline on distilabel.
- We base the RL experiment on ART.
- We use a base model from the Qwen2.5 series.

## Citation

If you find this project useful in your own work, please consider citing it as follows:

```bibtex
@misc{linalg-zero,
    title  = {Linalg-Zero: Distilling Neurosymbolic Reasoning for Linear Algebra in Small Language Models},
    url    = {https://github.com/atomwalk12/linalg-zero},
    author = {{Razvan F. Vasile}},
    month  = {March},
    year   = {2026}
}
```