---
title: Linalg-Zero
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
python_version: 3.12.12
app_file: linalg_zero/demo/app.py
pinned: false
---

License: MIT

# Linalg-Zero

## Overview

**Table of Contents**

1. Overview
2. Main Phases
3. Installation
4. Quickstart
5. Results
6. Artifacts
7. Reproducibility
8. Acknowledgements

This repository provides tools for generating a linear algebra problem dataset and for training an open-source base model (Qwen2.5-3B) on it. The goal is to explore planning and tool use via SFT and RL, in contrast to DeepSeek-R1's primary emphasis on reasoning.

The project is simple by design and mostly consists of:

- `linalg_zero/`: contains the scripts to train models as well as generate synthetic data:
  - `generate.py`: generates the linear algebra dataset and splits.
  - `distillation.py`: runs the distillation pipeline to create multi-turn tool-use data.
  - `sft_train.py`: performs a simple SFT of a model on a dataset.
  - `grpo_train.py`: trains a model with GRPO on a given dataset.
- `Makefile`: provides easy-to-run commands for the dataset and training workflows built on the scripts above.

## Main Phases

We use the DeepSeek-R1 tech report as a loose guide. The project phases are:

- Step 1: generate a linear algebra dataset with controlled difficulty and tool-call metadata.
- Step 2: distill multi-turn tool-use data from a teacher model.
- Step 3: SFT the base model on the dataset to teach the tool-calling format.
- Step 4: GRPO fine-tune on the tool-use tasks, using a curriculum.
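To make Step 1 concrete, a generated record might look like the sketch below. The field names here are illustrative assumptions, not the project's actual schema:

```python
# Hypothetical example of a Step 1 dataset record (field names are
# illustrative assumptions, not the project's actual schema).
record = {
    "question": "Compute the determinant of [[2, 1], [1, 3]].",
    "difficulty": 1,               # controlled difficulty level
    "expected_answer": "5",
    "tool_calls": [                # metadata describing the required tool use
        {"name": "determinant", "arguments": {"matrix": [[2, 1], [1, 3]]}}
    ],
}

# Sanity-check: the recorded answer matches the 2x2 determinant formula
# ad - bc applied to the matrix in the tool-call metadata.
m = record["tool_calls"][0]["arguments"]["matrix"]
det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
assert str(det) == record["expected_answer"]
```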

## Installation

We use `uv` as the dependency management tool. First, install `uv` by following the UV Installation Guide.

To run the experiments, install the dependencies using:

- For generation/distillation: `make install-data-gen`
- For SFT: `make install-sft`
- For RL: `make install-grpo`

Next, log into your Hugging Face and Weights and Biases accounts as follows:

```shell
huggingface-cli login
wandb login
```

## Quickstart

After installing the dependencies above, run the commands below. To customize any step, see the corresponding config files.

```shell
# Phase 1: Generate dataset
uv run python linalg_zero/generate.py --dataset_name atomwalk12/linalgzero --push_dataset

# Phase 2: Distillation (setup once)
cp linalg_zero/config/distillation/env.example.sh env.sh
# Edit env.sh to set HF_TOKEN and ARGILLA_API_KEY.
source env.sh

# Terminal A
uv run python linalg_zero/distillation/launch_server.py --config linalg_zero/config/distillation/vllm_qwen3_32b.yaml

# Terminal B (new terminal; source env.sh again)
source env.sh
uv run python linalg_zero/distillation.py --config linalg_zero/config/distillation/vllm_qwen3_32b.yaml

# Phase 3: SFT
uv run python linalg_zero/sft_train.py --config linalg_zero/config/sft/qwen2.5-3B/lora.yaml

# Phase 4: GRPO
uv run python linalg_zero/grpo_train.py --config-name runpod.yaml
```

Training requires the dataset to follow the strict OpenAI tool-calling format (see the OpenAI function-calling documentation). We provide scripts to prepare and validate the data accordingly:

- `linalg_zero/`
  - `sft/scripts/prepare_dataset.py`: prepares the SFT dataset.
  - `grpo/scripts/prepare_dataset.py`: prepares and validates the GRPO dataset.
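As a minimal sketch of what "strict OpenAI tool-calling format" means for an assistant turn, the check below validates the core shape (`type: "function"`, a named function, JSON-encoded arguments). This is a simplified illustration, not the project's actual validation script:

```python
import json

# Simplified check of one assistant message against the OpenAI tool-calling
# shape. The real prepare/validate scripts may enforce more than this.
def validate_tool_call_message(msg: dict) -> bool:
    if msg.get("role") != "assistant":
        return False
    for call in msg.get("tool_calls", []):
        if call.get("type") != "function":
            return False
        fn = call.get("function", {})
        if "name" not in fn:
            return False
        try:
            # In the OpenAI format, arguments are a JSON-encoded string.
            json.loads(fn.get("arguments", ""))
        except (TypeError, json.JSONDecodeError):
            return False
    return True

# Example assistant turn requesting a (hypothetical) matrix-multiply tool.
msg = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {
            "name": "matrix_multiply",
            "arguments": json.dumps({"a": [[1, 0]], "b": [[2], [3]]}),
        },
    }],
}
assert validate_tool_call_message(msg)
```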

## Results

We provide a recipe to encourage planning and tool-use capabilities in the Qwen2.5-3B model, starting from a pre-trained (not instruction-tuned) base model.

This yields models like Linalg-Zero-SFT and Linalg-Zero-GRPO, with the following downstream performance on the test set:

| Metric             | LinAlgZero-SFT | LinAlgZero-GRPO |
| ------------------ | -------------- | --------------- |
| Optimal Trajectory | 89.87%         | 90.26%          |
| Correctness        | 91.86%         | 92.63%          |
| Format Validity    | 96.15%         | 96.66%          |
| Tool Success       | 100.00%        | 100.00%         |
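The percentages above are aggregates over per-example evaluation outcomes. A hedged sketch of how such rates could be computed (the flag names are assumptions, not the project's actual evaluation schema):

```python
# Hypothetical per-example evaluation flags; the real evaluation records
# and metric definitions may differ.
results = [
    {"optimal": True,  "correct": True,  "valid_format": True,  "tools_ok": True},
    {"optimal": False, "correct": True,  "valid_format": True,  "tools_ok": True},
    {"optimal": True,  "correct": False, "valid_format": False, "tools_ok": True},
]

def rate(key: str) -> float:
    """Percentage of examples where the given flag is True."""
    return 100.0 * sum(r[key] for r in results) / len(results)

for key in ("optimal", "correct", "valid_format", "tools_ok"):
    print(f"{key}: {rate(key):.2f}%")
```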

## Artifacts

| Artifact                  | Link                                   |
| ------------------------- | -------------------------------------- |
| SFT checkpoint            | atomwalk12/LinalgZero-SFT              |
| GRPO checkpoint           | atomwalk12/LinAlgZero-GRPO             |
| Base dataset              | atomwalk12/linalgzero                  |
| Distilled dataset (clean) | atomwalk12/linalgzero-distilled-clean  |
| SFT dataset               | atomwalk12/linalgzero-sft              |
| GRPO dataset              | atomwalk12/linalgzero-grpo             |

## Reproducibility

- Distillation: H100 80GB on Runpod with `Qwen/Qwen3-32B-FP8`; 15 hours at $2.39/hr (~$25).
- SFT: local 24GB RTX 4090 with `Qwen/Qwen2.5-3B`.
- GRPO: RTX 6000 Ada on Runpod, improving on the SFT baseline; 57 hours at $0.77/hr (~$50).
- Total: ~$75 using a mix of cloud GPUs and local training.

## Acknowledgements

- We base our distillation pipeline on distilabel.
- We base the RL experiment on ART.
- We use a base model from the Qwen2.5 series.

## Citation

If you find this project useful in your own work, please consider citing it as follows:

```bibtex
@misc{linalg-zero,
    title  = {Linalg-Zero: Distilling Neurosymbolic Reasoning for Linear Algebra in Small Language Models},
    url    = {https://github.com/atomwalk12/linalg-zero},
    author = {{Razvan F. Vasile}},
    month  = {March},
    year   = {2026}
}
```