STORM-Qwen3-4B: Unlocking Native Reasoning for Optimization Modeling
STORM (Smart Thinking Optimization Reasoning Model) is an advanced 4B-parameter language model specialized for automating Operations Research (OR) and optimization modeling tasks.
This model card is for STORM-Qwen3-4B. For full details on our training methodology, evaluation, and the official inference code, please visit our GitHub Repository.
Model Description
STORM is designed to translate natural language problem descriptions into mathematical models and executable solver code. It engages in an iterative, multi-step reasoning process that involves generating both natural language explanations and Python code snippets, which are then executed in a code interpreter.
This interactive process is enabled by our CALM (Corrective Adaptation with Lightweight Modification) framework, which guides the model to reason like a human expert. STORM is the result of a two-stage training pipeline (SFT + RL) built upon this framework.
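Conceptually, the loop alternates model generation with interpreter execution. The sketch below is a simplified illustration of that control flow, not the repository's actual inference code; `generate_step` is a hypothetical stand-in for a call to the model:

```python
import subprocess
import sys
import tempfile

FENCE = "`" * 3  # literal triple backticks, built indirectly for readability here

def run_code(code: str, timeout: int = 30) -> str:
    """Execute one generated Python snippet in a fresh subprocess and return its output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True,
                            text=True, timeout=timeout)
    return result.stdout + result.stderr

def reasoning_loop(generate_step, prompt: str, max_turns: int = 8) -> str:
    """Alternate model generation and code execution until a boxed answer appears."""
    transcript = prompt
    for _ in range(max_turns):
        step = generate_step(transcript)   # model emits text plus an optional python block
        transcript += step
        if "\\boxed{" in step:             # final answer reached, stop iterating
            break
        if FENCE + "python" in step:
            code = step.split(FENCE + "python")[1].split(FENCE)[0]
            transcript += "\n" + FENCE + "output\n" + run_code(code) + FENCE + "\n"
    return transcript
```

The real framework additionally handles sampling parameters, truncation, and evaluation; this sketch only shows why a single `model.generate()` call cannot reproduce the behavior.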
Core Capabilities
- State-of-the-Art Performance: With only 4B parameters, STORM achieves an average accuracy of 68.9% on five major optimization benchmarks, matching the performance of models up to 671B parameters.
- Enhanced Native Reasoning: The model excels at multi-step, iterative reasoning, closely mimicking a human expert's workflow.
- Powerful Code-Integrated Reasoning: STORM autonomously leverages scientific computing libraries like `pulp`, `sympy`, and `numpy` by generating code for an interpreter to execute.
- Emergent Tool Use: After RL, STORM can use tools it wasn't explicitly trained on (e.g., `rdkit` for chemistry) to tackle novel tasks.
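To make the code-integrated style concrete, here is a minimal, self-contained `pulp` snippet of the kind STORM emits for an optimization step. The toy LP itself is invented for illustration and does not come from the paper or its benchmarks:

```python
import pulp

# Toy LP (illustrative only): maximize 3x + 2y subject to x + y <= 4 and x <= 2
prob = pulp.LpProblem("toy_lp", pulp.LpMaximize)
x = pulp.LpVariable("x", lowBound=0)
y = pulp.LpVariable("y", lowBound=0)
prob += 3 * x + 2 * y            # objective (first expression added)
prob += x + y <= 4               # capacity constraint
prob += x <= 2                   # bound on x
prob.solve(pulp.PULP_CBC_CMD(msg=False))

# Print all key results, as the prompt template requires
print("status:", pulp.LpStatus[prob.status])
print("objective:", pulp.value(prob.objective))   # 10.0 for this toy instance
print("x =", x.value(), "y =", y.value())
```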
How to Use
Important: STORM relies on a multi-turn reasoning process with a code interpreter. A standard `model.generate()` call is not sufficient to replicate the intended behavior.
To run this model correctly, you must use the inference framework provided in our official GitHub repository.
Step 1: Clone the Repository and Install Dependencies
First, clone our repository and set up the required environment. The repository contains the logic for managing the conversation between the model and the code interpreter.
# Clone the official STORM repository
git clone https://github.com/tangzhy/STORM.git
cd STORM
# Create a conda environment and install dependencies
conda create -n storm python=3.10
conda activate storm
# Install inference engine (vLLM recommended) and core packages
pip install "vllm>=0.8.5.post1"
pip install math_verify transformers datasets pebble
# Install scientific computing libraries
pip install pulp gurobipy cvxpy pyomo numpy scipy sympy pandas ... # etc.
For a full list of dependencies, please see the README.md in the GitHub repository.
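As a quick sanity check (not part of the official setup), you can verify that the key libraries the prompt template steers the model toward import cleanly before running inference:

```python
import importlib

# Libraries named in the prompt template's recommended toolkit
libs = ["pulp", "numpy", "scipy", "sympy", "pandas"]
report = []
for name in libs:
    try:
        mod = importlib.import_module(name)
        report.append(f"{name} {getattr(mod, '__version__', '?')} OK")
    except ImportError:
        report.append(f"{name} MISSING")
print("\n".join(report))
```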
Step 2: Prepare the Prompt
STORM performs best when the problem is framed within its specific instruction template. This is a critical step.
Prompt Template:
Given a mathematical problem, follow the instructions below to solve it.
### Instructions:
When solving mathematical problems, you should leverage both natural language reasoning and Python code execution. Your goal is to provide clear, detailed explanations while utilizing Python to perform complex calculations. Follow these guidelines to ensure a coherent and effective response:
1. **Natural Language Reasoning:**
- Provide comprehensive, step-by-step explanations of your thought process.
- Formulate your plan BEFORE writing code. Explain what you are about to do and why.
2. **Code Execution Rules:**
- **Purpose:** Each Python code block must be a complete, self-contained script that executes a single, logical step of your plan.
- **Output:** The SOLE mechanism for displaying results is the `print()` function. The purpose of a code block is to compute a value or set of values and explicitly `print()` them for the subsequent `output` block.
- **Structure:** Each block must contain all necessary imports and setups. The code must be directly executable. Avoid any boilerplate like `if __name__ == '__main__':`.
3. **Recommended Toolkit & Best Practices:**
- To ensure reliability and environment compatibility, **you must prioritize using the following libraries** for their respective tasks.
- For **symbolic mathematics**: use `sympy`.
- For **numerical operations**: use `numpy`.
- For **scientific computing**: use `scipy`.
- For **optimization problems**: use `pulp`.
4. **Solution Verification and Final Answer:**
A. **Code Output for Verification:** To ensure your reasoning is transparent and verifiable, your **final code block** should print all key results needed for the solution. For optimization problems, this typically includes:
* The optimal objective function value.
* The values of the main decision variables.
B. **Final Answer Formulation:**
* **Full Solution Description:** Briefly summarize your findings, referencing the key values printed by your code.
* **Final Answer Boxing:** The final step is to put the **single numerical answer** to the main question inside `\\boxed{}`.
- **Content:** The box should contain **only the number**, without any units, currency signs, or explanatory text.
- **Example (Correct):** `\\boxed{1234}` or `\\boxed{1234.37}`
- **Example (Incorrect):** `\\boxed{Total cost is $1234.0}`
### Problem:
{problem_text}
You must assemble the full prompt by inserting your problem description into this template before proceeding.
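A minimal assembly script might look like this (the instruction text is abbreviated with `...` here; paste the full template in verbatim). `str.replace` is used instead of `str.format` so that the literal braces in the template, such as `\boxed{}`, need no escaping:

```python
# Abbreviated stand-in for the full instruction template from Step 2
TEMPLATE = """Given a mathematical problem, follow the instructions below to solve it.

### Instructions:
...

### Problem:
{problem_text}"""

problem = "I need to transport 25 tons of product..."
# str.replace avoids brace-escaping issues that str.format would raise
full_prompt = TEMPLATE.replace("{problem_text}", problem)
print(full_prompt[:60])
```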
Step 3: Create the Input File
Create a `.jsonl` file (e.g., `data/my_problems.jsonl`). Each line must be a JSON object containing a unique `prompt` and a `gt_answer` field. The `prompt` field must contain the full, assembled prompt from Step 2.
Example data/my_problems.jsonl:
{"prompt": "Given a mathematical problem, follow the instructions below to solve it.\n\n### Instructions:\n...\n\n### Problem:\nI need to transport 25 tons of product...", "gt_answer": "1234"}
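Rather than hand-escaping newlines inside the JSON string, it is safer to generate the file with `json.dumps`. This sketch assumes the full prompt has already been assembled as in Step 2 (the string below is abbreviated):

```python
import json
import os

# Each record pairs an assembled prompt (abbreviated here) with its ground-truth answer
records = [
    {"prompt": "Given a mathematical problem, ...\n\n### Problem:\nI need to transport 25 tons of product...",
     "gt_answer": "1234"},
]

os.makedirs("data", exist_ok=True)
with open("data/my_problems.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")  # json.dumps escapes newlines and quotes correctly
```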
Step 4: Run Inference
Execute the inference script directly from the command line. This gives you full control over all parameters.
Here is a complete example. Make sure you are in the root directory of the cloned repository.
# 1. Set up your environment variables
export MODEL_NAME_OR_PATH="/path/to/your/downloaded/STORM-Qwen3-4B"
export INPUT_FILE="data/my_problems.jsonl"
export MODEL_OUTPUT_DIR="outputs/my_problems_results"
export GPU_ID=0
# 2. Run the inference command
CUDA_VISIBLE_DEVICES=$GPU_ID TOKENIZERS_PARALLELISM=false python -m infer.inference_and_eval \
--input_file $INPUT_FILE \
--output_dir $MODEL_OUTPUT_DIR \
--model_name_or_path $MODEL_NAME_OR_PATH \
--engine "vllm" \
--tensor_parallel_size 1
The results, including the full reasoning trace for each problem, will be saved in the directory specified by $MODEL_OUTPUT_DIR.
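Since the prompt template mandates that the final answer appear inside `\boxed{}` with only a number, the answer can be recovered from a reasoning trace with a small regex helper. This is illustrative, not part of the official evaluation code:

```python
import re

def extract_boxed(trace: str):
    """Return the content of the last \\boxed{...} in a reasoning trace, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", trace)
    return matches[-1].strip() if matches else None

print(extract_boxed(r"... the total cost is \boxed{1234.37}"))  # -> 1234.37
```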
Citation
If you find our work helpful, please cite our paper:
@misc{tang2025calmstormunlockingnative,
title={CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling},
author={Zhengyang Tang and Zihan Ye and Chenyu Huang and Xuhan Huang and Chengpeng Li and Sihang Li and Guanhua Chen and Ming Yan and Zizhuo Wang and Hongyuan Zha and Dayiheng Liu and Benyou Wang},
year={2025},
eprint={2510.04204},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.04204},
}