---
library_name: transformers
tags:
- reasoning
- qwen
pipeline_tag: text-generation
---
# Model Card for Variational Reasoning for Language Models
This model is from the paper [Variational Reasoning for Language Models](https://huggingface.co/papers/2509.22637), which introduces a variational reasoning framework that treats thinking traces as latent variables and optimizes them through variational inference. The framework extends the evidence lower bound (ELBO) to a multi-trace objective and proposes a forward-KL formulation to stabilize training. This principled probabilistic perspective unifies variational inference with RL-style methods, yielding stable objectives for improving the reasoning ability of language models. The approach has been empirically validated on the Qwen 2.5 and Qwen 3 model families across a wide range of reasoning tasks.
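As a schematic illustration only (the notation below is assumed, not copied from the paper: $x$ is a prompt, $y$ an answer, and $z$ a latent thinking trace), the single-trace ELBO underlying this kind of setup takes the familiar form

$$
\log \pi_\theta(y \mid x) \;\geq\; \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log \pi_\theta(y \mid x, z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x, y) \,\|\, \pi_\theta(z \mid x)\big)
$$

The paper extends this bound to a multi-trace objective and stabilizes training with a forward-KL variant; see the paper for the exact objectives.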
## Model Details

### Model Description
This model card describes a framework and associated models for improving language model reasoning. The models are based on the Qwen 2.5 and Qwen 3 families. The accompanying GitHub repository provides data processing, training pipelines, and an evaluation suite, built on LLaMA-Factory and SkyThought.
- Developed by: Xiangxin Zhou, Zichen Liu, Haonan Wang, Chao Du, Min Lin, Chongxuan Li, Liang Wang, and Tianyu Pang.
- Model type: Qwen2ForCausalLM (Causal Language Model)
- Language(s) (NLP): English
- License: [More Information Needed]
- Finetuned from model: Qwen 2.5 and Qwen 3 model families (e.g., Qwen3-4B-Base, Qwen3-8B-Base, Qwen2.5-7B-Instruct, Qwen2.5-32B-Instruct)
### Model Sources
- Repository: https://github.com/sail-sg/variational-reasoning
- Paper: https://huggingface.co/papers/2509.22637
## Uses

### Direct Use
The models developed within this framework are intended to improve the reasoning capabilities of language models. They can be used for tasks that require structured thinking traces, such as complex problem solving and fact-based question answering, typically via text-generation or chat-completion pipelines.
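As a minimal sketch, a checkpoint from this family can be served through the standard `transformers` text-generation pipeline (the model id below is a placeholder; substitute the actual checkpoint name):

```python
from transformers import pipeline

# Placeholder model id; replace with the actual checkpoint from this repository.
pipe = pipeline("text-generation", model="sail/Variational-Reasoning-Model")

prompt = "Solve step by step: what is 17 * 23?"
out = pipe(prompt, max_new_tokens=512)
print(out[0]["generated_text"])
```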
### Out-of-Scope Use
While capable of general text generation, these models are specifically optimized for reasoning tasks. Compared to models fine-tuned for such purposes, they may underperform on highly creative or artistic writing and on tasks that do not require explicit reasoning traces.
## Bias, Risks, and Limitations
Users should be aware that the models' performance is tied to the quality and biases present in their underlying base models (Qwen 2.5 and Qwen 3) and the datasets used for fine-tuning. Reasoning capabilities are also task-dependent.
## How to Get Started with the Model
This model is designed to be compatible with the Hugging Face transformers library. For detailed instructions on environment setup, training the various components of the variational reasoning framework, and evaluation, please refer to the official GitHub repository.
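A minimal chat-style sketch with `transformers` (the model id is a placeholder, and the generation settings are illustrative rather than recommended):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sail/Variational-Reasoning-Model"  # placeholder; use the actual checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```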
## Training Details

### Training Data
The training process involves an initial reasoning model ($\pi_{\theta_0}$), a variational posterior ($q_\phi$), and a final reasoning model ($\pi_\theta$). The GitHub repository lists specific datasets used, such as Variational-Posterior-4B-Acc-mix and Variational-Posterior-PB-4B.
### Training Procedure
The training procedure is detailed in the GitHub repository and uses the LLaMA-Factory and SkyThought environments. It involves the following steps:
- Training the initial reasoning model $\pi_{\theta_0}$.
- Training the variational posterior $q_\phi$.
- Sampling from the variational posterior $q_\phi$.
- Estimating log likelihoods using both the initial reasoning model and the variational posterior.
- Optionally sampling from the initial reasoning model and verifying.
- Building the dataset for training the final reasoning model $\pi_\theta$, using either a geometric mean of token likelihoods or an accuracy-based estimator (see the sketch after this list).
- Training the final reasoning model $\pi_\theta$.

Scripts to reproduce the experiments are provided in the repository, with instructions for adjusting hyperparameters to the available compute resources.
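As a minimal illustrative sketch of the geometric-mean scoring mentioned above (this is not the repository's implementation; the helper below scores one sampled trace under a given model):

```python
import torch
import torch.nn.functional as F

def geometric_mean_likelihood(model, tokenizer, prompt: str, trace: str) -> float:
    """Score a thinking trace by the geometric mean of its per-token
    likelihoods, i.e. exp(mean_t log p(token_t | prefix)).
    Illustrative helper only, not the repository's implementation."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + trace, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Shift by one: the logits at position t predict token t+1.
    log_probs = F.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the tokens belonging to the trace, not the prompt.
    trace_logp = token_logp[:, prompt_ids.shape[1] - 1:]
    return trace_logp.mean().exp().item()
```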
## Evaluation
Evaluation of the final reasoning models $\pi_\theta$ can be performed with the SkyThought framework. Detailed evaluation scripts and instructions are available at `SkyThought/variational_reasoning/eval/eval.sh` in the GitHub repository, along with notes on vLLM version requirements and on setting `tensor_parallel_size` for larger models.
## Citation
If you find this work useful, please consider citing the paper:
```bibtex
@article{zhou2025variationalreasoninglanguagemodels,
  title={Variational Reasoning for Language Models},
  author={Xiangxin Zhou and Zichen Liu and Haonan Wang and Chao Du and Min Lin and Chongxuan Li and Liang Wang and Tianyu Pang},
  journal={arXiv preprint arXiv:2509.22637},
  year={2025}
}
```