Improve model card: Add pipeline tag, license, and detailed content
#1
by nielsr HF Staff - opened

README.md CHANGED
@@ -1,58 +1,46 @@
 ---
 library_name: transformers
-tags:
 ---
 
-# Model Card for
-
-<!-- Provide a quick summary of what the model is/does. -->
 
 
 
 ## Model Details
 
 ### Model Description
 
-
-
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
 
-
 
-
-
-
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
 
 ## Uses
 
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
 ### Direct Use
 
-
-
-[More Information Needed]
 
 ### Downstream Use [optional]
 
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
 [More Information Needed]
 
 ### Out-of-Scope Use
 
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
 [More Information Needed]
 
 ## Bias, Risks, and Limitations
@@ -69,9 +57,7 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 
 ## How to Get Started with the Model
 
-
-
-[More Information Needed]
 
 ## Training Details
 
@@ -79,12 +65,14 @@ Use the code below to get started with the model.
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
-
 
 ### Training Procedure
 
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
 #### Preprocessing [optional]
 
 [More Information Needed]
@@ -92,7 +80,7 @@ Use the code below to get started with the model.
 
 #### Training Hyperparameters
 
-
 
 #### Speeds, Sizes, Times [optional]
 
@@ -102,10 +90,10 @@ Use the code below to get started with the model.
 
 ## Evaluation
 
-<!-- This section describes the evaluation protocols and provides the results. -->
-
 ### Testing Data, Factors & Metrics
 
 #### Testing Data
 
 <!-- This should link to a Dataset Card if possible. -->
@@ -126,7 +114,7 @@ Use the code below to get started with the model.
 
 ### Results
 
-
 
 #### Summary
 
@@ -144,11 +132,11 @@ Use the code below to get started with the model.
 
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
-
-
-
-
-
 
 ## Technical Specifications [optional]
 
@@ -168,17 +156,18 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 
 [More Information Needed]
 
-## Citation
 
-
 
-
-
-
-
-
-
-
 
 ## Glossary [optional]
 
 ---
 library_name: transformers
+tags:
+- reasoning
+- qwen
+license: apache-2.0
+pipeline_tag: text-generation
 ---
 
+# Model Card for Variational Reasoning for Language Models
 
+This repository contains models for **Variational Reasoning for Language Models**, presented in the paper [Variational Reasoning for Language Models](https://huggingface.co/papers/2509.22637).
 
+We introduce a variational reasoning framework for language models that treats thinking traces as latent variables and optimizes them through variational inference. The framework extends the evidence lower bound (ELBO) to a multi-trace objective for tighter bounds and proposes a forward-KL formulation that stabilizes training of the variational posterior. It further shows that rejection-sampling finetuning and binary-reward RL can be interpreted as local forward-KL objectives. Empirically validated on the Qwen 2.5 and Qwen 3 model families across a wide range of reasoning tasks, this framework provides a principled probabilistic perspective that unifies variational inference with RL-style methods for improving reasoning ability.
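The multi-trace tightening of the ELBO can be illustrated with the standard multi-sample (log-mean-exp) bound argument. The sketch below is a toy numeric illustration of that general principle, not the paper's exact objective; the log-weight values are stand-ins:

```python
import math
import random

# With K sampled thinking traces, the multi-trace bound log((1/K) * sum_k w_k)
# is at least as tight as the averaged single-trace ELBO (1/K) * sum_k log(w_k),
# by Jensen's inequality applied to the concave log.

def single_trace_elbo(log_weights):
    # Average of per-trace log importance weights.
    return sum(log_weights) / len(log_weights)

def multi_trace_bound(log_weights):
    # log-mean-exp of the log importance weights, computed stably.
    m = max(log_weights)
    return m + math.log(sum(math.exp(lw - m) for lw in log_weights) / len(log_weights))

random.seed(0)
# Stand-in values for log p(y, z | x) - log q(z | x, y) over K = 16 traces.
log_w = [random.gauss(-2.0, 1.0) for _ in range(16)]

assert multi_trace_bound(log_w) >= single_trace_elbo(log_w)
```

The inequality holds for any set of traces, which is the sense in which averaging more traces inside the log yields a tighter lower bound on the marginal likelihood.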
 
 ## Model Details
 
 ### Model Description
 
+The models in this repository are designed to enhance the reasoning capabilities of language models through a novel variational inference framework. They are built on the Qwen 2.5 and Qwen 3 model families; examples include `Variational-Reasoning-4B-Acc` and `Variational-Reasoning-8B-Acc`, which use Qwen3-4B-Base and Qwen3-8B-Base, respectively, as backbones.
 
+- **Developed by:** Xiangxin Zhou, Zichen Liu, Haonan Wang, Chao Du, Min Lin, Chongxuan Li, Liang Wang, and Tianyu Pang
+- **Model type:** Causal language model (`Qwen3ForCausalLM`)
+- **Language(s) (NLP):** English
+- **License:** Apache 2.0
+- **Finetuned from model:** Qwen 2.5 and Qwen 3 model families (e.g., Qwen3-4B-Base, Qwen2.5-7B-Instruct)
 
+### Model Sources
+- **Repository:** [https://github.com/sail-sg/variational-reasoning](https://github.com/sail-sg/variational-reasoning)
+- **Paper:** [https://huggingface.co/papers/2509.22637](https://huggingface.co/papers/2509.22637)
 
 ## Uses
 
 ### Direct Use
 
+These models are intended for advanced text-generation tasks that require strong reasoning abilities. They can be integrated into analytical or conversational AI applications to generate thoughtful and coherent responses.
 
 ### Downstream Use [optional]
 
 [More Information Needed]
 
 ### Out-of-Scope Use
 
 [More Information Needed]
 
 ## Bias, Risks, and Limitations
 
 ## How to Get Started with the Model
 
+For detailed instructions on setting up environments, training, and evaluation, please refer to the [official GitHub repository](https://github.com/sail-sg/variational-reasoning).
 
 ## Training Details
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
+The models are trained on mixed datasets such as `Variational-Posterior-4B-Acc-mix` and `Variational-Posterior-4B-GML-mix`, which are linked from the [GitHub repository](https://github.com/sail-sg/variational-reasoning).
 
 ### Training Procedure
 
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
+The training procedure involves multiple steps: training an initial reasoning model ($\pi_{\theta_0}$), training a variational posterior ($q_\phi$), sampling from the posterior, estimating log-likelihoods, and finally training the reasoning model ($\pi_\theta$) with accuracy-based or geometric-mean likelihood estimators. Detailed scripts and configurations can be found in the [LLaMA-Factory subdirectory of the GitHub repository](https://github.com/sail-sg/variational-reasoning).
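The geometric-mean likelihood idea named above can be sketched numerically: the geometric mean of per-token likelihoods equals the exponential of the average token log-likelihood. This is an illustrative helper assuming per-token log-probabilities are available, not the repository's actual implementation:

```python
import math

def geometric_mean_likelihood(token_logprobs):
    # Geometric mean of per-token likelihoods, computed in log space
    # to avoid numerical underflow on long sequences.
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Three tokens with probabilities 0.5, 0.8, 0.2: the geometric mean
# equals the cube root of their product.
probs = [0.5, 0.8, 0.2]
g = geometric_mean_likelihood([math.log(p) for p in probs])
assert abs(g - (0.5 * 0.8 * 0.2) ** (1 / 3)) < 1e-12
```

Working in log space keeps the estimator length-normalized and stable, since a raw product of hundreds of token probabilities would underflow to zero in floating point.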
+
 #### Preprocessing [optional]
 
 [More Information Needed]
 
 #### Training Hyperparameters
 
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 
 #### Speeds, Sizes, Times [optional]
 
 ## Evaluation
 
 ### Testing Data, Factors & Metrics
 
+Detailed evaluation instructions and scripts can be found in `SkyThought/variational_reasoning/eval/eval.sh` within the [GitHub repository](https://github.com/sail-sg/variational-reasoning).
+
 #### Testing Data
 
 <!-- This should link to a Dataset Card if possible. -->
 
 ### Results
 
+Quantitative results and analysis are provided in the [paper](https://huggingface.co/papers/2509.22637).
 
 #### Summary
 
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
 
 ## Technical Specifications [optional]
 
 [More Information Needed]
 
+## Citation
 
+If you find this work useful, please consider citing our paper:
 
+```bibtex
+@article{zhou2025variationalreasoninglanguagemodels,
+  title={Variational Reasoning for Language Models},
+  author={Xiangxin Zhou and Zichen Liu and Haonan Wang and Chao Du and Min Lin and Chongxuan Li and Liang Wang and Tianyu Pang},
+  journal={arXiv preprint arXiv:2509.22637},
+  year={2025}
+}
+```
 
 ## Glossary [optional]
 