Update README.md

3636327 verified 5 months ago

5.44 kB

	---
	base_model: qwen3-14b
	datasets:
	- math
	- reasoning
	language: en
	license: apache-2.0
	pipeline_tag: text-generation
	tags:
	- text-generation
	- math-reasoning
	- transferability
	- RL-GRPO
	- research-paper
	- qwen
	arxiv: 2507.00432
	library_name: transformers
	---

	# UniReason-Qwen3-14B-RL

	This model is associated with the research paper:
	"Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning"

	📄 Paper: [2507.00432](https://arxiv.org/abs/2507.00432)
	💻 Code: [https://github.com/ReasoningTransfer/Transferability-of-LLM-Reasoning](https://github.com/ReasoningTransfer/Transferability-of-LLM-Reasoning)

	## Abstract

	Math reasoning has become the poster child of progress in large language models (LLMs), with new models rapidly surpassing human-level performance on benchmarks like MATH and AIME. But as math leaderboards improve week by week, it is worth asking: do these gains reflect broader problem-solving ability or just narrow overfitting?

	## Model Description

	This model is a RL-GRPO-tuned version of qwen3-14b focused on math-reasoning capabilities.
	The model was developed as part of research investigating the transferability of mathematical reasoning skills to general language tasks.

	### Key Research Questions Addressed:
	- Does math reasoning training improve general LLM capabilities?
	- How do different training methods (RL vs SFT) affect transferability?
	- What is the trade-off between specialized math performance and general capabilities?

	## Model Details

	- Base Model: qwen3-14b
	- Training Method: RL-GRPO
	- Primary Focus: math-reasoning
	- Training Data: Math-specific datasets
	- Architecture: Transformer-based language model
	- Parameters: 14B

	## Training Details

	### Training Method: RL-GRPO
	Custom training methodology - see paper for details.

	### Datasets Used
	- Mathematical reasoning datasets
	- See paper for complete dataset list

	## Performance

	### Math Reasoning Benchmarks
	- MATH: See paper
	- AIME: See paper

	### General Capabilities
	- General QA: See paper
	- Code Generation: See paper
	- Instruction Following: See paper

	For detailed performance metrics, please refer to the paper.

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	# Load model and tokenizer
	model_name = "ReasoningTransferability/UniReason-Qwen3-14B-RL"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	model_name,
	torch_dtype=torch.float16,
	device_map="auto"
	)

	# Example: Math reasoning
	math_prompt = "Solve this step by step: What is the derivative of x^3 + 2x^2 - 5x + 1?"
	inputs = tokenizer(math_prompt, return_tensors="pt")
	outputs = model.generate(**inputs, max_length=512, temperature=0.7)
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)

	# Example: General reasoning
	general_prompt = "Explain the concept of supply and demand in economics."
	inputs = tokenizer(general_prompt, return_tensors="pt")
	outputs = model.generate(**inputs, max_length=512, temperature=0.7)
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)
	```

	## Limitations and Biases

	- Specialization Trade-offs: As explored in the paper, models optimized for math reasoning may show reduced performance on general tasks
	- Training Method Dependencies: Performance characteristics vary significantly between RL and SFT training approaches
	- Domain Transfer: The extent of capability transfer from math to other domains is limited
	- Computational Requirements: Model requires significant computational resources for inference

	## Research Findings

	Key findings from the associated paper:
	1. RL vs SFT: RL-tuned models show better transfer to general domains compared to SFT-tuned models
	2. Capability Trade-offs: Most math-specialized models fail to transfer gains to other domains
	3. Forgetting: SFT-tuned models often forget general capabilities during math-focused training

	## Ethical Considerations

	- This model is intended for research purposes
	- Users should be aware of potential biases in mathematical and general reasoning
	- The model should not be used for making critical decisions without human oversight
	- Consider the environmental impact of large model inference

	## Citation

	If you use this model in your research, please cite both the model and the associated paper:

	```bibtex
	@misc{huan2025doesmathreasoningimprove,
	title={Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning},
	author={Maggie Huan and Yuetai Li and Tuney Zheng and Xiaoyu Xu and Seungone Kim and Minxin Du and Radha Poovendran and Graham Neubig and Xiang Yue},
	year={2025},
	eprint={2507.00432},
	archivePrefix={arXiv},
	primaryClass={cs.AI},
	url={https://arxiv.org/abs/2507.00432},
	}
	```

	## Contact

	For questions about this model or the associated research, please:
	- Open an issue in this repository
	- Contact the paper authors
	- Reference the original paper: https://arxiv.org/abs/2507.00432

	## Acknowledgments

	This work builds upon the research presented in "Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning" and uses the qwen3-14b architecture as its foundation.

	---

	Model uploaded on 2025-07-03