GAIR/LIMR
Add metadata, license, and a basic usage example

by nielsr HF Staff - opened Feb 18, 2025

base: refs/heads/main ← from: refs/pr/1

Files changed (1): README.md +29 -1
@@ -1,3 +1,10 @@
+---
+license: apache-2.0
+library_name: transformers
+pipeline_tag: question-answering
+---
+```markdown
 <div align="center">
 # LIMR: Less is More for RL Scaling
@@ -56,6 +63,27 @@ Comparison with other popular RL recipes. We apply RL directly from the base mod
 | SimpleRL  | Base       | No            | 8,523         |
 | LIMR      | Base       | No            | 1,389         |
+Here's how you can use the model:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+torch.manual_seed(1234)
+tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMR", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("GAIR/LIMR", trust_remote_code=True, torch_dtype=torch.bfloat16)
+model = model.to("cuda")
+text = "What is 1+1? Answer:"
+inputs = tokenizer(text, return_tensors="pt").to("cuda")
+outputs = model.generate(**inputs, max_new_tokens=20)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+#  What is 1+1? Answer: 2
+```
 ## Acknowledgements
 Our work builds upon the insightful technical reports from [DeepSeek R1](https://github.com/deepseek-ai/DeepSeek-R1) and [Kimi-k1.5](https://github.com/MoonshotAI/Kimi-k1.5) teams. We extend our appreciation to the [Qwen-Math](https://github.com/QwenLM/Qwen2.5-Math) team for their open-source model, and to the creators of [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) and [vLLM](https://github.com/vllm-project/vllm) for providing the essential reinforcement learning framework and inference infrastructure, respectively, that enabled this research.
@@ -75,4 +103,4 @@ If you find this work useful, please cite our paper:
   howpublished = {\url{https://github.com/GAIR-NLP/LIMR}},
 }
 ```
+```
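
The metadata this PR adds at the top of README.md is a front-matter block delimited by `---` lines. As a rough illustration of how such a block can be read back, here is a minimal sketch that uses only the Python standard library. It assumes flat `key: value` pairs, as in this diff, and does not handle nested YAML. The helper name `parse_front_matter` and the sample README string are hypothetical, chosen for illustration only.

```python
def parse_front_matter(readme_text):
    """Split a README into (metadata, body).

    Assumption: the front matter holds only flat `key: value` lines,
    as in this PR. Nested YAML would need a real YAML parser.
    """
    lines = readme_text.splitlines()
    # A README without a leading '---' line has no front matter.
    if not lines or lines[0].strip() != "---":
        return {}, readme_text

    metadata = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body.
            return metadata, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()

    # No closing delimiter: treat the whole text as body.
    return {}, readme_text


# Hypothetical sample mirroring the fields added in this PR.
readme = """---
license: apache-2.0
library_name: transformers
pipeline_tag: question-answering
---
# LIMR: Less is More for RL Scaling
"""

metadata, body = parse_front_matter(readme)
print(metadata)
```

A check like this could be run before opening a metadata PR to confirm that the `license`, `library_name`, and `pipeline_tag` keys are all present.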