Add metadata, license, and a basic usage example

#1
by nielsr HF Staff - opened

Files changed (1)
README.md +29 -1

````diff
@@ -1,3 +1,10 @@
+---
+license: apache-2.0
+library_name: transformers
+pipeline_tag: question-answering
+---
+
+```markdown
 <div align="center">
 
 # LIMR: Less is More for RL Scaling
@@ -56,6 +63,27 @@ Comparison with other popular RL recipes. We apply RL directly from the base mod
 | SimpleRL | Base | No | 8,523 |
 | LIMR | Base | No | 1,389 |
 
+Here's how you can use the model:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+
+torch.manual_seed(1234)
+
+tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMR", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("GAIR/LIMR", trust_remote_code=True, torch_dtype=torch.bfloat16)
+
+model = model.to("cuda")
+
+text = "What is 1+1? Answer:"
+inputs = tokenizer(text, return_tensors="pt").to("cuda")
+
+outputs = model.generate(**inputs, max_new_tokens=20)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+# What is 1+1? Answer: 2
+```
+
 ## Acknowledgements
 
 Our work builds upon the insightful technical reports from [DeepSeek R1](https://github.com/deepseek-ai/DeepSeek-R1) and [Kimi-k1.5](https://github.com/MoonshotAI/Kimi-k1.5) teams. We extend our appreciation to the [Qwen-Math](https://github.com/QwenLM/Qwen2.5-Math) team for their open-source model, and to the creators of [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) and [vLLM](https://github.com/vllm-project/vllm) for providing the essential reinforcement learning framework and inference infrastructure, respectively, that enabled this research.
@@ -75,4 +103,4 @@ If you find this work useful, please cite our paper:
 howpublished = {\url{https://github.com/GAIR-NLP/LIMR}},
 }
 ```
-
+```
````