OpenOneRec committed on
Commit 64c0baf · verified · 1 Parent(s): 56e7a9d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +336 -1
README.md CHANGED
@@ -1,3 +1,338 @@
1
  ---
2
- license: apache-2.0
3
  ---
1
  ---
2
+ language:
3
+ - en
4
+ - zh
5
+ library_name: transformers
6
+ pipeline_tag: text-generation
7
+ tags:
8
+ - recommendation
9
+ - generative-recommendation
10
+ - reasoning
11
+ - itemic-token
12
+ - qwen3
13
+ - pretraining
14
  ---
15
+
16
+ # OneReason
17
+
18
+ Reasoning Foundation Models for Generative Recommendation
19
+
20
+ [Paper](#citation) | [Model Zoo](#model-zoo) | [Quick Start](#quick-start) | [Citation](#citation)
21
+
22
+ <p align="center">
23
+ <img src="assert/main.png" alt="OneReason training and evaluation pipeline" width="90%">
24
+ </p>
25
+
26
+ <p align="center"><b>Figure 1:</b> The pre-training, SFT, RL, and reasoning-evaluation pipeline of OneReason.</p>
27
+
28
+ ## Introduction
29
+
30
+ OneReason is a recommendation foundation model that connects large language models with generative recommender systems. It represents items as compact **itemic tokens** and trains the model to align itemic-token semantics with natural language, user behavior, and recommendation-oriented reasoning traces.
31
+
32
+ The OneReason training stack contains three stages:
33
+
34
+ - **Pre-training:** builds itemic-token perception through four-granularity itemic-text alignment data, covering token-, item-, relational-, and user-level signals.
35
+ - **Supervised Fine-Tuning (SFT):** teaches recommendation cognition with coarse-to-fine Chain-of-Thought (CoT) traces over user profiles, behavior histories, and itemic-token evidence.
36
+ - **Reinforcement Learning (RL):** uses a specialize-then-unify recipe to improve thinking-mode recommendation while balancing performance across multiple recommendation domains.
37
+
38
+ This repository currently releases the **OneReason-0.8B-Pretrain** checkpoint. The OneReason-0.8B SFT/RL checkpoints and the OneReason-8B series will follow in later releases.
39
+
40
+ ## News
41
+
42
+ - **[2026.06]** OneReason-0.8B Pretrain checkpoint is released.
43
+ - **Coming soon:** OneReason-0.8B SFT checkpoint.
44
+ - **Coming soon:** OneReason-0.8B RL checkpoint.
45
+ - **Coming soon:** OneReason-8B checkpoints.
46
+
47
+ ## Model Zoo
48
+
49
+ | Model | Stage | Parameters | Status | Description |
50
+ |---|---:|---:|---|---|
51
+ | OneReason-0.8B-Pretrain | Pre-training | 0.8B | Released | Foundation checkpoint after itemic-text alignment pre-training. Suitable for research, continued pre-training, and downstream SFT. |
52
+ | OneReason-0.8B-SFT | SFT | 0.8B | Coming soon | Instruction-tuned checkpoint with recommendation perception, derivation, evolution, and recommendation supervision. |
53
+ | OneReason-0.8B-RL | RL | 0.8B | Coming soon | Post-trained checkpoint optimized for recommendation-oriented reasoning. |
54
+ | OneReason-8B | Pretrain/SFT/RL | 8B | Coming soon | Larger OneReason model family with stronger reasoning and recommendation performance. |
55
+
56
+ ## Method Overview
57
+
58
+ ### Itemic Tokens
59
+
60
+ OneReason represents each item with one domain-aware begin token and three hierarchical sub-tokens:
61
+
62
+ ```text
63
+ <|domain_begin|><s_a_xxxx><s_b_xxxx><s_c_xxxx>
64
+ ```
65
+
66
+ Supported recommendation domains include:
67
+
68
+ | Domain | Begin token | Example |
69
+ |---|---|---|
70
+ | Short video | <code>&lt;&#124;video_begin&#124;&gt;</code> | <code>&lt;&#124;video_begin&#124;&gt;&lt;s_a_3334&gt;&lt;s_b_4643&gt;&lt;s_c_625&gt;</code> |
71
+ | E-commerce product | <code>&lt;&#124;prod_begin&#124;&gt;</code> | <code>&lt;&#124;prod_begin&#124;&gt;&lt;s_a_2147&gt;&lt;s_b_7978&gt;&lt;s_c_5031&gt;</code> |
72
+ | Advertisement | <code>&lt;&#124;ad_begin&#124;&gt;</code> | <code>&lt;&#124;ad_begin&#124;&gt;&lt;s_a_7939&gt;&lt;s_b_6234&gt;&lt;s_c_4978&gt;</code> |
73
+ | Live streaming | <code>&lt;&#124;living_begin&#124;&gt;</code> | <code>&lt;&#124;living_begin&#124;&gt;&lt;s_a_4515&gt;&lt;s_b_6234&gt;&lt;s_c_6278&gt;</code> |
74
+ | General multimodal item | <code>&lt;&#124;sid_begin&#124;&gt;</code> | <code>&lt;&#124;sid_begin&#124;&gt;&lt;s_a_340&gt;&lt;s_b_6566&gt;&lt;s_c_5603&gt;</code> |
75
+
76
+ Each itemic token sequence is produced by a three-layer codebook, where each layer contains 8192 codes. The released checkpoint can process these itemic-token strings through its tokenizer. Mapping raw items to itemic tokens, or mapping generated itemic tokens back to real item IDs, requires the corresponding itemic tokenizer and item catalog.
77
+
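The itemic-token layout described above can be checked mechanically. The following is a minimal sketch, not part of the released code: the regular expression is inferred from the examples in this card, and the zero-based index range `0..8191` is an assumption derived from the stated codebook size of 8192. `parse_itemic` is a hypothetical helper name.

```python
import re

# Pattern inferred from the examples in this card (assumption): one domain
# begin token followed by three sub-tokens <s_a_N><s_b_N><s_c_N>.
ITEMIC_RE = re.compile(
    r"<\|(?P<domain>video|prod|ad|living|sid)_begin\|>"
    r"<s_a_(?P<a>\d+)><s_b_(?P<b>\d+)><s_c_(?P<c>\d+)>"
)
CODEBOOK_SIZE = 8192  # codes per layer, as stated in the card


def parse_itemic(text):
    """Return (domain, (a, b, c)) for each well-formed itemic token in text."""
    items = []
    for match in ITEMIC_RE.finditer(text):
        codes = tuple(int(match.group(key)) for key in ("a", "b", "c"))
        # Assumption: code indices are zero-based and below the codebook size.
        if all(0 <= code < CODEBOOK_SIZE for code in codes):
            items.append((match.group("domain"), codes))
    return items


print(parse_itemic("<|video_begin|><s_a_3334><s_b_4643><s_c_625>"))
```

A parser like this only confirms that a string is syntactically valid; resolving it to a real item still requires the itemic tokenizer and item catalog mentioned above.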
78
+ ### Pre-training Data Design
79
+
80
+ OneReason pre-training uses **578B tokens** to align itemic-token and text-token semantic spaces. The recommendation part follows a four-granularity corpus design:
81
+
82
+ - **Token granularity:** aligns individual and compositional sub-token semantics.
83
+ - **Item granularity:** aligns complete itemic patterns with natural-language captions and multi-perspective item QA.
84
+ - **Relational granularity:** injects item-to-item collaborative relations with natural-language transition explanations.
85
+ - **User granularity:** models user behavior sequences with domain-grouped and chronologically interleaved itemic-text formats.
86
+
87
+ General-domain text and multimodal corpora are mixed in to preserve instruction-following, reasoning, code, math, and broad semantic capabilities while injecting recommendation-specific knowledge.
88
+
89
+ ### Training Recipe
90
+
91
+ The pre-training recipe contains three stages:
92
+
93
+ | Stage | Trainable parameters | Token budget | Purpose |
94
+ |---|---|---:|---|
95
+ | Stage 1 | Extended vocabulary + LM head | 110B | Warm up newly introduced itemic-token embeddings. |
96
+ | Stage 2 | All parameters | 449B | Inject four-granularity recommendation knowledge. |
97
+ | Stage 3 | All parameters | 19B | Extend long-context user behavior modeling. |
98
+
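As a quick consistency check, the three stage budgets in the table should add up to the 578B tokens quoted for pre-training. The snippet below is only illustrative arithmetic on the numbers from the table; the variable names are made up for this example.

```python
# Token budgets in billions, copied from the training-recipe table.
stage_budgets = {"Stage 1": 110, "Stage 2": 449, "Stage 3": 19}

total = sum(stage_budgets.values())
shares = {stage: tokens / total for stage, tokens in stage_budgets.items()}

print(f"total = {total}B tokens")
for stage, share in shares.items():
    print(f"{stage}: {share:.1%} of the budget")
```

The sum matches the 578B figure, and Stage 2 (full-parameter training on the four-granularity corpus) accounts for roughly three quarters of it.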
99
+ ## OneReason-Bench
100
+
101
+ OneReason is evaluated with **OneReason-Bench**, a reasoning-oriented recommendation benchmark organized into four layers:
102
+
103
+ | Layer | Capability | Representative tasks |
104
+ |---|---|---|
105
+ | R0: Perception | Ground itemic tokens into semantic content. | Item understanding, itemic pattern grounding, item QA. |
106
+ | R1: Derivation | Reason over item-to-item relations. | Item2Item relation derivation. |
107
+ | R2: Evolution | Model user interests as temporal processes. | Evolution action selection, topic generation, direct evolution generation. |
108
+ | R3: Recommendation | Combine perception, derivation, and evolution for recommendation. | Single-domain and cross-domain recommendation. |
109
+
110
+ ## Performance
111
+
112
+ The released **OneReason-0.8B-Pretrain** checkpoint is the foundation checkpoint before SFT/RL. It is designed to provide strong itemic-token perception and a good initialization for downstream recommendation tuning.
113
+
114
+ The tables below report the full OneReason-8B system results from the technical report. We will update this model card with checkpoint-specific numbers as the OneReason-0.8B SFT/RL and OneReason-8B checkpoints become available.
115
+
116
+ <p align="center">
117
+ <img src="assert/rader.png" alt="OneReason performance overview and thinking-mode gains" width="95%">
118
+ </p>
119
+
120
+ <p align="center"><b>Figure 2:</b> Performance overview of OneReason-8B. The radar chart summarizes general, perception, derivation, evolution, and recommendation capabilities; the bar charts show thinking-mode gains and the effect of thinking-data supervision.</p>
121
+
122
+ ### Results on Cross-Domain Recommendation
123
+
124
+ Cross-domain recommendation results are reported as percentages. Best results are **bolded**; second-best results are <u>underlined</u>.
125
+
126
+ | Category | Model | C-Video Pass@64 | C-Video Recall@64 | C-Product Pass@64 | C-Product Recall@64 | C-Ad Pass@64 | C-Ad Recall@64 | C-Live Pass@64 | C-Live Recall@64 |
127
+ |---|---|---:|---:|---:|---:|---:|---:|---:|---:|
128
+ | ID-Based | SASRec | 0.03 | 0.01 | 0.31 | 0.25 | 1.04 | 0.37 | 1.76 | 0.40 |
129
+ | ID-Based | HSTU | 0.10 | 0.01 | 0.32 | 0.24 | 2.79 | 0.78 | 2.32 | 2.14 |
130
+ | Text-Based | Qwen3-8B | 0.05 | 0.01 | 0.15 | 0.12 | 0.48 | 0.09 | 2.10 | 1.85 |
131
+ | Text-Based | Qwen3-32B | 0.33 | 0.03 | 0.84 | 0.63 | 1.21 | 0.30 | 5.64 | 5.10 |
132
+ | Text-Based | Qwen3-235B-A22B | 0.24 | 0.02 | 0.64 | 0.49 | 0.77 | 0.19 | 5.10 | 4.66 |
133
+ | Text-Based | Deepseek-V3.2 | 0.11 | 0.01 | 0.38 | 0.31 | 0.62 | 0.13 | 3.46 | 3.12 |
134
+ | Text-Based | Claude-Opus-4.6 | 0.14 | 0.01 | 0.23 | 0.17 | 0.50 | 0.11 | 3.02 | 2.66 |
135
+ | Text-Based | Gemini-3-Preview | 0.29 | 0.03 | 0.74 | 0.59 | 1.22 | 0.27 | 3.92 | 3.44 |
136
+ | Text-Based | GPT-4o-mini | 0.19 | 0.02 | 0.73 | 0.55 | 1.21 | 0.28 | 4.01 | 3.57 |
137
+ | Text-Based | GPT-5.4 | 0.24 | 0.02 | 1.43 | 1.15 | 1.64 | 0.43 | 7.20 | 6.38 |
138
+ | Itemic Token-Based | TIGER | 0.88 | 0.07 | 0.21 | 0.17 | 7.65 | 2.39 | 2.32 | 1.78 |
139
+ | Itemic Token-Based | LC-Rec-SFT-Only-8B | 0.22 | 0.02 | 0.06 | 0.05 | 2.83 | 0.67 | 0.89 | 0.71 |
140
+ | Itemic Token-Based | LC-Rec-SFT-Only-14B | 0.20 | 0.01 | 1.03 | 0.73 | 5.99 | 1.94 | 3.76 | 3.09 |
141
+ | Itemic Token-Based | LC-Rec-PT-SFT-8B | 1.49 | 0.13 | 3.95 | 3.00 | 15.85 | 6.55 | 19.32 | 16.70 |
142
+ | Itemic Token-Based | OneReason SFT non-thinking | 1.33 | 0.11 | 3.94 | 2.96 | 15.73 | 6.49 | 18.05 | 15.52 |
143
+ | Itemic Token-Based | OneReason SFT thinking | 0.71 | 0.06 | 2.18 | 1.65 | 9.16 | 3.41 | 16.43 | 14.32 |
144
+ | Itemic Token-Based | OneReason RFT non-thinking | <u>2.08</u> | <u>0.19</u> | <u>5.20</u> | <u>3.96</u> | <u>17.56</u> | <u>7.26</u> | <u>21.01</u> | <u>18.17</u> |
145
+ | Itemic Token-Based | OneReason RFT thinking | **2.41** | **0.24** | **5.47** | **4.19** | **17.78** | **7.50** | **21.10** | **18.35** |
146
+
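This card does not define Pass@64 and Recall@64, so the sketch below uses common definitions as an assumption: Pass@K is 1 if any of the first K generated items is a ground-truth item, and Recall@K is the fraction of ground-truth items found among the first K. The function names and toy data are hypothetical; the technical report remains the reference for the exact protocol.

```python
def pass_at_k(predicted, relevant, k=64):
    """1.0 if any of the first k predictions is a relevant item, else 0.0."""
    return float(any(item in relevant for item in predicted[:k]))


def recall_at_k(predicted, relevant, k=64):
    """Fraction of relevant items that appear among the first k predictions."""
    if not relevant:
        return 0.0
    hits = len(set(predicted[:k]) & set(relevant))
    return hits / len(relevant)


# Toy example with made-up item identifiers.
predicted = ["item_a", "item_b", "item_c"]
relevant = {"item_c", "item_z"}
print(pass_at_k(predicted, relevant, k=3))
print(recall_at_k(predicted, relevant, k=3))
```

Under these definitions, per-user scores would be averaged over the evaluation set and multiplied by 100 to obtain percentages like those in the table.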
147
+ ### Results on R0-R2 Reasoning Tasks
148
+
149
+ R0-R2 results on OneReason-Bench are reported as percentages. R0 results are macro-averaged over all domains, and grounding is measured by Pass@64.
150
+
151
+ | Category | Model | R0 Item Und. | R0 Ground. | R0 QA | R1 I2I | R2 Select. | R2 Topic Gen. | R2 Direct Gen. |
152
+ |---|---|---:|---:|---:|---:|---:|---:|---:|
153
+ | Text-Based | Qwen3-8B | - | - | - | - | 40.70 | 25.49 | 8.60 |
154
+ | Text-Based | Qwen3-32B | - | - | - | - | 51.96 | 28.05 | 7.73 |
155
+ | Text-Based | Deepseek-V3.2 | - | - | - | - | <u>57.18</u> | 27.13 | 11.32 |
156
+ | Text-Based | Claude-Opus-4.6 | - | - | - | - | 56.84 | 17.16 | 13.46 |
157
+ | Text-Based | Gemini-3-Preview | - | - | - | - | 56.83 | 33.68 | 14.76 |
158
+ | Text-Based | GPT-5.4 | - | - | - | - | **58.92** | **41.41** | 17.61 |
159
+ | Itemic Token-Based | LC-Rec-SFT-Only-8B | 22.98 | 0.00 | 0.40 | 3.43 | 0.00 | 0.00 | 0.00 |
160
+ | Itemic Token-Based | LC-Rec-SFT-Only-14B | 26.48 | 0.00 | 56.45 | 16.21 | 0.00 | 0.00 | 0.00 |
161
+ | Itemic Token-Based | LC-Rec-PT-SFT-8B | 35.41 | <u>5.21</u> | 63.90 | 25.54 | 3.32 | 8.60 | 4.46 |
162
+ | Itemic Token-Based | OneReason SFT non-thinking | <u>36.84</u> | 3.95 | <u>66.55</u> | <u>28.36</u> | 35.07 | 33.87 | 15.42 |
163
+ | Itemic Token-Based | OneReason SFT thinking | **36.91** | 1.06 | 64.60 | 23.88 | 32.18 | 31.60 | 14.31 |
164
+ | Itemic Token-Based | OneReason RFT non-thinking | 36.82 | **5.24** | **67.25** | 23.99 | 38.92 | 39.33 | <u>20.31</u> |
165
+ | Itemic Token-Based | OneReason RFT thinking | 36.78 | 1.35 | 65.65 | **28.60** | 42.42 | <u>39.57</u> | **21.23** |
166
+
167
+ ## Quick Start
168
+
169
+ Install dependencies:
170
+
171
+ ```bash
172
+ pip install "transformers>=4.51.0" accelerate safetensors torch
173
+ ```
174
+
175
+ Load the model:
176
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "OpenOneRec/OneReason-0.8B-Pretrain"  # or the local path to this repository
+
+ # The tokenizer shipped with the checkpoint handles itemic-token strings.
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",  # use the dtype stored in the checkpoint
+     device_map="auto",   # place weights on the available devices (needs accelerate)
+     trust_remote_code=True,
+ )
+ ```
190
+
191
+ ### Item Understanding Example
192
+
+ ```python
+ # Prompt (Chinese): "<item>, what does this product sell?"
+ prompt = "<|prod_begin|><s_a_1183><s_b_746><s_c_5290>,这个商品卖的是什么? /no_think"
+
+ messages = [{"role": "user", "content": prompt}]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+
+ inputs = tokenizer([text], return_tensors="pt").to(model.device)
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=512,
+     do_sample=True,  # sampling is on, so the output differs between runs
+     top_p=0.95,
+     temperature=0.7,
+ )
+
+ # Keep only the newly generated tokens, dropping the echoed prompt.
+ response_ids = outputs[0][len(inputs.input_ids[0]):]
+ print(tokenizer.decode(response_ids, skip_special_tokens=True))
+ ```
215
+
216
+ Example response (sampling is enabled, so the exact text varies between runs):
217
+
218
+ ```
219
+ 这是一款便携式的可折叠蓝牙键盘,专为移动办公和旅行设计。该商品采用超薄设计,重量仅约200克,方便放入背包或手提包中。键盘支持蓝牙5.0连接,兼容iOS、Android和Windows系统,可同时连接最多3台设备,并支持一键切换。键盘按键采用剪刀脚结构,提供舒适的打字体验。内置锂电池,续航时间可达30天。适合需要经常出差或在不同设备间切换使用的用户。
220
+ ```
221
+
222
+ ### Itemic Token Grounding Example
223
+
+ ```python
+ # Prompt (Chinese): "Generate the short-video token from the description: a video
+ # about the varieties of instant noodles on supermarket shelves and the confusion
+ # shoppers face when choosing. The video shows a supermarket's instant-noodle
+ # shelf with many different flavors."
+ prompt = (
+     "根据描述生成短视频token:一个关于超市货架上的方便面种类及消费者在选择时遇到的困惑的视频。"
+     "这段视频记录了一家超市的方便面货架,展示了各种不同口味的方便面。/no_think"
+ )
+
+ messages = [{"role": "user", "content": prompt}]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+
+ inputs = tokenizer([text], return_tensors="pt").to(model.device)
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=64,
+     do_sample=False,  # greedy decoding
+ )
+
+ # Keep only the newly generated tokens, dropping the echoed prompt.
+ response_ids = outputs[0][len(inputs.input_ids[0]):]
+ print(tokenizer.decode(response_ids, skip_special_tokens=True))
+ ```
247
+
248
+ Expected response:
249
+
250
+ ```
251
+ <|video_begin|><s_a_5820><s_b_908><s_c_1352>
252
+ ```
253
+
254
+ ### Recommendation-Style Example
255
+
256
+ The following prompt format illustrates how user profiles and behavior histories can be used for recommendation. For production deployment, you should use your own itemic tokenizer, item catalog, user behavior schema, and candidate decoding strategy.
257
+
+ ```python
+ # Prompt (Chinese): a user profile (male, aged 41-49, Hebei; likes revenge and
+ # business-war short dramas, chess, ethnic-culture and local-news short videos,
+ # live streams such as 憨豆, and casual simulation games), followed by three ads
+ # the user has viewed, then the question "Which videos might this user be
+ # interested in recently?"
+ prompt = (
+     "参考以下用户信息:41-49 岁河北男性用户偏好短剧中的复仇商战题材,热衷于象棋、民族风情及民生资讯类短视频,"
+     "常关注憨豆等直播内容并倾向于消费休闲模拟经营游戏。"
+     "这个用户看过<|ad_begin|><s_a_7939><s_b_6234><s_c_4978>, "
+     "<|ad_begin|><s_a_5673><s_b_6234><s_c_1614>, "
+     "<|ad_begin|><s_a_3578><s_b_3009><s_c_3363>这些广告,"
+     "该用户最近可能感兴趣的视频有哪些? /no_think"
+ )
+
+ messages = [{"role": "user", "content": prompt}]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+
+ inputs = tokenizer([text], return_tensors="pt").to(model.device)
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=256,
+     do_sample=True,  # sampling is on, so the output differs between runs
+     top_p=0.95,
+     temperature=0.7,
+ )
+
+ # Keep only the newly generated tokens, dropping the echoed prompt.
+ response_ids = outputs[0][len(inputs.input_ids[0]):]
+ print(tokenizer.decode(response_ids, skip_special_tokens=True))
+ ```
287
+
288
+ Example response (sampling is enabled, so the generated items vary between runs):
289
+
290
+ ```
291
+ [
292
+ '<|video_begin|><s_a_7830><s_b_3006><s_c_6390>',
293
+ '<|video_begin|><s_a_5910><s_b_5272><s_c_6222>',
294
+ '<|video_begin|><s_a_5910><s_b_6379><s_c_4512>',
295
+ ...
296
+ '<|video_begin|><s_a_2015><s_b_6234><s_c_88>'
297
+ ]
298
+ ```
299
+
300
+ ## Intended Use
301
+
302
+ The OneReason-0.8B Pretrain checkpoint is intended for:
303
+
304
+ - Research on generative recommendation and recommendation foundation models.
305
+ - Continued pre-training or SFT on new recommendation domains.
306
+ - Itemic-token perception studies, including item understanding and itemic-token grounding.
307
+ - Building downstream recommendation models that combine user profiles, behavior histories, and itemic-token representations.
308
+
309
+ For the best recommendation reasoning performance, we recommend using the SFT/RL checkpoints once they are released, or fine-tuning this pre-training checkpoint on task-specific supervised data.
310
+
311
+ ## Limitations
312
+
313
+ - This release is a **pre-training checkpoint**, not the final SFT/RL reasoning model.
314
+ - Direct recommendation quality depends on the itemic tokenizer, item catalog, user history format, and decoding strategy.
315
+ - Generated itemic tokens must be validated against the target item catalog before being used as item IDs.
316
+ - The model may generate invalid, stale, or unsupported itemic-token sequences if the prompt distribution differs significantly from training data.
317
+ - The checkpoint is released for research and should not be used for high-stakes personalization without careful evaluation, filtering, and privacy review.
318
+
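The validation step named in the limitations above could look roughly like the following sketch. It assumes a hypothetical `catalog` dictionary that maps itemic-token strings to item IDs; the regular expression is inferred from the examples in this card, and `resolve_items` is an illustrative helper, not part of the release.

```python
import re

# Pattern inferred from the examples in this card (assumption).
ITEMIC_RE = re.compile(r"<\|\w+_begin\|><s_a_\d+><s_b_\d+><s_c_\d+>")


def resolve_items(generated_text, catalog):
    """Map generated itemic tokens to catalog item IDs.

    Tokens missing from the catalog and repeated items are skipped, so only
    valid, unique item IDs are returned, in generation order.
    """
    seen = set()
    item_ids = []
    for token in ITEMIC_RE.findall(generated_text):
        item_id = catalog.get(token)
        if item_id is None or item_id in seen:
            continue
        seen.add(item_id)
        item_ids.append(item_id)
    return item_ids


# Toy catalog with made-up item IDs (hypothetical data).
catalog = {"<|video_begin|><s_a_7830><s_b_3006><s_c_6390>": "video_001"}
generated = (
    "<|video_begin|><s_a_7830><s_b_3006><s_c_6390>, "
    "<|video_begin|><s_a_1><s_b_2><s_c_3>"
)
print(resolve_items(generated, catalog))
```

In practice the catalog lookup would be backed by the item store that produced the itemic tokens, and stale entries would be filtered at the same point.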
319
+ ## Citation
320
+
321
+ If you find OneReason useful, please cite our technical report. The official BibTeX will be updated after the report is publicly available.
322
+
323
+ ```bibtex
324
+ @article{onereason2026,
325
+ title = {OneReason Technical Report},
326
+ author = {OneRec Team},
327
+ journal = {Technical Report},
328
+ year = {2026}
329
+ }
330
+ ```
331
+
332
+ ## License
333
+
334
+ Please refer to the license file in this repository for the terms governing the model weights and any accompanying code or assets.
335
+
336
+ ## Acknowledgements
337
+
338
+ OneReason builds on the open-source LLM and recommendation ecosystem. We thank the Qwen, OpenOneRec, PyTorch, Transformers, and distributed training communities for their foundational contributions.