Text Generation · Transformers · Safetensors · English

purbeshmitra committed Update README.md (commit af72c0f, verified, 1 parent: 57fa8a4)

Files changed (1): README.md (+17 -0)

README.md CHANGED
@@ -4,6 +4,23 @@ library_name: peft
 license: apache-2.0
 ---
 
+# MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
+
+🔗 Paper link: [arXiv preprint](https://arxiv.org/abs/2507.02851)
+
+🔗 Link to the trained models: [Hugging Face collection](https://huggingface.co/collections/purbeshmitra/motif-paper-models-686a2f36407bb88f750eef75)
+
+The [INFTYTHINK architecture](https://arxiv.org/abs/2503.06692v1), shown below, enables multi-round thinking, extending an LLM's reasoning beyond its context size.
+<p align="center">
+<img src="assets/multiround.png" alt="Multi-round inference" width="750">
+</p>
+
+In this work, we propose a GRPO-based training method for such a system, which computes the accuracy reward by rolling out trajectories and applying the reward to the first round of inference outcomes. This is depicted below:
+<p align="center">
+<img src="assets/multiround_grpo.png" alt="Multi-round GRPO" width="750">
+</p>
+
+
 ## Usage
 ```python
 from peft import PeftModel
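
As a rough illustration of the reward scheme described in the diff above (a minimal sketch, not the repo's actual training code): GRPO rolls out a group of trajectories per prompt, normalizes the terminal accuracy reward within that group, and in this multi-round setting the resulting advantage is applied to the first-round outputs. All names and reward values below are illustrative assumptions.

```python
# Sketch of GRPO-style group-relative advantage computation.
# Rewards here are dummy values; names are illustrative, not from the repo.
from statistics import mean, stdev

def group_advantages(rewards):
    """Normalize terminal rewards within one rollout group (GRPO-style)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # All rollouts got the same reward: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Four rollouts for one prompt: 1.0 = final answer correct, 0.0 = incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_advantages(rewards)
# Each advantage would then be applied to the tokens of the FIRST inference
# round of its trajectory, per the multi-round scheme described above.
```

The group-relative normalization is what lets GRPO dispense with a learned value baseline: a trajectory is only rewarded relative to its siblings for the same prompt.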