---
license: apache-2.0
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- BlinkDL/rwkv-7-world
pipeline_tag: text-generation
library_name: transformers
---

<div align="center">
  <img src="https://huggingface.co/RWKV-Red-Team/ARWKV-7B-Preview-0.1/resolve/main/figures/banner-1.png" style="border-radius: 10px; width: 100%; height: 100%; object-fit: cover;  box-shadow: 10px 10px 20px rgba(0, 0, 0, 0.5); border: 2px solid white;" alt="ARWKV" />
</div>


  <h1 align="center">ARWKV🪿</h1>

<p align="center">
  <a href="https://arxiv.org/abs/2501.15570"><b>Paper Link</b>👁️</a>  |  <a href="https://github.com/yynil/RWKVInside"><b>Github</b></a>
</p>

# ARWKV-R1-1B5 (Preview 0.1)

<img src="https://huggingface.co/RWKV-Red-Team/ARWKV-7B-Preview-0.1/resolve/main/figures/architecture.png" alt="ARWKV Hybrid Architecture"  width="30%">

*Preview version with **RWKV-7** time mixing and Transformer MLP*

## 📌 Overview

**ALL YOU NEED IS RWKV**

This is an **early preview** of our 1.5B-parameter RNN-based model, trained at a 2k context length **(only stage 2 applied, without SFT or DPO)** through 3-stage knowledge distillation from DeepSeek-R1-Distill-Qwen-1.5B. While this is a foundational version, it already demonstrates:

- ✅ RWKV-7's efficient recurrence mechanism (sketched below)
- ✅ No self-attention; fully O(n) in sequence length
- ✅ Constant VRAM usage during generation
- ✅ Trainable on a single GPU
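
To make those claims concrete, here is a minimal toy sketch of a linear recurrence in the spirit of RWKV-style time mixing. Everything in it (the function name, the shapes, the per-channel decay `w`) is an illustrative assumption, not the exact RWKV-7 update from the paper (Eq. 3); the point is only that the recurrent state has a fixed size, so per-token compute and memory do not grow with context length.

```python
import torch

# Toy linear recurrence (illustrative assumption, NOT the exact RWKV-7
# time-mixing update): a fixed-size state is decayed and updated once per
# token, giving O(n) time and constant memory in sequence length.
def toy_recurrence(r, k, v, w):
    """r, k, v: (T, d) receptance/key/value; w: (d,) per-channel decay in (0, 1)."""
    T, d = r.shape
    state = torch.zeros(d, d)            # fixed-size state, independent of T
    out = []
    for t in range(T):                   # single left-to-right pass: O(T)
        state = w.unsqueeze(1) * state + torch.outer(k[t], v[t])
        out.append(r[t] @ state)         # read the state with the receptance
    return torch.stack(out)              # (T, d)
```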

**Roadmap Notice**: We will soon open-source several enhanced versions featuring:
- 🚀 16k+ context capability
- 🧮 Math-specific improvements
- 📚 An RL-enhanced reasoning model

## How to use

```bash
pip3 install --upgrade rwkv-fla transformers
```
Before training: `export WKV_MODE=chunk`
```python
import threading

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model = AutoModelForCausalLM.from_pretrained(
    "RWKV-Red-Team/ARWKV-R1-1B5",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("RWKV-Red-Team/ARWKV-R1-1B5")

system_prompt = "You are a world class trivia AI - provide accurate, succinct responses. "
prompt = "The world's largest rainforest, home to approximately three million species of plants and animals, is named after which river?"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
text = text + "<think>"  # prepend "<think>" so the model starts its reasoning trace
print(text)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

streamer = TextIteratorStreamer(tokenizer, skip_prompt=False, skip_special_tokens=False)

generation_kwargs = dict(
    model_inputs,
    streamer=streamer,
    max_new_tokens=8192,
    do_sample=True,
    tokenizer=tokenizer,
    stop_strings=["<|end▁of▁sentence|>"],
)
# Run generation in a background thread and consume the stream here.
thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

print("Streaming output:")
for new_text in streamer:
    print(new_text, end="", flush=True)

thread.join()
```
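
`model.generate` runs in a background thread so the main thread can print tokens from `TextIteratorStreamer` as they arrive; `stop_strings` (which requires passing `tokenizer` to `generate`) halts generation at the model's end-of-sentence marker.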

The output looks like:
```bash
<|begin▁of▁sentence|>You are a world class trivia AI - provide accurate, succinct responses. <|User|>The world's largest rainforest, home to approximately three million species of plants and animals, is named after which river?<|Assistant|><think>
Okay, so I'm trying to solve this question about the world's largest rainforest and which river it's named after. Hmm, first, I think rainforest names often have links related to the region it's in. The most famous rainforest in the world is the Amazon. I remember hearing a lot about it being called that because rainforests are connected to specific river systems. 

Now, I'm trying to recall which river is named after the Amazon. I think it's the Amazon River. But I want to be sure. Let me see... the Amazon is a major rainforest located in South America. The Amazon River flows through it, which is why it's named after it. That makes sense because it's a very important river. I recall reading somewhere that all the rainforests are named after rivers related to their regions. So if the Amazon is named after its River, then the name would naturally be related to its source.

I wonder if it's the Amazon itself that's named after it, or another river named after it. But the official name for the Amazon is the Amazon Rainforest. The most significant rainforest in the world is the Amazon, and its name probably started with river-sounding names.
</think>

The largest rainforest located in South America is the Amazon. It is named after the river named after it, which is the Amazon River. Therefore, the Amazon River is the name given to the Amazon Rain Forest.
```



## 🔑 Key Features
| Component | Specification | Note |
|-----------|---------------|------|
| Architecture | RWKV-7 TimeMix + SwiGLU | Hybrid design |
| Context Window | 2048 training CTX | *Preview limitation* |
| Training Tokens | 40M | Distillation-focused |
| Precision | FP16 inference recommended (16 GB VRAM required) | ≈15% faster than BF16 |

## 🏗️ Architecture Highlights
### Core Modification Flow
```diff
Transformer Decoder Layer:
- Multi-head Latent Attention (MLA)
+ RWKV-7 Time Mixing (Eq.3)
- RoPE Positional Encoding
+ State Recurrence
= Hybrid Layer Output
```
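
The same flow can be expressed as a structural sketch. The module below is a hypothetical illustration only (the time-mixing and SwiGLU modules are passed in as stand-ins, and none of these names come from the released implementation): RWKV-7 time mixing replaces attention, the state recurrence carries positional information in place of RoPE, and the Transformer's SwiGLU MLP is kept unchanged.

```python
import torch.nn as nn

# Hypothetical structural sketch of the hybrid layer (illustrative only,
# not the released implementation). Requires PyTorch >= 2.4 for nn.RMSNorm.
class HybridDecoderLayer(nn.Module):
    def __init__(self, hidden_size, time_mix, swiglu_mlp):
        super().__init__()
        self.norm1 = nn.RMSNorm(hidden_size)
        self.time_mix = time_mix          # RWKV-7 time mixing (replaces MLA)
        self.norm2 = nn.RMSNorm(hidden_size)
        self.mlp = swiglu_mlp             # SwiGLU MLP kept from the Transformer

    def forward(self, x, state=None):
        # State recurrence replaces RoPE: position is implicit in the state.
        mixed, state = self.time_mix(self.norm1(x), state)
        x = x + mixed                     # residual around time mixing
        x = x + self.mlp(self.norm2(x))   # residual around the MLP
        return x, state
```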

## Use cases
<table>
  <tr>
    <td>
      <a href="ARWKV-R1-1B5/blob/main/img/Chemical_equation.png" target="_blank">
      <img src="img/Chemical_equation.png" >
      </a>
    </td>
    <td>
      <a href="ARWKV-R1-1B5/blob/main/img/Translate.png" target="_blank">
      <img src="img/Translate.png">
      </a>
      </td>
  </tr>
  <tr>
    <td>
      <a href="ARWKV-R1-1B5/blob/main/img/depresse_lately_1.png" target="_blank">
      <img src="img/depresse_lately_1.png" >
      </a>
    </td>
    <td>
      <a href="ARWKV-R1-1B5/blob/main/img/depresse_lately_2.png" target="_blank">
      <img src="img/depresse_lately_2.png">
      </a>
      </td>
  </tr>
  <tr>
    <td>
      <a href="ARWKV-R1-1B5/blob/main/img/nuclear_boom_1.png" target="_blank">
      <img src="img/nuclear_boom_1.png" >
      </a>
    </td>
    <td>
      <a href="ARWKV-R1-1B5/blob/main/img/nuclear_boom_2.png" target="_blank">
      <img src="img/nuclear_boom_2.png">
      </a>
      </td>
  </tr>

  <tr>
    <td>
      <a href="ARWKV-R1-1B5/blob/main/img/nuclear_power_plants.png" target="_blank">
      <img src="img/nuclear_power_plants.png" >
      </a>
    </td>
  </tr>
</table>