AdamF92 committed
Commit ffbe869 · verified · 1 Parent(s): 6cdb73d

Update README.md

Files changed (1)
  1. README.md +3 -17
README.md CHANGED
@@ -55,30 +55,16 @@ Processing single interactions in real-time by **Reactive Language Models** lead
  > will be about **15x cheaper**
 
  > Reactive Transformer architecture was analysed by 10 state-of-the-art LLM/Reasoning models for its innovations and market disruption potential,
- > rated as ~4.36/5.0. Check - [Reactive Transformer AI Analysis](https://github.com/RxAI-dev/RxNN/blob/main/docs/research/ReactiveTransformer/ai-analysis.md)
+ > rated as ~4.36/5.0.
-
- ## Reactive Transformer - drafts
- - [Architecture introduction](https://github.com/RxAI-dev/RxNN/blob/main/docs/research/ReactiveTransformer/reactive-transformer.md)
- - [Supervised Training stages](https://github.com/RxAI-dev/RxNN/blob/main/docs/research/ReactiveTransformer/supervised-training.md)
- - [Reinforcement Learning stages](https://github.com/RxAI-dev/RxNN/blob/main/docs/research/ReactiveTransformer/mrl.md)
-
- ### RxT-Alpha Open Research
- We are currently working on the **Reactive Transformer Proof-of-Concept - RxT-Alpha**, especially on the new reinforcement learning stage - **Memory Reinforcement Learning**,
- which our reactive models require between _Supervised Memory System Training (SMST)_ and _Reinforcement Learning from Human Feedback for reactive models (RxRLHF)_.
- The research is open: we publish the results of each separate step as soon as it is finished.
-
- We are currently finishing **MRL** training for the world's first experimental (Proof-of-Concept) reactive model - [RxT-Alpha-Micro-Plus](https://huggingface.co/ReactiveAI/RxT-Alpha-Micro-Plus).
- It is only a micro-scale PoC (~27M params) trained on simple synthetic datasets to demonstrate the memory system. Then we will move to bigger scales and real-world datasets with RxT-Alpha-Mini and RxT-Alpha.
 
 
  ## RxNN Platform
 
  <img src="https://raw.githubusercontent.com/RxAI-dev/RxNN/refs/heads/main/assets/logo/logo_rxnn_v2.png" width="350" />
 
- We are working on a complete Reactive Neural Networks development framework - [RxNN github](https://github.com/RxAI-dev/RxNN)
 
  ## Additional Research
- - **Sparse Query Attention (SQA)** - the most cost-effective GQA variant, even 2-3x faster for long sequences! Research in progress - [draft](https://github.com/RxAI-dev/RxNN/blob/main/docs/research/sparse_query_attention.md)
+ - **Sparse Query Attention (SQA)** - the most cost-effective GQA variant, even 2-3x faster for long sequences!
  - **Flex-SQA** - combination of Flex Attention and (symmetric) Sparse Query Attention, enabling 4-8x longer sliding windows
  - **Flex Memory Attention/Memory Cross-Attention** - connecting spatially sparse attention with memory layers to enable very long single interactions - a smaller sliding window for input sequences attends to the full memory, or the opposite
- - **Mixture-of-Experts for Grouped Attention** - MoE Router dynamically selects GQA/SQA groups instead of static selection. Abandoned because results were worse than for GQA/SQA - [more](https://github.com/RxAI-dev/RxNN/blob/main/docs/research/moe_attention.md)
+ - **Mixture-of-Experts for Grouped Attention** - MoE Router dynamically selects GQA/SQA groups instead of static selection. Abandoned because results were worse than for GQA/SQA
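
The **Sparse Query Attention (SQA)** entry in the diff above is the one item that makes a concrete performance claim, so a rough illustration may help. Below is a minimal, hypothetical PyTorch sketch of the idea as the name suggests it: where GQA shrinks the number of key/value heads, SQA instead computes attention over fewer query heads, so the quadratic score/context computation runs over fewer heads. The class name, head counts, and projection shapes are assumptions for illustration, not the RxNN implementation.

```python
# Hypothetical sketch of the Sparse Query Attention (SQA) idea - NOT the RxNN code.
# Assumption: where GQA reduces key/value heads, SQA reduces the number of query
# heads, so the O(seq_len^2) attention is computed for fewer heads.
import torch
import torch.nn.functional as F
from torch import nn


class SparseQueryAttentionSketch(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 16, num_query_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.head_dim = dim // num_heads          # head size of a reference full-MHA layer
        self.num_query_heads = num_query_heads    # reduced query-head count (the "sparse" part)
        self.num_kv_heads = num_query_heads       # symmetric variant: matching K/V head count
        self.q_proj = nn.Linear(dim, self.num_query_heads * self.head_dim)
        self.k_proj = nn.Linear(dim, self.num_kv_heads * self.head_dim)
        self.v_proj = nn.Linear(dim, self.num_kv_heads * self.head_dim)
        self.out_proj = nn.Linear(self.num_query_heads * self.head_dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, seq_len, _ = x.shape
        # Shape to (batch, heads, seq, head_dim) with the reduced head counts.
        q = self.q_proj(x).view(b, seq_len, self.num_query_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, seq_len, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, seq_len, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # The quadratic-in-seq_len part runs over num_query_heads heads instead of num_heads.
        ctx = F.scaled_dot_product_attention(q, k, v)
        ctx = ctx.transpose(1, 2).reshape(b, seq_len, self.num_query_heads * self.head_dim)
        return self.out_proj(ctx)


# Usage example (shapes only):
# layer = SparseQueryAttentionSketch(dim=512, num_heads=16, num_query_heads=4)
# out = layer(torch.randn(2, 1024, 512))   # -> (2, 1024, 512)
```

With these example numbers (16 reference heads, 4 query heads), the attention scores and weighted sums are computed for a quarter of the heads; a reduction of that order is roughly where a 2-3x end-to-end speed-up on long sequences could come from once projections and the rest of the model are accounted for.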