Update README.md
README.md
CHANGED
@@ -55,30 +55,16 @@ Processing single interactions in real-time by **Reactive Language Models** lead
> will be about **15x cheaper**

> Reactive Transformer architecture was analysed by 10 state-of-the-art LLM/Reasoning models for its innovations and market disruption potential,
> rated as ~4.36/5.0.
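
The "15x cheaper" figure is quoted here without its surrounding context. One way to read it (an assumption for illustration, not something this diff states) is as the ratio between a stateless LLM that reprocesses the whole conversation history on every turn and a reactive model that processes only the current interaction:

```latex
% Back-of-the-envelope sketch with assumed values: N turns, roughly T tokens per interaction.
% A stateless LLM re-reads the growing history each turn; a reactive model does not.
\text{stateless cost} \approx \sum_{n=1}^{N} nT = \frac{N(N+1)}{2}\,T,
\qquad
\text{reactive cost} \approx NT,
\qquad
\frac{\text{stateless}}{\text{reactive}} = \frac{N+1}{2} \approx 15 \ \text{for}\ N \approx 29.
```

Under that reading the saving grows with conversation length, so the exact multiplier depends on how long conversations run.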
-## Reactive Transformer - drafts
-- [Architecture introduction](https://github.com/RxAI-dev/RxNN/blob/main/docs/research/ReactiveTransformer/reactive-transformer.md)
-- [Supervised Training stages](https://github.com/RxAI-dev/RxNN/blob/main/docs/research/ReactiveTransformer/supervised-training.md)
-- [Reinforcement Learning stages](https://github.com/RxAI-dev/RxNN/blob/main/docs/research/ReactiveTransformer/mrl.md)
-
-### RxT-Alpha Open Research
-We are currently working on the **Reactive Transformer Proof-of-Concept - RxT-Alpha**, especially on its new reinforcement learning stage - **Memory Reinforcement Learning (MRL)** -
-which our reactive models require between _Supervised Memory System Training (SMST)_ and _Reinforcement Learning from Human Feedback for reactive models (RxRLHF)_.
-The research is open: we are publishing the results of each separate step just after finishing it.
-
-We are currently finishing **MRL** training for the world's first experimental (Proof-of-Concept) reactive model - [RxT-Alpha-Micro-Plus](https://huggingface.co/ReactiveAI/RxT-Alpha-Micro-Plus).
-That is only a micro-scale PoC (~27M params) trained on simple synthetic datasets to demonstrate the memory system. Then we will move to bigger scales and real-world datasets with RxT-Alpha-Mini and RxT-Alpha.

## RxNN Platform

<img src="https://raw.githubusercontent.com/RxAI-dev/RxNN/refs/heads/main/assets/logo/logo_rxnn_v2.png" width="350" />

-We are working on a complete Reactive Neural Networks development framework - [RxNN github](https://github.com/RxAI-dev/RxNN).

## Additional Research
- **Sparse Query Attention (SQA)** - the most cost-effective GQA variant, even 2-3x faster for long sequences! (a sketch follows after this list)
- **Flex-SQA** - a combination of Flex Attention and (symmetric) Sparse Query Attention, enabling 4-8x longer sliding windows
- **Flex Memory Attention/Memory Cross-Attention** - connecting spatially sparse attention with memory layers to enable very long single interactions - a smaller sliding window over the input sequence attends to the full memory, or the opposite
- **Mixture-of-Experts for Grouped Attention** - an MoE Router dynamically selects GQA/SQA groups instead of a static selection. Abandoned because results were worse than for plain GQA/SQA
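
Since the SQA bullet above is compact, here is a minimal, hypothetical PyTorch sketch of the idea: where GQA/MQA shrink the number of key/value heads (saving KV-cache memory), SQA shrinks the number of query heads, so the attention-score computation itself gets proportionally cheaper. The class name and the head counts below are illustrative assumptions, not the actual RxNN implementation.

```python
import torch
import torch.nn.functional as F
from torch import nn


class SparseQueryAttention(nn.Module):
    """Illustrative SQA-style layer: fewer query heads than the full head count."""

    def __init__(self, dim: int, num_heads: int, num_query_heads: int, num_kv_heads: int):
        super().__init__()
        assert dim % num_heads == 0 and num_query_heads % num_kv_heads == 0
        self.head_dim = dim // num_heads              # per-head width of the "full" layout
        self.h_q, self.h_kv = num_query_heads, num_kv_heads
        self.q_proj = nn.Linear(dim, self.h_q * self.head_dim)   # reduced query heads
        self.k_proj = nn.Linear(dim, self.h_kv * self.head_dim)
        self.v_proj = nn.Linear(dim, self.h_kv * self.head_dim)
        self.o_proj = nn.Linear(self.h_q * self.head_dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.h_q, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.h_kv, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.h_kv, self.head_dim).transpose(1, 2)
        # Share each K/V head group across the (reduced) query heads, as in GQA.
        k = k.repeat_interleave(self.h_q // self.h_kv, dim=1)
        v = v.repeat_interleave(self.h_q // self.h_kv, dim=1)
        # Score tensors are (h_q, t, t) instead of (num_heads, t, t): fewer heads, fewer FLOPs.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, self.h_q * self.head_dim))


# Example: an 8-head layout reduced to 4 query heads and 4 KV heads (a symmetric variant).
layer = SparseQueryAttention(dim=512, num_heads=8, num_query_heads=4, num_kv_heads=4)
y = layer(torch.randn(2, 128, 512))   # -> shape (2, 128, 512)
```

Because the attention-score cost scales with the number of query heads, the relative FLOP saving is constant, but it matters most for long sequences, where attention dominates the runtime - which matches the "2-3x faster for long sequences" claim above.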