| title: README | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: red | |
| sdk: static | |
| pinned: false | |
| short_description: Reactive AI - Reactive Neural Networks and Event-Driven AI | |
| <img src="https://raw.githubusercontent.com/RxAI-dev/rxlm/refs/heads/main/assets/logo/logo_rxai_v2.png" width="350" /> | |
| # Reactive AI | |
| We are working on our own ideas of Reactive Neural Networks (RxNN) and Event-Driven AI, advancing from language models to AGI awareness models. | |
| ## Reactive Neural Networks and Event-Driven AI | |
| Reactive Neural Networks (RxNN) are memory-augmented neural networks with a higher level of recurrence (inter-sequence, versus intra-sequence in RNNs), | |
| focused on processing single interactions with access to previous interactions through memory layers. We call this _**event-driven real-time processing**_ | |
| to distinguish it from the classical _data-driven processing_ of the full conversation history in each interaction. This difference is crucial for | |
| AGI and awareness: a key feature of human awareness is that we remember what we were doing 10 minutes ago without recalling the whole day's history. We | |
| work in real time, just like event-driven _Reactive Neural Networks_. | |
| In Event-Driven AI, models process data in reaction to environment or internal events and emit response events as a result. | |
| The processing of input and output events by the model is called an interaction. An event or an interaction can occur at any point in continuous time, so models | |
| have to be stateful and remember data between interactions. | |
| _**Strong Reactive Neural Networks**_ like **Reactor** can emit and listen to their own internal events, while _**Weak Reactive Neural Networks**_ | |
| work only on environment events. | |
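The difference between event-driven and data-driven processing can be illustrated with a toy sketch. Everything here is hypothetical and is not part of the RxNN codebase: `ToyReactiveModel`, its `interact` method, the bounded `deque` standing in for memory layers, and whitespace word counts standing in for tokens are all assumptions made for illustration. The sketch models only a weak reactive network, reacting to environment events; a strong one would additionally emit and consume its own internal events.

```python
from collections import deque


class ToyReactiveModel:
    """Hypothetical stand-in for a weak reactive model (not the RxNN API).

    State lives in a fixed-size memory that persists between interactions,
    so the conversation history is never re-processed.
    """

    def __init__(self, memory_slots: int = 4):
        # Bounded memory: the oldest interactions are dropped once it is full
        self.memory = deque(maxlen=memory_slots)
        self.tokens_processed = 0

    def interact(self, query: str) -> str:
        # Only the current query is processed (word count is a crude token proxy)
        self.tokens_processed += len(query.split())
        # Earlier interactions are visible solely through memory
        recalled = list(self.memory)
        response = f"reply to {query!r} using {len(recalled)} remembered interaction(s)"
        # Update the state so the next event can build on this one
        self.memory.append((query, response))
        return response


def run_demo():
    model = ToyReactiveModel(memory_slots=2)
    events = ["hello there", "what is RxNN", "and what is Reactor"]
    responses = [model.interact(event) for event in events]
    return model, responses


if __name__ == "__main__":
    model, responses = run_demo()
    for line in responses:
        print(line)
    print("tokens processed:", model.tokens_processed)
```

The processed token count grows only with the length of each new event, while the memory size stays bounded regardless of how many interactions have occurred.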
| ## Reactor AGI | |
| <!-- <img src="https://raw.githubusercontent.com/RxAI-dev/RxNN/refs/heads/main/assets/logo/logo_reactor.png" width="350" /> --> | |
| Our primary architecture, **Reactor**, is planned as the first _**awareness AGI model**_. It models awareness as an _Infinite Chain-of-Thoughts_ | |
| connected to _Short-Term and Long-Term Memory_ (_Attention-based Memory System_) and _Receptors/Effectors_ systems for real-time reactive processing. | |
| It will be able to learn constantly and autonomously from interactions in a _Continuous Live Learning_ process. | |
| > Reactor architecture details and its mathematical model were analysed by 30 state-of-the-art LLM/reasoning models, which rated its potential | |
| > to reach AGI at ~4.35/5 | |
| ## Reactive Language Models (RxLM) | |
| While **Reactor** is the main goal, it is extremely hard to achieve, as it is definitely the most advanced neural network ensemble ever. | |
| That's why we designed simplified architectures for an incremental transformation from language/reasoning models to an awareness model: | |
| - **Reactive Transformer** introduces the _Attention-based Memory System_ and adds _Short-Term Memory_ to Transformer language models | |
| - **Preactor** adds _Long-Term Memory_ and the ability to learn from interactions | |
| ## RxLM vs LLM advantages | |
| Processing single interactions in real time with **Reactive Language Models** leads to **revolutionary** improvements in inference speed and cost: | |
| - LLM inference costs grow quadratically with conversation length (accumulated over each next message), because the full dialog history is processed every time | |
| - RxLM inference costs are linear, depending only on the tokens of a single interaction (not accumulated), so each next interaction is `number of steps` times cheaper than for an LLM | |
| - the same holds for inference speed: an LLM has to process the full history, while an RxLM processes only a single message (only the first interaction could be slower, because of encoder/memory attention overhead) | |
| > For example, for a dialog with **DeepSeek R1** that had ~90k tokens overall, I paid for about 1.5M tokens. With **RxLM** it would cost only those ~90k tokens, so it | |
| > would be about **15x cheaper** | |
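The scaling argument above can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the function names `llm_tokens` and `rxlm_tokens` are hypothetical, the stateless LLM is assumed to re-process the entire history on every turn, and the split of the ~90k-token dialog into 30 equal turns of 3000 tokens is invented for illustration, since the actual turn structure of the quoted dialog is not given. With N equal turns the ratio works out to (N + 1) / 2.

```python
def llm_tokens(turn_lengths):
    """Tokens processed by a stateless LLM: each turn re-reads all history so far."""
    total = 0
    history = 0
    for n in turn_lengths:
        history += n      # the new turn is appended to the dialog history
        total += history  # the whole history is processed again
    return total


def rxlm_tokens(turn_lengths):
    """Tokens processed when each turn is handled alone (history kept in memory)."""
    return sum(turn_lengths)


# Hypothetical split: 30 interactions of 3000 tokens each (~90k tokens overall)
turns = [3000] * 30

llm_cost = llm_tokens(turns)
rxlm_cost = rxlm_tokens(turns)
ratio = llm_cost / rxlm_cost

if __name__ == "__main__":
    print(f"LLM:  {llm_cost} tokens")
    print(f"RxLM: {rxlm_cost} tokens")
    print(f"ratio: {ratio:.1f}x")
```

Under this assumed split the stateless model processes roughly 1.4M tokens against 90k, which is in the same range as the figures quoted above. Real dialogs with uneven turn lengths will give a different ratio.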
| ## RxNN Platform | |
| <img src="https://raw.githubusercontent.com/RxAI-dev/rxlm/refs/heads/main/assets/logo/logo_rxnn_v2.png" width="350" /> | |
| ## Additional Research | |
| - **Sparse Query Attention (SQA)** - the most cost-effective GQA variant, up to 2-3x faster for long sequences | |
| - **Flex-SQA** - combination of Flex Attention and (symmetric) Sparse Query Attention, enabling 4-8x longer sliding windows | |
| - **Flex Memory Attention/Memory Cross-Attention** - connecting spatially sparse attention with memory layers to enable very long single interactions - smaller sliding window for input sequences attends to full memory, or the opposite | |
| - **Mixture-of-Experts for Grouped Attention** - MoE Router dynamically selects GQA/SQA groups, instead of static selection. Abandoned, because results were worse than for GQA/SQA |