---
title: README
emoji: πŸ‘
colorFrom: blue
colorTo: red
sdk: static
pinned: false
short_description: Reactive AI - Reactive Neural Networks and Event-Driven AI
---

<img src="https://raw.githubusercontent.com/RxAI-dev/rxlm/refs/heads/main/assets/logo/logo_rxai_v2.png" width="350" />

# Reactive AI
We are working on our own ideas of Reactive Neural Networks (RxNN) and Event-Driven AI, advancing from language models to AGI awareness models.

## Reactive Neural Networks and Event-Driven AI
Reactive Neural Networks (RxNN) are memory-augmented neural networks with a higher level of recurrence (inter-sequence, versus intra-sequence in RNNs),
focused on processing single interactions with access to previous interactions via memory layers. We call this _**event-driven real-time processing**_
to distinguish it from the classical _data-driven processing_ of the full conversation history in each interaction. This difference is crucial for
AGI and awareness - a key feature of human awareness is that we remember what we were doing 10 minutes ago without recalling the whole day's history - we
work in real time, just like event-driven _Reactive Neural Networks_.

In Event-Driven AI, models process data in reaction to environment or internal events, and emit response events as a result.
The model's processing of an input event and its output event is called an interaction. An event or interaction can occur at any point in continuous
time, so models have to be stateful and remember data between interactions.

_**Strong Reactive Neural Networks**_ like **Reactor** can emit and listen to their own internal events, while _**Weak Reactive Neural Networks**_
work only on environment events.
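
As a toy illustration of this interaction loop (the class and method names below are hypothetical, not the actual RxNN API - just a sketch of stateful, single-interaction processing):

```python
from dataclasses import dataclass, field

@dataclass
class ReactiveModel:
    """Toy stateful model: keeps short-term memory between interactions."""
    memory: list = field(default_factory=list)  # stands in for STM layers

    def interact(self, event: str) -> str:
        # Process only the current event, with read access to memory...
        response = f"response to '{event}' (memory size: {len(self.memory)})"
        # ...then update memory, instead of re-reading the whole history.
        self.memory.append(event)
        return response

model = ReactiveModel()
print(model.interact("hello"))  # first interaction: memory size 0
print(model.interact("again"))  # second interaction: memory size 1
```

Each call sees only its own event plus the accumulated state - never the full transcript.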

## Reactor AGI

<!-- <img src="https://raw.githubusercontent.com/RxAI-dev/RxNN/refs/heads/main/assets/logo/logo_reactor.png" width="350" /> -->

Our primary architecture - **Reactor** - is planned as the first _**awareness AGI model**_, modelling awareness as an _Infinite Chain-of-Thoughts_
connected to _Short-Term and Long-Term Memory_ (an _Attention-based Memory System_) and to _Receptors/Effectors_ systems for real-time reactive processing.
It will be able to learn constantly and autonomously from interactions in a _Continuous Live Learning_ process.

> Reactor architecture details and the mathematical model were analysed by 30 state-of-the-art LLM/reasoning models, which rated its potential
> to reach AGI at ~4.35/5

## Reactive Language Models (RxLM)
While **Reactor** is the main goal, it is extremely hard to achieve, as it is arguably the most advanced neural network ensemble ever designed.

That's why we designed simplified architectures for an incremental transformation from language/reasoning models to the awareness model:
- **Reactive Transformer** introduces the _Attention-based Memory System_ and adds _Short-Term Memory_ to Transformer language models
- **Preactor** adds _Long-Term Memory_ and the ability to learn from interactions

## RxLM vs LLM advantages
Processing single interactions in real time lets **Reactive Language Models** achieve **revolutionary** improvements in inference speed and cost:
- LLM inference costs grow quadratically with conversation length (accumulated over every message), because the full dialog history is processed each time
- RxLM inference costs are linear, depending only on the tokens of the single interaction (not accumulated) - each next interaction is `number of steps` times cheaper than for an LLM
- the same holds for inference speed - an LLM has to process the full history, while an RxLM processes only the single message (only the first interaction could be slower, because of encoder/memory-attention overhead)

> For example, in a dialog with **DeepSeek R1** totalling ~90k tokens, I paid for about 1.5M tokens. With an **RxLM** it would cost only those ~90k tokens,
> so it would be about **15x cheaper**
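
The quadratic-vs-linear difference can be checked with simple arithmetic, assuming a dialog of N messages with roughly equal token counts (illustrative numbers, not measured costs):

```python
def llm_cost(n_messages: int, tokens_per_message: int) -> int:
    """LLM: reprocesses the full accumulated history for every message."""
    return sum(step * tokens_per_message for step in range(1, n_messages + 1))

def rxlm_cost(n_messages: int, tokens_per_message: int) -> int:
    """RxLM: processes only the current interaction each time."""
    return n_messages * tokens_per_message

# 30 messages of ~3k tokens each, i.e. a ~90k-token dialog
print(llm_cost(30, 3000))   # 1395000 tokens - in line with the ~1.5M paid
print(rxlm_cost(30, 3000))  # 90000 tokens  - ~15.5x cheaper
```

The LLM total is the sum 1+2+...+N times the per-message tokens, hence the quadratic growth; the RxLM total is just N times the per-message tokens.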


## RxNN Platform

<img src="https://raw.githubusercontent.com/RxAI-dev/rxlm/refs/heads/main/assets/logo/logo_rxnn_v2.png" width="350" />


## Additional Research
- **Sparse Query Attention (SQA)** - the most cost-effective GQA variant, even 2-3x faster for long sequences!
- **Flex-SQA** - combination of Flex Attention and (symmetric) Sparse Query Attention, enabling 4-8x longer sliding windows
- **Flex Memory Attention/Memory Cross-Attention** - connecting spatially sparse attention with memory layers to enable very long single interactions - smaller sliding window for input sequences attends to full memory, or the opposite
- **Mixture-of-Experts for Grouped Attention** - an MoE router dynamically selects GQA/SQA groups instead of static selection. Abandoned because results were worse than for plain GQA/SQA
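
As a rough sketch of the symmetric SQA idea - running attention with fewer query (and key/value) heads, so that score computation, which scales with the number of query heads, gets cheaper - here is a minimal numpy implementation (head counts, random weights, and the function name are illustrative assumptions, not the library code):

```python
import numpy as np

def symmetric_sqa(x, n_heads, n_used_heads, rng):
    """Symmetric SQA sketch: attend with n_used_heads < n_heads heads.

    Attention-score FLOPs scale with the number of query heads, so this
    cuts compute by roughly n_heads / n_used_heads - unlike GQA/MQA,
    which mainly shrink the KV cache rather than the score computation.
    """
    seq, d_model = x.shape
    d_head = d_model // n_heads
    d_used = n_used_heads * d_head                # reduced projection width
    wq = rng.standard_normal((d_model, d_used)) * 0.02
    wk = rng.standard_normal((d_model, d_used)) * 0.02
    wv = rng.standard_normal((d_model, d_used)) * 0.02
    wo = rng.standard_normal((d_used, d_model)) * 0.02
    q = (x @ wq).reshape(seq, n_used_heads, d_head)
    k = (x @ wk).reshape(seq, n_used_heads, d_head)
    v = (x @ wv).reshape(seq, n_used_heads, d_head)
    # Scaled dot-product attention over the reduced set of heads.
    scores = np.einsum("shd,thd->hst", q, k) / np.sqrt(d_head)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    out = np.einsum("hst,thd->shd", w, v).reshape(seq, d_used)
    return out @ wo                               # back to (seq, d_model)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 64))
y = symmetric_sqa(x, n_heads=8, n_used_heads=2, rng=rng)
print(y.shape)  # (8, 64)
```

With 2 of 8 heads active, the score/value matmuls are roughly 4x cheaper, while the output projection restores the full model dimension.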