---
license: apache-2.0
language:
  - en
library_name: transformers
tags:
  - text-generation
  - causal-lm
  - swarm-intelligence
  - multi-agent
  - pytorch
  - transformers
pipeline_tag: text-generation
model-index:
  - name: SAGI
    results: []
---

# SAGI - Swarm AGI Language Model

SAGI is a novel causal language model that integrates **swarm intelligence dynamics** with a transformer architecture. The model treats cognition as a dynamic, adaptive system in which multiple internal "agents" collaborate through differentiable routing, trust mechanisms, and shared memory.

## Model Description

| Property | Value |
|----------|-------|
| Parameters | 52.72M |
| Architecture | Transformer Decoder + Swarm Dynamics |
| Hidden Size | 512 |
| Layers | 6 |
| Attention Heads | 8 |
| Context Length | 2048 |
| Vocabulary | GPT-2 tokenizer (50,257 tokens) |

### Key Innovations

- **Differentiable Routing**: Continuous mixture-of-experts via attention (`DiffRouter`) instead of hard module selection (see the sketch after this list)
- **Adaptive Gating & Trust**: `MetaController` activates capacity under resource constraints; trust dynamics bias reliable components
- **Episodic + Semantic Memory**: Dual memory system with trainable retrieval utility
- **Curiosity Engine**: Injects novel goals when surprise is low, promoting exploration
- **Self-Model & Rollback**: Predicts state transitions and detects anomalies for self-correction
- **Resource Dynamics**: Soft conservation with learned converter; cognition consumes/recovers compute, memory, energy
- **Value Monitoring**: Tracks alignment to core values and freezes plasticity under drift
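
The routing idea can be sketched in a few lines: instead of hard-selecting one module, a learned query over the swarm state scores every agent and mixes their outputs with softmax weights, keeping the whole path differentiable. The class and parameter names below (`DiffRouterSketch`, `num_agents`) are illustrative assumptions, not the model's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffRouterSketch(nn.Module):
    """Hypothetical sketch of attention-style differentiable routing:
    every agent contributes, weighted by a learned, fully differentiable score."""

    def __init__(self, dim_s: int = 64, num_agents: int = 20):
        super().__init__()
        self.query = nn.Linear(dim_s, dim_s)  # routing query from the swarm state
        self.agent_keys = nn.Parameter(torch.randn(num_agents, dim_s))  # one key per agent
        self.agents = nn.ModuleList([nn.Linear(dim_s, dim_s) for _ in range(num_agents)])

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state: (batch, dim_s)
        q = self.query(state)                                  # (batch, dim_s)
        scores = q @ self.agent_keys.t() / q.size(-1) ** 0.5   # (batch, num_agents)
        weights = F.softmax(scores, dim=-1)                    # continuous mixture, no hard argmax
        outputs = torch.stack([agent(state) for agent in self.agents], dim=1)  # (batch, num_agents, dim_s)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)    # weighted sum over agents
```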

## How It Works

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       SAGI Model                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚   Swarm-7 V2.2  │─────▢│  Swarm State S, T       β”‚   β”‚
β”‚  β”‚  (Cognitive     β”‚      β”‚  (Working Memory)       β”‚   β”‚
β”‚  β”‚   Dynamics)     β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β–²β”€β”€β”€β”€β”€β”€β”€β”€β”˜                  β”‚                 β”‚
β”‚           β”‚                           β–Ό                 β”‚
β”‚           β”‚              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚           β”‚              β”‚  Transformer Decoder    β”‚    β”‚
β”‚           β”‚              β”‚  - Swarm-conditioned    β”‚    β”‚
β”‚           β”‚              β”‚    attention & FFN      β”‚    β”‚
β”‚           β”‚              β”‚  - RoPE embeddings      β”‚    β”‚
β”‚           β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚           β”‚                          β”‚                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚   Observation   │◀─────│      LM Head            β”‚   β”‚
β”‚  β”‚   (from tokens) β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

The swarm processes observations derived from token embeddings, updating its internal state **S**. This state conditions the transformer's attention patterns and feed-forward activations via learned projections, creating bidirectional information flow between symbolic (tokens) and subsymbolic (swarm dynamics) processing.
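
One plausible way to realize swarm-conditioned attention and FFN is FiLM-style modulation: the swarm state **S** is projected into per-channel scales and shifts that are applied to the block's activations. The sketch below illustrates this for the FFN path only and is an assumption for exposition; SAGI's actual conditioning may differ.

```python
import torch
import torch.nn as nn

class SwarmConditionedFFNSketch(nn.Module):
    """Hypothetical FiLM-style conditioning: the swarm state S (dim_s) modulates
    the decoder's hidden activations through learned scale/shift projections."""

    def __init__(self, hidden_size: int = 512, dim_s: int = 64):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )
        self.to_scale = nn.Linear(dim_s, hidden_size)  # swarm state -> per-channel scale
        self.to_shift = nn.Linear(dim_s, hidden_size)  # swarm state -> per-channel shift

    def forward(self, hidden: torch.Tensor, swarm_state: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden_size); swarm_state: (batch, dim_s)
        scale = 1.0 + self.to_scale(swarm_state).unsqueeze(1)  # (batch, 1, hidden_size)
        shift = self.to_shift(swarm_state).unsqueeze(1)
        return self.ffn(hidden) * scale + shift                # swarm-modulated FFN output
```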

## Usage

### Installation

```bash
pip install torch transformers datasets
```

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/SAGI")
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/SAGI")

# Generate text
model.eval()

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Model Architecture Details

### Swarm Configuration

| Parameter | Value | Description |
|-----------|-------|-------------|
| `max_agents` | 20 | Number of internal cognitive agents |
| `dim_s` | 64 | State dimension |
| `dim_t` | 32 | Task/goal dimension |
| `dim_obs` | 48 | Observation dimension |
| `topk_route` | 5 | Sparse routing top-k |
| `K_thought_max` | 5 | Maximum thinking iterations per step |

### Resource Budgets

| Resource | Budget | Description |
|----------|--------|-------------|
| Compute | 60.0 | Compute budget per step |
| Memory | 20.0 | Memory capacity |
| Energy | 25.0 | Energy budget |
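
For convenience, the swarm and resource tables above can be mirrored as a plain configuration object. The field names below are assumptions for illustration and may not match the model's real config class.

```python
from dataclasses import dataclass, field

@dataclass
class SwarmConfigSketch:
    # Swarm configuration (values from the table above)
    max_agents: int = 20       # internal cognitive agents
    dim_s: int = 64            # state dimension
    dim_t: int = 32            # task/goal dimension
    dim_obs: int = 48          # observation dimension
    topk_route: int = 5        # sparse routing top-k
    K_thought_max: int = 5     # max thinking iterations per step
    # Resource budgets per step
    budgets: dict = field(
        default_factory=lambda: {"compute": 60.0, "memory": 20.0, "energy": 25.0}
    )
```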

### Trust & Plasticity

- **Trust Learning Rate**: 0.07
- **Fast EMA (Plasticity)**: 0.10
- **Slow EMA (Consolidation)**: 0.002
- **Core Values**: `["truth", "safety", "efficiency"]`
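
As a worked example of how the listed rates could be applied, the sketch below runs a generic fast/slow exponential moving average and a proportional trust update. It is a hedged illustration of the bookkeeping, not SAGI's actual trust rule.

```python
TRUST_LR = 0.07   # trust learning rate
FAST_EMA = 0.10   # plasticity (fast) EMA coefficient
SLOW_EMA = 0.002  # consolidation (slow) EMA coefficient

def update_trust(trust: float, reliability_signal: float) -> float:
    """Move trust toward an observed reliability signal at the trust learning rate."""
    return trust + TRUST_LR * (reliability_signal - trust)

def update_emas(fast: float, slow: float, value: float) -> tuple[float, float]:
    """Fast EMA tracks recent behaviour (plasticity); slow EMA consolidates long-term statistics."""
    fast = (1 - FAST_EMA) * fast + FAST_EMA * value
    slow = (1 - SLOW_EMA) * slow + SLOW_EMA * value
    return fast, slow
```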

## Limitations

- **Early Research Model**: This is an experimental architecture exploring swarm-transformer integration
- **Training Data**: Currently trained on a TinyStories subset; may produce simple, story-like outputs
- **Compute Requirements**: Swarm dynamics add overhead compared to standard transformers
- **Generation Quality**: Model is undertrained; outputs may be repetitive or incoherent

## Intended Use

This model is intended for:
- Research into multi-agent cognitive architectures
- Exploration of dynamic, adaptive language models
- Educational purposes in understanding swarm intelligence + LLMs

Not intended for:
- Production applications
- Safety-critical systems
- Generation of factual content

## Training Details

- **Dataset**: TinyStories (subset)
- **Optimizer**: AdamW (lr=3e-4, betas=(0.9, 0.999), weight_decay=0.01)
- **Scheduler**: Cosine annealing
- **Precision**: FP32
- **Hardware**: CPU training (compatible with CUDA)
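
The listed optimizer and schedule map onto a standard PyTorch setup. The sketch below assumes `model` from the Quick Start above, a `dataloader` yielding tokenized TinyStories batches, and an illustrative `num_steps`; none of these specifics come from the card.

```python
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

num_steps = 10_000  # illustrative placeholder; the card does not state the step count

optimizer = AdamW(
    model.parameters(),  # `model` as loaded in the Quick Start above
    lr=3e-4,
    betas=(0.9, 0.999),
    weight_decay=0.01,
)
scheduler = CosineAnnealingLR(optimizer, T_max=num_steps)

for step, batch in enumerate(dataloader):  # batches of tokenized TinyStories text (assumed)
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    if step + 1 >= num_steps:
        break
```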

## Citation

```bibtex
@software{sagi2026,
  title={SAGI: Swarm AGI Language Model},
  author={Reaperdoesntknow},
  year={2026},
  url={https://huggingface.co/reaperdoesntknow/SAGI}
}
```