---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- text-generation
- causal-lm
- swarm-intelligence
- multi-agent
- pytorch
- transformers
pipeline_tag: text-generation
model-index:
- name: SAGI
results: []
---
# SAGI - Swarm AGI Language Model
SAGI is a novel causal language model that integrates **swarm intelligence dynamics** with transformer architecture. The model treats cognition as a dynamic, adaptive system where multiple internal "agents" collaborate through differentiable routing, trust mechanisms, and shared memory.
## Model Description
| Property | Value |
|----------|-------|
| Parameters | 52.72M |
| Architecture | Transformer Decoder + Swarm Dynamics |
| Hidden Size | 512 |
| Layers | 6 |
| Attention Heads | 8 |
| Context Length | 2048 |
| Vocabulary | GPT-2 tokenizer (50,257 tokens) |
### Key Innovations
- **Differentiable Routing**: Continuous mixture-of-experts via attention (`DiffRouter`) instead of hard module selection (see the sketch after this list)
- **Adaptive Gating & Trust**: `MetaController` activates capacity under resource constraints; trust dynamics bias reliable components
- **Episodic + Semantic Memory**: Dual memory system with trainable retrieval utility
- **Curiosity Engine**: Injects novel goals when surprise is low, promoting exploration
- **Self-Model & Rollback**: Predicts state transitions and detects anomalies for self-correction
- **Resource Dynamics**: Soft conservation with learned converter; cognition consumes/recovers compute, memory, energy
- **Value Monitoring**: Tracks alignment to core values and freezes plasticity under drift
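The routing idea in the first bullet can be pictured with a minimal sketch: instead of committing to one module, soft attention weights blend all module outputs, so gradients flow through the routing decision itself. This is an illustrative stand-in, not SAGI's actual `DiffRouter`; the class name, linear experts, and tensor shapes are assumptions.

```python
import torch
import torch.nn as nn

class SoftRouter(nn.Module):
    """Continuous mixture over expert modules (soft routing) rather than
    a hard argmax selection. Hypothetical sketch of the DiffRouter idea."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim). Softmax weights keep routing fully differentiable.
        weights = torch.softmax(self.gate(x), dim=-1)                # (batch, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=-1)  # (batch, dim, E)
        return (outputs * weights.unsqueeze(1)).sum(dim=-1)          # (batch, dim)
```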
## How It Works
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ SAGI Model β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Swarm-7 V2.2 │─────▢│ Swarm State S, T β”‚ β”‚
β”‚ β”‚ (Cognitive β”‚ β”‚ (Working Memory) β”‚ β”‚
β”‚ β”‚ Dynamics) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β–²β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚
β”‚ β”‚ β–Ό β”‚
β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ β”‚ Transformer Decoder β”‚ β”‚
β”‚ β”‚ β”‚ - Swarm-conditioned β”‚ β”‚
β”‚ β”‚ β”‚ attention & FFN β”‚ β”‚
β”‚ β”‚ β”‚ - RoPE embeddings β”‚ β”‚
β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Observation │◀─────│ LM Head β”‚ β”‚
β”‚ β”‚ (from tokens) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
The swarm processes observations derived from token embeddings, updating its internal state **S**. This state conditions the transformer's attention patterns and feed-forward activations via learned projections, creating bidirectional information flow between symbolic (tokens) and subsymbolic (swarm dynamics) processing.
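One way to picture this conditioning is a FiLM-style modulation, where the pooled swarm state scales and shifts the feed-forward activations through learned projections. This is a minimal sketch under that assumption; SAGI's actual projection layout is not documented here, and the class name, shapes, and GELU choice are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwarmConditionedFFN(nn.Module):
    """Feed-forward block whose activations are modulated by the pooled
    swarm state S via learned scale/shift projections (FiLM-style sketch)."""

    def __init__(self, hidden: int = 512, dim_s: int = 64, ffn_mult: int = 4):
        super().__init__()
        self.up = nn.Linear(hidden, hidden * ffn_mult)
        self.down = nn.Linear(hidden * ffn_mult, hidden)
        self.scale = nn.Linear(dim_s, hidden * ffn_mult)
        self.shift = nn.Linear(dim_s, hidden * ffn_mult)

    def forward(self, x: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden) token states; s: (batch, dim_s) swarm state
        h = F.gelu(self.up(x))
        h = h * (1 + self.scale(s)).unsqueeze(1) + self.shift(s).unsqueeze(1)
        return self.down(h)
```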
## Usage
### Installation
```bash
pip install torch transformers datasets
```
### Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/SAGI")
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/SAGI")

# Generate text
model.eval()
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Model Architecture Details
### Swarm Configuration
| Parameter | Value | Description |
|-----------|-------|-------------|
| `max_agents` | 20 | Number of internal cognitive agents |
| `dim_s` | 64 | State dimension |
| `dim_t` | 32 | Task/goal dimension |
| `dim_obs` | 48 | Observation dimension |
| `topk_route` | 5 | Sparse routing top-k |
| `K_thought_max` | 5 | Maximum thinking iterations per step |
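For reference, these hyperparameters map naturally onto a flat config object. The dataclass below is an illustrative grouping only (the class name is hypothetical); the values come straight from the table.

```python
from dataclasses import dataclass

@dataclass
class SwarmConfig:
    """Illustrative grouping of the swarm hyperparameters above."""
    max_agents: int = 20     # number of internal cognitive agents
    dim_s: int = 64          # state dimension
    dim_t: int = 32          # task/goal dimension
    dim_obs: int = 48        # observation dimension
    topk_route: int = 5      # sparse routing top-k
    K_thought_max: int = 5   # max thinking iterations per step
```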
### Resource Budgets
| Resource | Budget | Description |
|----------|--------|-------------|
| Compute | 60.0 | Compute budget per step |
| Memory | 20.0 | Memory capacity |
| Energy | 25.0 | Energy budget |
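To make the numbers concrete, here is a toy accounting sketch under those budgets (abstract units). The clamped `spend` rule is an assumption; the learned converter behind SAGI's soft conservation is not modeled here.

```python
# Illustrative resource pool using the budgets above (abstract units).
BUDGETS = {"compute": 60.0, "memory": 20.0, "energy": 25.0}

def spend(pool: dict, resource: str, cost: float) -> float:
    """Consume a resource, clamping at zero, and return the remaining fraction."""
    pool[resource] = max(0.0, pool[resource] - cost)
    return pool[resource] / BUDGETS[resource]

pool = dict(BUDGETS)
remaining = spend(pool, "compute", 12.5)  # -> ~0.79 of the compute budget left
```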
### Trust & Plasticity
- **Trust Learning Rate**: 0.07
- **Fast EMA (Plasticity)**: 0.10
- **Slow EMA (Consolidation)**: 0.002
- **Core Values**: `["truth", "safety", "efficiency"]`
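A toy sketch of how these rates could be applied as exponential moving averages on a per-component reliability signal; the `reward` input and the exact update rules are assumptions, not SAGI's actual code.

```python
# Dual-timescale update using the rates listed above.
TRUST_LR, FAST_EMA, SLOW_EMA = 0.07, 0.10, 0.002

def update(trust: float, fast: float, slow: float, reward: float):
    trust += TRUST_LR * (reward - trust)              # trust tracks recent reliability
    fast = (1 - FAST_EMA) * fast + FAST_EMA * reward  # plastic, fast-moving estimate
    slow = (1 - SLOW_EMA) * slow + SLOW_EMA * reward  # slow consolidation
    return trust, fast, slow
```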
## Limitations
- **Early Research Model**: This is an experimental architecture exploring swarm-transformer integration
- **Training Data**: Trained on a subset of TinyStories; outputs tend to be simple and story-like
- **Compute Requirements**: Swarm dynamics add overhead compared to standard transformers
- **Generation Quality**: The model is undertrained; outputs may be repetitive or incoherent
## Intended Use
This model is intended for:
- Research into multi-agent cognitive architectures
- Exploration of dynamic, adaptive language models
- Educational purposes in understanding swarm intelligence + LLMs
Not intended for:
- Production applications
- Safety-critical systems
- Generation of factual content
## Training Details
- **Dataset**: TinyStories (subset)
- **Optimizer**: AdamW (lr=3e-4, betas=(0.9, 0.999), weight_decay=0.01)
- **Scheduler**: Cosine annealing
- **Precision**: FP32
- **Hardware**: Trained on CPU (CUDA-compatible)
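The reported settings map directly onto PyTorch's built-in optimizer and scheduler. In this sketch, `total_steps` is a placeholder since the schedule length is not reported:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/SAGI")
total_steps = 10_000  # placeholder; actual schedule length not reported

optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, betas=(0.9, 0.999), weight_decay=0.01
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)
```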
## Citation
```bibtex
@software{sagi2026,
  title  = {SAGI: Swarm AGI Language Model},
  author = {Reaperdoesntknow},
  year   = {2026},
  url    = {https://huggingface.co/reaperdoesntknow/SAGI}
}
```