---
license: apache-2.0
language:
  - en
library_name: transformers
tags:
  - text-generation
  - causal-lm
  - swarm-intelligence
  - multi-agent
  - pytorch
  - transformers
pipeline_tag: text-generation
model-index:
  - name: SAGI
    results: []
---

# SAGI - Swarm AGI Language Model

SAGI is a novel causal language model that integrates **swarm intelligence dynamics** with a transformer architecture. The model treats cognition as a dynamic, adaptive system in which multiple internal "agents" collaborate through differentiable routing, trust mechanisms, and shared memory.

## Model Description

| Property | Value |
|----------|-------|
| Parameters | 52.72M |
| Architecture | Transformer Decoder + Swarm Dynamics |
| Hidden Size | 512 |
| Layers | 6 |
| Attention Heads | 8 |
| Context Length | 2048 |
| Vocabulary | GPT-2 tokenizer (50,257 tokens) |

### Key Innovations

- **Differentiable Routing**: Continuous mixture-of-experts via attention (`DiffRouter`) instead of hard module selection (see the sketch after this list)
- **Adaptive Gating & Trust**: `MetaController` activates capacity under resource constraints; trust dynamics bias reliable components
- **Episodic + Semantic Memory**: Dual memory system with trainable retrieval utility
- **Curiosity Engine**: Injects novel goals when surprise is low, promoting exploration
- **Self-Model & Rollback**: Predicts state transitions and detects anomalies for self-correction
- **Resource Dynamics**: Soft conservation with learned converter; cognition consumes/recovers compute, memory, energy
- **Value Monitoring**: Tracks alignment to core values and freezes plasticity under drift
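
The routing idea can be sketched in a few lines: instead of hard-selecting one module, a learned query over the swarm state scores every agent and mixes their outputs with softmax weights, keeping the whole path differentiable. The class and parameter names below (`DiffRouterSketch`, `num_agents`) are illustrative assumptions, not the model's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffRouterSketch(nn.Module):
    """Hypothetical sketch of attention-style differentiable routing:
    every agent contributes, weighted by a learned, fully differentiable score."""

    def __init__(self, dim_s: int = 64, num_agents: int = 20):
        super().__init__()
        self.query = nn.Linear(dim_s, dim_s)  # routing query from the swarm state
        self.agent_keys = nn.Parameter(torch.randn(num_agents, dim_s))  # one key per agent
        self.agents = nn.ModuleList([nn.Linear(dim_s, dim_s) for _ in range(num_agents)])

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state: (batch, dim_s)
        q = self.query(state)                                  # (batch, dim_s)
        scores = q @ self.agent_keys.t() / q.size(-1) ** 0.5   # (batch, num_agents)
        weights = F.softmax(scores, dim=-1)                    # continuous mixture, no hard argmax
        outputs = torch.stack([agent(state) for agent in self.agents], dim=1)  # (batch, num_agents, dim_s)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)    # weighted sum over agents
```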

## How It Works

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       SAGI Model                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚   Swarm-7 V2.2  │─────▢│  Swarm State S, T       β”‚   β”‚
β”‚  β”‚  (Cognitive     β”‚      β”‚  (Working Memory)       β”‚   β”‚
β”‚  β”‚   Dynamics)     β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β–²β”€β”€β”€β”€β”€β”€β”€β”€β”˜                  β”‚                 β”‚
β”‚           β”‚                           β–Ό                 β”‚
β”‚           β”‚              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚           β”‚              β”‚  Transformer Decoder    β”‚    β”‚
β”‚           β”‚              β”‚  - Swarm-conditioned    β”‚    β”‚
β”‚           β”‚              β”‚    attention & FFN      β”‚    β”‚
β”‚           β”‚              β”‚  - RoPE embeddings      β”‚    β”‚
β”‚           β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚           β”‚                          β”‚                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚   Observation   │◀─────│      LM Head            β”‚   β”‚
β”‚  β”‚   (from tokens) β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

The swarm processes observations derived from token embeddings, updating its internal state **S**. This state conditions the transformer's attention patterns and feed-forward activations via learned projections, creating bidirectional information flow between symbolic (tokens) and subsymbolic (swarm dynamics) processing.
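
One plausible way to realize swarm-conditioned attention and FFN is FiLM-style modulation: the swarm state **S** is projected into per-channel scales and shifts that are applied to the block's activations. The sketch below illustrates this for the FFN path only and is an assumption for exposition; SAGI's actual conditioning may differ.

```python
import torch
import torch.nn as nn

class SwarmConditionedFFNSketch(nn.Module):
    """Hypothetical FiLM-style conditioning: the swarm state S (dim_s) modulates
    the decoder's hidden activations through learned scale/shift projections."""

    def __init__(self, hidden_size: int = 512, dim_s: int = 64):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )
        self.to_scale = nn.Linear(dim_s, hidden_size)  # swarm state -> per-channel scale
        self.to_shift = nn.Linear(dim_s, hidden_size)  # swarm state -> per-channel shift

    def forward(self, hidden: torch.Tensor, swarm_state: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden_size); swarm_state: (batch, dim_s)
        scale = 1.0 + self.to_scale(swarm_state).unsqueeze(1)  # (batch, 1, hidden_size)
        shift = self.to_shift(swarm_state).unsqueeze(1)
        return self.ffn(hidden) * scale + shift                # swarm-modulated FFN output
```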

## Usage

### Installation

```bash
pip install torch transformers datasets
```

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/SAGI")
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/SAGI")

# Generate text
model.eval()

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Model Architecture Details

### Swarm Configuration

| Parameter | Value | Description |
|-----------|-------|-------------|
| `max_agents` | 20 | Number of internal cognitive agents |
| `dim_s` | 64 | State dimension |
| `dim_t` | 32 | Task/goal dimension |
| `dim_obs` | 48 | Observation dimension |
| `topk_route` | 5 | Sparse routing top-k |
| `K_thought_max` | 5 | Maximum thinking iterations per step |

### Resource Budgets

| Resource | Budget | Description |
|----------|--------|-------------|
| Compute | 60.0 | Compute budget per step |
| Memory | 20.0 | Memory capacity |
| Energy | 25.0 | Energy budget |
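
For convenience, the swarm and resource tables above can be mirrored as a plain configuration object. The field names below are assumptions for illustration and may not match the model's real config class.

```python
from dataclasses import dataclass, field

@dataclass
class SwarmConfigSketch:
    # Swarm configuration (values from the table above)
    max_agents: int = 20       # internal cognitive agents
    dim_s: int = 64            # state dimension
    dim_t: int = 32            # task/goal dimension
    dim_obs: int = 48          # observation dimension
    topk_route: int = 5        # sparse routing top-k
    K_thought_max: int = 5     # max thinking iterations per step
    # Resource budgets per step
    budgets: dict = field(
        default_factory=lambda: {"compute": 60.0, "memory": 20.0, "energy": 25.0}
    )
```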

### Trust & Plasticity

- **Trust Learning Rate**: 0.07
- **Fast EMA (Plasticity)**: 0.10
- **Slow EMA (Consolidation)**: 0.002
- **Core Values**: `["truth", "safety", "efficiency"]`
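
As a worked example of how the listed rates could be applied, the sketch below runs a generic fast/slow exponential moving average and a proportional trust update. It is a hedged illustration of the bookkeeping, not SAGI's actual trust rule.

```python
TRUST_LR = 0.07   # trust learning rate
FAST_EMA = 0.10   # plasticity (fast) EMA coefficient
SLOW_EMA = 0.002  # consolidation (slow) EMA coefficient

def update_trust(trust: float, reliability_signal: float) -> float:
    """Move trust toward an observed reliability signal at the trust learning rate."""
    return trust + TRUST_LR * (reliability_signal - trust)

def update_emas(fast: float, slow: float, value: float) -> tuple[float, float]:
    """Fast EMA tracks recent behaviour (plasticity); slow EMA consolidates long-term statistics."""
    fast = (1 - FAST_EMA) * fast + FAST_EMA * value
    slow = (1 - SLOW_EMA) * slow + SLOW_EMA * value
    return fast, slow
```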

## Limitations

- **Early Research Model**: This is an experimental architecture exploring swarm-transformer integration
- **Training Data**: Currently trained on a TinyStories subset; may produce simple, story-like outputs
- **Compute Requirements**: Swarm dynamics add overhead compared to standard transformers
- **Generation Quality**: Model is undertrained; outputs may be repetitive or incoherent

## Intended Use

This model is intended for:
- Research into multi-agent cognitive architectures
- Exploration of dynamic, adaptive language models
- Educational purposes in understanding swarm intelligence + LLMs

Not intended for:
- Production applications
- Safety-critical systems
- Generation of factual content

## Training Details

- **Dataset**: TinyStories (subset)
- **Optimizer**: AdamW (lr=3e-4, betas=(0.9, 0.999), weight_decay=0.01)
- **Scheduler**: Cosine annealing
- **Precision**: FP32
- **Hardware**: CPU training (compatible with CUDA)
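
The listed optimizer and schedule map onto a standard PyTorch setup. The sketch below assumes `model` from the Quick Start above, a `dataloader` yielding tokenized TinyStories batches, and an illustrative `num_steps`; none of these specifics come from the card.

```python
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

num_steps = 10_000  # illustrative placeholder; the card does not state the step count

optimizer = AdamW(
    model.parameters(),  # `model` as loaded in the Quick Start above
    lr=3e-4,
    betas=(0.9, 0.999),
    weight_decay=0.01,
)
scheduler = CosineAnnealingLR(optimizer, T_max=num_steps)

for step, batch in enumerate(dataloader):  # batches of tokenized TinyStories text (assumed)
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    if step + 1 >= num_steps:
        break
```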

## Citation

```bibtex
@software{sagi2026,
  title={SAGI: Swarm AGI Language Model},
  author={Reaperdoesntknow},
  year={2026},
  url={https://huggingface.co/reaperdoesntknow/SAGI}
}
```