---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- text-generation
- causal-lm
- swarm-intelligence
- multi-agent
- pytorch
- transformers
pipeline_tag: text-generation
model-index:
- name: SAGI
  results: []
---

# SAGI - Swarm AGI Language Model

SAGI is a novel causal language model that integrates **swarm intelligence dynamics** with a transformer architecture. The model treats cognition as a dynamic, adaptive system in which multiple internal "agents" collaborate through differentiable routing, trust mechanisms, and shared memory.

## Model Description

| Property | Value |
|----------|-------|
| Parameters | 52.72M |
| Architecture | Transformer Decoder + Swarm Dynamics |
| Hidden Size | 512 |
| Layers | 6 |
| Attention Heads | 8 |
| Context Length | 2048 tokens |
| Vocabulary | GPT-2 tokenizer (50,257 tokens) |

### Key Innovations

- **Differentiable Routing**: Continuous mixture-of-experts via attention (`DiffRouter`) instead of hard module selection (see the sketch after this list)
- **Adaptive Gating & Trust**: `MetaController` activates capacity under resource constraints; trust dynamics bias routing toward reliable components
- **Episodic + Semantic Memory**: Dual memory system with trainable retrieval utility
- **Curiosity Engine**: Injects novel goals when surprise is low, promoting exploration
- **Self-Model & Rollback**: Predicts state transitions and detects anomalies for self-correction
- **Resource Dynamics**: Soft conservation with a learned converter; cognition consumes and recovers compute, memory, and energy
- **Value Monitoring**: Tracks alignment to core values and freezes plasticity under drift

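The actual `DiffRouter` implementation is not reproduced in this card. As a rough, hypothetical sketch of the differentiable-routing idea only (the class name `SoftRouter`, the linear scoring head, and the tensor shapes are assumptions), soft routing can be expressed as a softmax mixture over module outputs with a sparse top-k mask, rather than a hard argmax selection:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftRouter(nn.Module):
    """Illustrative differentiable router (not the actual DiffRouter):
    mixes module outputs with softmax weights instead of picking one module."""

    def __init__(self, dim_s: int = 64, num_modules: int = 20, topk: int = 5):
        super().__init__()
        self.score = nn.Linear(dim_s, num_modules)  # routing logits from the swarm state
        self.topk = topk                             # matches topk_route = 5 above

    def forward(self, state: torch.Tensor, module_outputs: torch.Tensor) -> torch.Tensor:
        # state: (batch, dim_s); module_outputs: (batch, num_modules, dim_s)
        logits = self.score(state)                                  # (batch, num_modules)
        # Sparse top-k routing: keep the k best modules, mask out the rest
        topv, topi = logits.topk(self.topk, dim=-1)
        masked = torch.full_like(logits, float("-inf")).scatter(-1, topi, topv)
        weights = F.softmax(masked, dim=-1)                         # differentiable mixture weights
        return torch.einsum("bm,bmd->bd", weights, module_outputs)  # weighted blend of modules
```

Because the mixture weights come from a softmax rather than a hard selection, gradients flow to every routed module, which is the property the "continuous mixture-of-experts" phrasing refers to.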
## How It Works

```
┌────────────────────────────────────────────────────────────┐
│                         SAGI Model                          │
├────────────────────────────────────────────────────────────┤
│  ┌───────────────────┐       ┌───────────────────────────┐ │
│  │   Swarm-7 V2.2    │──────▶│     Swarm State S, T      │ │
│  │   (Cognitive      │       │     (Working Memory)      │ │
│  │    Dynamics)      │       └────────────┬──────────────┘ │
│  └─────────▲─────────┘                    │                │
│            │                              ▼                │
│            │                 ┌───────────────────────────┐ │
│            │                 │    Transformer Decoder    │ │
│            │                 │    - Swarm-conditioned    │ │
│            │                 │      attention & FFN      │ │
│            │                 │    - RoPE embeddings      │ │
│            │                 └────────────┬──────────────┘ │
│            │                              │                │
│  ┌─────────┴─────────┐       ┌───────────────────────────┐ │
│  │   Observation     │◀──────│          LM Head          │ │
│  │   (from tokens)   │       └───────────────────────────┘ │
│  └───────────────────┘                                     │
└────────────────────────────────────────────────────────────┘
```

The swarm processes observations derived from token embeddings, updating its internal state **S**. This state conditions the transformer's attention patterns and feed-forward activations via learned projections, creating bidirectional information flow between symbolic (tokens) and subsymbolic (swarm dynamics) processing.

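The exact conditioning mechanism is not spelled out in this card. As a minimal, hypothetical sketch under stated assumptions (the module names `swarm_to_attn`/`swarm_to_ffn`, the additive-bias form, and the omission of RoPE are all assumptions, not the real SAGI layers), swarm conditioning of one decoder block might look like:

```python
import torch
import torch.nn as nn

class SwarmConditionedBlock(nn.Module):
    """Illustrative decoder block conditioned on a swarm state vector S.
    RoPE and other details are omitted; this is a sketch, not the real layer."""

    def __init__(self, hidden_size: int = 512, num_heads: int = 8, dim_s: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )
        # Learned projections mapping the swarm state into the token stream
        self.swarm_to_attn = nn.Linear(dim_s, hidden_size)
        self.swarm_to_ffn = nn.Linear(dim_s, hidden_size)
        self.norm1 = nn.LayerNorm(hidden_size)
        self.norm2 = nn.LayerNorm(hidden_size)

    def forward(self, x: torch.Tensor, swarm_state: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden); swarm_state: (batch, dim_s)
        seq_len = x.size(1)
        causal = torch.triu(                      # standard causal mask for a decoder
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        s_attn = self.swarm_to_attn(swarm_state).unsqueeze(1)   # (batch, 1, hidden)
        h = self.norm1(x + s_attn)                              # bias attention inputs with S
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + attn_out
        s_ffn = self.swarm_to_ffn(swarm_state).unsqueeze(1)
        x = x + self.ffn(self.norm2(x) + s_ffn)                 # bias FFN activations with S
        return x
```
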
## Usage

### Installation

```bash
pip install torch transformers datasets
```

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/SAGI")
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/SAGI")

# Generate text
model.eval()

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

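If the checkpoint ships custom modeling code rather than a stock `transformers` architecture (plausible for SAGI's swarm components, though not confirmed in this card), passing `trust_remote_code=True` to `from_pretrained` may additionally be required.
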
## Model Architecture Details

### Swarm Configuration

| Parameter | Value | Description |
|-----------|-------|-------------|
| `max_agents` | 20 | Number of internal cognitive agents |
| `dim_s` | 64 | State dimension |
| `dim_t` | 32 | Task/goal dimension |
| `dim_obs` | 48 | Observation dimension |
| `topk_route` | 5 | Sparse routing top-k |
| `K_thought_max` | 5 | Maximum thinking iterations per step |

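These hyperparameters should be visible on the loaded configuration. A quick way to check them, assuming the config attribute names mirror the table above (they may differ in the actual checkpoint):

```python
from transformers import AutoConfig

# trust_remote_code=True may be needed if the config class is custom.
config = AutoConfig.from_pretrained("reaperdoesntknow/SAGI")

# Attribute names are assumed to match the table; getattr guards against mismatches.
for name in ["max_agents", "dim_s", "dim_t", "dim_obs", "topk_route", "K_thought_max"]:
    print(name, getattr(config, name, "not present in this config"))
```
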
### Resource Budgets

| Resource | Budget | Description |
|----------|--------|-------------|
| Compute | 60.0 | Compute budget per step |
| Memory | 20.0 | Memory capacity |
| Energy | 25.0 | Energy budget |

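The card does not describe the accounting rule behind these budgets. Purely as an illustration of budgeted consumption and "soft" recovery (the function names and the linear recovery rule are assumptions, not SAGI's actual resource dynamics):

```python
# Hypothetical per-step resource accounting, using the budgets listed above.
budgets = {"compute": 60.0, "memory": 20.0, "energy": 25.0}
levels = dict(budgets)  # current levels start at the full budget

def spend(resource: str, amount: float) -> bool:
    """Consume `amount` of a resource if available; return False when exhausted."""
    if levels[resource] < amount:
        return False
    levels[resource] -= amount
    return True

def recover(rate: float = 0.1) -> None:
    """Soft recovery: move each resource a fraction of the way back toward its budget."""
    for name, cap in budgets.items():
        levels[name] += rate * (cap - levels[name])
```
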
### Trust & Plasticity

- **Trust Learning Rate**: 0.07
- **Fast EMA (Plasticity)**: 0.10
- **Slow EMA (Consolidation)**: 0.002
- **Core Values**: `["truth", "safety", "efficiency"]`

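The update rules themselves are not documented here. A common reading of these coefficients is an exponential-moving-average update, sketched below with the listed rates (the `reliability` signal and the dual fast/slow traces are assumptions):

```python
TRUST_LR = 0.07   # trust learning rate
FAST_EMA = 0.10   # plasticity (fast) trace
SLOW_EMA = 0.002  # consolidation (slow) trace

def update_trust(trust: float, reliability: float) -> float:
    """EMA-style trust update: move trust toward the observed reliability signal."""
    return (1.0 - TRUST_LR) * trust + TRUST_LR * reliability

def update_traces(fast: float, slow: float, value: float) -> tuple[float, float]:
    """Fast trace adapts quickly (plasticity); slow trace consolidates over time."""
    fast = (1.0 - FAST_EMA) * fast + FAST_EMA * value
    slow = (1.0 - SLOW_EMA) * slow + SLOW_EMA * value
    return fast, slow
```
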
## Limitations

- **Early Research Model**: This is an experimental architecture exploring swarm-transformer integration
- **Training Data**: Currently trained on a TinyStories subset; may produce simple, story-like outputs
- **Compute Requirements**: Swarm dynamics add overhead compared to standard transformers
- **Generation Quality**: The model is undertrained; outputs may be repetitive or incoherent

## Intended Use

This model is intended for:

- Research into multi-agent cognitive architectures
- Exploration of dynamic, adaptive language models
- Education on combining swarm intelligence with LLMs

Not intended for:

- Production applications
- Safety-critical systems
- Generation of factual content

## Training Details

- **Dataset**: TinyStories (subset)
- **Optimizer**: AdamW (lr=3e-4, betas=(0.9, 0.999), weight_decay=0.01)
- **Scheduler**: Cosine annealing
- **Precision**: FP32
- **Hardware**: Trained on CPU; the code is CUDA-compatible

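The full training loop is not published. A minimal sketch that reproduces the listed optimizer and scheduler settings, assuming the checkpoint loads as in the Quick Start example (the step count and the dummy batch are placeholders, not reported values):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/SAGI")
num_training_steps = 10_000  # placeholder; the real step count is not reported

optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, betas=(0.9, 0.999), weight_decay=0.01
)
# Cosine annealing over the whole run, matching the listed scheduler
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_training_steps)

# One illustrative optimization step on a dummy batch; a real run would
# iterate over tokenized TinyStories batches instead.
batch = {"input_ids": torch.randint(0, 50257, (2, 128))}
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```
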
## Citation

```bibtex
@software{sagi2026,
  title={SAGI: Swarm AGI Language Model},
  author={Reaperdoesntknow},
  year={2026},
  url={https://huggingface.co/reaperdoesntknow/SAGI}
}
```