---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- text-generation
- causal-lm
- swarm-intelligence
- multi-agent
- pytorch
- transformers
pipeline_tag: text-generation
model-index:
- name: SAGI
  results: []
---

# SAGI - Swarm AGI Language Model

SAGI is a novel causal language model that integrates **swarm intelligence dynamics** with a transformer architecture. The model treats cognition as a dynamic, adaptive system in which multiple internal "agents" collaborate through differentiable routing, trust mechanisms, and shared memory.

## Model Description

| Property | Value |
|----------|-------|
| Parameters | 52.72M |
| Architecture | Transformer Decoder + Swarm Dynamics |
| Hidden Size | 512 |
| Layers | 6 |
| Attention Heads | 8 |
| Context Length | 2048 |
| Vocabulary | GPT-2 tokenizer (50,257 tokens) |

### Key Innovations

- **Differentiable Routing**: Continuous mixture-of-experts via attention (`DiffRouter`) instead of hard module selection (a toy sketch appears after the diagram below)
- **Adaptive Gating & Trust**: `MetaController` activates capacity under resource constraints; trust dynamics bias routing toward reliable components
- **Episodic + Semantic Memory**: Dual memory system with trainable retrieval utility
- **Curiosity Engine**: Injects novel goals when surprise is low, promoting exploration
- **Self-Model & Rollback**: Predicts state transitions and detects anomalies for self-correction
- **Resource Dynamics**: Soft conservation with a learned converter; cognition consumes and recovers compute, memory, and energy
- **Value Monitoring**: Tracks alignment to core values and freezes plasticity under drift

## How It Works

```
┌─────────────────────────────────────────────────────────┐
│                       SAGI Model                        │
├─────────────────────────────────────────────────────────┤
│  ┌─────────────────┐      ┌─────────────────────────┐   │
│  │  Swarm-7 V2.2   │─────▶│    Swarm State S, T     │   │
│  │  (Cognitive     │      │    (Working Memory)     │   │
│  │   Dynamics)     │      └───────────┬─────────────┘   │
│  └────────▲────────┘                  │                 │
│           │                           ▼                 │
│           │              ┌─────────────────────────┐    │
│           │              │   Transformer Decoder   │    │
│           │              │  - Swarm-conditioned    │    │
│           │              │    attention & FFN      │    │
│           │              │  - RoPE embeddings      │    │
│           │              └───────────┬─────────────┘    │
│           │                          │                  │
│  ┌────────┴────────┐      ┌─────────────────────────┐   │
│  │   Observation   │◀─────│         LM Head         │   │
│  │  (from tokens)  │      └─────────────────────────┘   │
│  └─────────────────┘                                    │
└─────────────────────────────────────────────────────────┘
```

The swarm processes observations derived from token embeddings, updating its internal state **S**. This state conditions the transformer's attention patterns and feed-forward activations via learned projections, creating bidirectional information flow between symbolic (tokens) and subsymbolic (swarm dynamics) processing.
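The routing step can be made concrete with a short sketch. The actual `DiffRouter` code is not reproduced in this card, so the module below is a minimal, hypothetical rendering of attention-style soft routing with top-k sparsification, using the dimensions from the tables on this card (`dim_s = 64`, `max_agents = 20`, `topk_route = 5`); the class and parameter names are invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftTopKRouter(nn.Module):
    """Hypothetical sketch of differentiable routing: a query derived from
    the swarm state attends over per-agent keys, the score distribution is
    sparsified to the top-k agents, and agent outputs are mixed softly."""

    def __init__(self, dim_s: int = 64, n_agents: int = 20, topk: int = 5):
        super().__init__()
        self.query = nn.Linear(dim_s, dim_s)          # routing query from state S
        self.agent_keys = nn.Parameter(torch.randn(n_agents, dim_s))
        self.topk = topk

    def forward(self, state: torch.Tensor, agent_outputs: torch.Tensor) -> torch.Tensor:
        # state: (batch, dim_s); agent_outputs: (batch, n_agents, dim_s)
        q = self.query(state)                                  # (batch, dim_s)
        scores = q @ self.agent_keys.T / q.shape[-1] ** 0.5    # (batch, n_agents)
        # Keep only the top-k agents; mask the rest before the softmax so the
        # mixture stays sparse yet differentiable w.r.t. the surviving agents.
        topk_vals, topk_idx = scores.topk(self.topk, dim=-1)
        mask = torch.full_like(scores, float("-inf"))
        mask.scatter_(-1, topk_idx, topk_vals)
        weights = F.softmax(mask, dim=-1)                      # (batch, n_agents)
        return torch.einsum("ba,bad->bd", weights, agent_outputs)
```

A hard arg-max over modules would break gradient flow; masking before the softmax keeps only `topk_route` agents active per step while leaving the mixture trainable end to end.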
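The swarm-conditioned attention and FFN can likewise be pictured as a FiLM-style modulation, where the state **S** is projected to a per-channel gain and bias applied to the decoder's hidden states. This is an assumption about the mechanism rather than the repository's actual code; only the sizes (`dim_s = 64`, hidden size 512) come from the tables above.

```python
import torch
import torch.nn as nn

class SwarmConditioning(nn.Module):
    """Hypothetical sketch: project the swarm state S into a gain and bias
    that modulate the hidden states feeding a decoder block's attention/FFN."""

    def __init__(self, dim_s: int = 64, hidden: int = 512):
        super().__init__()
        self.to_gain = nn.Linear(dim_s, hidden)
        self.to_bias = nn.Linear(dim_s, hidden)

    def forward(self, hidden_states: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq, hidden); state: (batch, dim_s)
        gain = 1.0 + torch.tanh(self.to_gain(state)).unsqueeze(1)  # (batch, 1, hidden)
        bias = self.to_bias(state).unsqueeze(1)
        return hidden_states * gain + bias
```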
## Usage

### Installation

```bash
pip install torch transformers datasets
```

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
# Note: because the architecture is custom, you may need to pass
# trust_remote_code=True to both from_pretrained calls if loading fails.
model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/SAGI")
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/SAGI")

# Generate text
model.eval()
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Model Architecture Details

### Swarm Configuration

| Parameter | Value | Description |
|-----------|-------|-------------|
| `max_agents` | 20 | Number of internal cognitive agents |
| `dim_s` | 64 | State dimension |
| `dim_t` | 32 | Task/goal dimension |
| `dim_obs` | 48 | Observation dimension |
| `topk_route` | 5 | Sparse routing top-k |
| `K_thought_max` | 5 | Maximum thinking iterations per step |

### Resource Budgets

| Resource | Budget | Description |
|----------|--------|-------------|
| Compute | 60.0 | Compute budget per step |
| Memory | 20.0 | Memory capacity |
| Energy | 25.0 | Energy budget |

### Trust & Plasticity

- **Trust Learning Rate**: 0.07
- **Fast EMA (Plasticity)**: 0.10
- **Slow EMA (Consolidation)**: 0.002
- **Core Values**: `["truth", "safety", "efficiency"]`

A toy sketch of how these constants might be used appears at the end of this card.

## Limitations

- **Early Research Model**: This is an experimental architecture exploring swarm-transformer integration
- **Training Data**: Currently trained on a TinyStories subset; outputs tend to be simple and story-like
- **Compute Requirements**: Swarm dynamics add overhead compared to standard transformers
- **Generation Quality**: The model is undertrained; outputs may be repetitive or incoherent

## Intended Use

This model is intended for:

- Research into multi-agent cognitive architectures
- Exploration of dynamic, adaptive language models
- Educational purposes in understanding swarm intelligence + LLMs

Not intended for:

- Production applications
- Safety-critical systems
- Generation of factual content

## Training Details

- **Dataset**: TinyStories (subset)
- **Optimizer**: AdamW (lr=3e-4, betas=(0.9, 0.999), weight_decay=0.01)
- **Scheduler**: Cosine annealing
- **Precision**: FP32
- **Hardware**: CPU training (compatible with CUDA)

## Citation

```bibtex
@software{sagi2026,
  title={SAGI: Swarm AGI Language Model},
  author={Reaperdoesntknow},
  year={2026},
  url={https://huggingface.co/reaperdoesntknow/SAGI}
}
```
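## Appendix: Trust and Plasticity Sketch

The constants under **Trust & Plasticity** can be read as simple exponential-moving-average updates. The card does not publish the actual update rules, so the snippet below is a toy illustration under assumed semantics: per-agent trust tracks a hypothetical reliability signal at rate 0.07, and weights are consolidated on two timescales (fast = plastic, slow = stable).

```python
import torch
import torch.nn as nn

TRUST_LR = 0.07   # trust learning rate (from the card)
FAST_EMA = 0.10   # fast EMA: plasticity
SLOW_EMA = 0.002  # slow EMA: consolidation

def update_trust(trust: torch.Tensor, reliability: torch.Tensor) -> torch.Tensor:
    """Move per-agent trust toward an observed reliability signal in [0, 1].
    `reliability` is a hypothetical quantity; the card does not define it."""
    return trust + TRUST_LR * (reliability - trust)

@torch.no_grad()
def ema_update(target: nn.Module, source: nn.Module, rate: float) -> None:
    """One EMA step: pull `target` weights toward `source` weights."""
    for p_t, p_s in zip(target.parameters(), source.parameters()):
        p_t.mul_(1.0 - rate).add_(p_s, alpha=rate)

# Two-timescale consolidation (assumed wiring): a fast copy chases the
# online weights each step, while a slow copy consolidates the fast one.
#   ema_update(fast_model, online_model, FAST_EMA)
#   ema_update(slow_model, fast_model, SLOW_EMA)
# "Value monitoring" would skip the fast update when drift from the core
# values ["truth", "safety", "efficiency"] is detected (freezing plasticity).
```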