---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- text-generation
- causal-lm
- swarm-intelligence
- multi-agent
- pytorch
- transformers
pipeline_tag: text-generation
model-index:
- name: SAGI
results: []
---
# SAGI - Swarm AGI Language Model
SAGI is a novel causal language model that integrates **swarm intelligence dynamics** with transformer architecture. The model treats cognition as a dynamic, adaptive system where multiple internal "agents" collaborate through differentiable routing, trust mechanisms, and shared memory.
## Model Description
| Property | Value |
|----------|-------|
| Parameters | 52.72M |
| Architecture | Transformer Decoder + Swarm Dynamics |
| Hidden Size | 512 |
| Layers | 6 |
| Attention Heads | 8 |
| Context Length | 2048 |
| Vocabulary | GPT-2 tokenizer (50,257 tokens) |
### Key Innovations
- **Differentiable Routing**: Continuous mixture-of-experts via attention (`DiffRouter`) instead of hard module selection (see the sketch after this list)
- **Adaptive Gating & Trust**: `MetaController` activates capacity under resource constraints; trust dynamics bias reliable components
- **Episodic + Semantic Memory**: Dual memory system with trainable retrieval utility
- **Curiosity Engine**: Injects novel goals when surprise is low, promoting exploration
- **Self-Model & Rollback**: Predicts state transitions and detects anomalies for self-correction
- **Resource Dynamics**: Soft conservation with learned converter; cognition consumes/recovers compute, memory, energy
- **Value Monitoring**: Tracks alignment to core values and freezes plasticity under drift
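The routing idea in the first bullet can be pictured with a minimal sketch: instead of committing to one module, soft attention weights blend all module outputs, so gradients flow through the routing decision itself. This is an illustrative stand-in, not SAGI's actual `DiffRouter`; the class name, linear experts, and tensor shapes are assumptions.

```python
import torch
import torch.nn as nn

class SoftRouter(nn.Module):
    """Continuous mixture over expert modules (soft routing) rather than
    a hard argmax selection. Hypothetical sketch of the DiffRouter idea."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim). Softmax weights keep routing fully differentiable.
        weights = torch.softmax(self.gate(x), dim=-1)                # (batch, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=-1)  # (batch, dim, E)
        return (outputs * weights.unsqueeze(1)).sum(dim=-1)          # (batch, dim)
```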
## How It Works
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ SAGI Model β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Swarm-7 V2.2 │─────▢│ Swarm State S, T β”‚ β”‚
β”‚ β”‚ (Cognitive β”‚ β”‚ (Working Memory) β”‚ β”‚
β”‚ β”‚ Dynamics) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β–²β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚
β”‚ β”‚ β–Ό β”‚
β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ β”‚ Transformer Decoder β”‚ β”‚
β”‚ β”‚ β”‚ - Swarm-conditioned β”‚ β”‚
β”‚ β”‚ β”‚ attention & FFN β”‚ β”‚
β”‚ β”‚ β”‚ - RoPE embeddings β”‚ β”‚
β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Observation │◀─────│ LM Head β”‚ β”‚
β”‚ β”‚ (from tokens) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
The swarm processes observations derived from token embeddings, updating its internal state **S**. This state conditions the transformer's attention patterns and feed-forward activations via learned projections, creating bidirectional information flow between symbolic (tokens) and subsymbolic (swarm dynamics) processing.
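One way to picture this conditioning is a FiLM-style modulation, where the pooled swarm state scales and shifts the feed-forward activations through learned projections. This is a minimal sketch under that assumption; SAGI's actual projection layout is not documented here, and the class name, shapes, and GELU choice are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwarmConditionedFFN(nn.Module):
    """Feed-forward block whose activations are modulated by the pooled
    swarm state S via learned scale/shift projections (FiLM-style sketch)."""

    def __init__(self, hidden: int = 512, dim_s: int = 64, ffn_mult: int = 4):
        super().__init__()
        self.up = nn.Linear(hidden, hidden * ffn_mult)
        self.down = nn.Linear(hidden * ffn_mult, hidden)
        self.scale = nn.Linear(dim_s, hidden * ffn_mult)
        self.shift = nn.Linear(dim_s, hidden * ffn_mult)

    def forward(self, x: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden) token states; s: (batch, dim_s) swarm state
        h = F.gelu(self.up(x))
        h = h * (1 + self.scale(s)).unsqueeze(1) + self.shift(s).unsqueeze(1)
        return self.down(h)
```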
## Usage
### Installation
```bash
pip install torch transformers datasets
```
### Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/SAGI")
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/SAGI")

# Generate text
model.eval()
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Model Architecture Details
### Swarm Configuration
| Parameter | Value | Description |
|-----------|-------|-------------|
| `max_agents` | 20 | Number of internal cognitive agents |
| `dim_s` | 64 | State dimension |
| `dim_t` | 32 | Task/goal dimension |
| `dim_obs` | 48 | Observation dimension |
| `topk_route` | 5 | Sparse routing top-k |
| `K_thought_max` | 5 | Maximum thinking iterations per step |
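For reference, these hyperparameters map naturally onto a flat config object. The dataclass below is an illustrative grouping only (the class name is hypothetical); the values come straight from the table.

```python
from dataclasses import dataclass

@dataclass
class SwarmConfig:
    """Illustrative grouping of the swarm hyperparameters above."""
    max_agents: int = 20     # number of internal cognitive agents
    dim_s: int = 64          # state dimension
    dim_t: int = 32          # task/goal dimension
    dim_obs: int = 48        # observation dimension
    topk_route: int = 5      # sparse routing top-k
    K_thought_max: int = 5   # max thinking iterations per step
```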
### Resource Budgets
| Resource | Budget | Description |
|----------|--------|-------------|
| Compute | 60.0 | Compute budget per step |
| Memory | 20.0 | Memory capacity |
| Energy | 25.0 | Energy budget |
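To make the numbers concrete, here is a toy accounting sketch under those budgets (abstract units). The clamped `spend` rule is an assumption; the learned converter behind SAGI's soft conservation is not modeled here.

```python
# Illustrative resource pool using the budgets above (abstract units).
BUDGETS = {"compute": 60.0, "memory": 20.0, "energy": 25.0}

def spend(pool: dict, resource: str, cost: float) -> float:
    """Consume a resource, clamping at zero, and return the remaining fraction."""
    pool[resource] = max(0.0, pool[resource] - cost)
    return pool[resource] / BUDGETS[resource]

pool = dict(BUDGETS)
remaining = spend(pool, "compute", 12.5)  # -> ~0.79 of the compute budget left
```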
### Trust & Plasticity
- **Trust Learning Rate**: 0.07
- **Fast EMA (Plasticity)**: 0.10
- **Slow EMA (Consolidation)**: 0.002
- **Core Values**: `["truth", "safety", "efficiency"]`
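A toy sketch of how these rates could be applied as exponential moving averages on a per-component reliability signal; the `reward` input and the exact update rules are assumptions, not SAGI's actual code.

```python
# Dual-timescale update using the rates listed above.
TRUST_LR, FAST_EMA, SLOW_EMA = 0.07, 0.10, 0.002

def update(trust: float, fast: float, slow: float, reward: float):
    trust += TRUST_LR * (reward - trust)              # trust tracks recent reliability
    fast = (1 - FAST_EMA) * fast + FAST_EMA * reward  # plastic, fast-moving estimate
    slow = (1 - SLOW_EMA) * slow + SLOW_EMA * reward  # slow consolidation
    return trust, fast, slow
```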
## Limitations
- **Early Research Model**: This is an experimental architecture exploring swarm-transformer integration
- **Training Data**: Trained on a subset of TinyStories; outputs tend to be simple and story-like
- **Compute Requirements**: Swarm dynamics add overhead compared to standard transformers
- **Generation Quality**: The model is undertrained; outputs may be repetitive or incoherent
## Intended Use
This model is intended for:
- Research into multi-agent cognitive architectures
- Exploration of dynamic, adaptive language models
- Educational purposes in understanding swarm intelligence + LLMs
Not intended for:
- Production applications
- Safety-critical systems
- Generation of factual content
## Training Details
- **Dataset**: TinyStories (subset)
- **Optimizer**: AdamW (lr=3e-4, betas=(0.9, 0.999), weight_decay=0.01)
- **Scheduler**: Cosine annealing
- **Precision**: FP32
- **Hardware**: Trained on CPU (CUDA-compatible)
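The reported settings map directly onto PyTorch's built-in optimizer and scheduler. In this sketch, `total_steps` is a placeholder since the schedule length is not reported:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/SAGI")
total_steps = 10_000  # placeholder; actual schedule length not reported

optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, betas=(0.9, 0.999), weight_decay=0.01
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)
```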
## Citation
```bibtex
@software{sagi2026,
  title  = {SAGI: Swarm AGI Language Model},
  author = {Reaperdoesntknow},
  year   = {2026},
  url    = {https://huggingface.co/reaperdoesntknow/SAGI}
}
```