---
language:
- id
- en
license: mit
library_name: pytorch
tags:
- diffusion
- llm
- aam
- graph-conditioned
- sentence-arrangement
- evoformer
- anchored-decoding
- flow-matching
- dual-memory
- matryoshka
- swiglu
- rope
- mcts
- thinking-toggle
---

# AAM Diffusion LLM v2.0 — Upgraded from Losion

## Overview

AphantasicAbstractionModel (AAM) is a **specialized sentence composer** — NOT a general-purpose LLM. It takes structured graph data (evidence, anomalies, reasoning chains) as input and produces coherent, evidence-backed narrative output through iterative denoising.

**AAM = 1 Mind + 1 Body**

- **Mind** = RSVS Knowledge Graph (structural, relational memory)
- **Body** = this Diffusion LLM (generates natural language FROM the graph)

## v2.0 Upgrade from Losion

This version incorporates 14 modules extracted from the [Losion](https://github.com/Wolfvin/Losion) architecture:

### Tier 1 — Critical Upgrades

| Module | Description | Impact |
|--------|-------------|--------|
| **Anchored Diffusion Decoder** | 2-3 refinement steps instead of 50+ steps from pure noise | 10-20x speedup |
| **Flow Matching Decoder** | Velocity-based alternative to DDPM/DDIM | Faster, higher-quality inference |
| **Evoformer Feedback** | 4-level bidirectional feedback (layer/token/decoder/prediction) | Quality leap |
| **Dual Memory System** | Working memory + long-term memory for coherent generation | Persistent context |

### Tier 2 — Training & Reasoning

| Module | Description |
|--------|-------------|
| **MCTS Reasoning Engine** | AlphaZero-style tree search for narrative arrangement |
| **Thinking Toggle** | Adaptive compute — simple queries take 2 steps, complex ones 5+ |
| **Matryoshka Elastic** | One training run → multiple deployment sizes |
| **GRPO Training** | Group Relative Policy Optimization (no value function) |
| **DAPO Training** | Decoupled clipping + dynamic sampling + token-level loss |
| **Curriculum Learning** | 4 phases: single-evidence → multi-evidence → reasoning → RL |

### Tier 3 — Architecture Improvements

| Module | Description |
|--------|-------------|
| **SwiGLU FFN** | Replaces GELU with SwiGLU (proven in LLaMA/Mistral) |
| **RoPE** | Rotary position embeddings for length generalization |
| **Speculative Decoder** | Draft with the graph encoder, verify with the diffusion model |
| **Quantization** | BitNet 1-bit and FP8 weight-only quantization stubs |

## Architecture

```
INPUT: Graph Conditioning (RSVS Knowledge Graph)
        ↓
Graph Encoder (+ Dual Memory) → cross-attention keys/values
        ↓
Diffusion Transformer (SwiGLU + RoPE + Matryoshka)
  ├─ N × TransformerBlock: AdaLN + Self-Attn + Cross-Attn + SwiGLU FFN
  └─ Evoformer Feedback: layer + token + decoder + prediction recycling
        ↓
OUTPUT PIPELINE:
  ├─ Anchored Diffusion Decoder (2-3 steps, default)
  ├─ Flow Matching Decoder (2-3 steps, alternative)
  └─ Legacy DDPM/DDIM (backward compatible)
        ↓
INFERENCE CONTROLLER:
  ├─ Thinking Toggle (adaptive compute)
  ├─ MCTS Reasoning (complex queries)
  └─ Matryoshka (select submodel size)
```
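The anchored decoder is the main speed lever in this pipeline: rather than denoising from pure Gaussian noise for 50+ steps, it perturbs a graph-conditioned anchor and runs a few refinement steps. The sketch below is only an illustration of that idea; `TinyDenoiser`, `anchored_refine`, and the linear blend schedule are assumptions for the sketch, not AAM's actual modules.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for the diffusion transformer: predicts the clean
    embedding x0 from a noisy embedding and a discrete step index."""
    def __init__(self, d_model: int = 128, max_steps: int = 200):
        super().__init__()
        self.step_emb = nn.Embedding(max_steps, d_model)
        self.net = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.SiLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Broadcast the per-batch step embedding over the sequence dimension.
        return self.net(x_t + self.step_emb(t)[:, None, :])

@torch.no_grad()
def anchored_refine(denoiser: nn.Module,
                    anchor: torch.Tensor,
                    n_steps: int = 3,
                    noise_scale: float = 0.1) -> torch.Tensor:
    """Lightly perturb the anchor, then blend toward the denoiser's
    x0 prediction with an increasing weight at each step."""
    x = anchor + noise_scale * torch.randn_like(anchor)
    for i in range(n_steps):
        t = torch.full((x.size(0),), n_steps - 1 - i, dtype=torch.long)
        x0_pred = denoiser(x, t)
        alpha = (i + 1) / n_steps  # lands fully on the final prediction
        x = (1.0 - alpha) * x + alpha * x0_pred
    return x

# Toy usage: batch of 2 sequences, 16 tokens, d_model = 128
denoiser = TinyDenoiser()
anchor = torch.randn(2, 16, 128)  # e.g. a graph-conditioned draft embedding
print(anchored_refine(denoiser, anchor, n_steps=3).shape)  # torch.Size([2, 16, 128])
```

Because the anchor already encodes the graph conditioning, a few blend steps suffice, which is where the claimed 10-20x speedup over from-noise sampling comes from.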
## Training Pipeline

1. **Phase 1**: Single-evidence simple narratives (25% of budget)
2. **Phase 2**: Multi-evidence narratives (30% of budget)
3. **Phase 3**: Complex reasoning + anomaly resolution (30% of budget)
4. **Phase 4**: GRPO/DAPO RL fine-tuning (15% of budget)

## Model Details

| Attribute | Value |
|-----------|-------|
| Parameters | ~5.5M (demo) |
| d_model | 128 |
| n_layers | 4 |
| n_heads | 4 |
| Vocab size | 2000 |
| Diffusion steps | 200 (train) / 20 (inference) |
| Anchored refinement | 2-3 steps |

## Usage

```python
from diffusion_llm import AamDiffusionModel, AamDiffusionConfig

# Load the config and model weights
config = AamDiffusionConfig.from_json("config.json")
model = AamDiffusionModel.load("pytorch_model.bin")

# Encode the RSVS graph into conditioning vectors.
# `evidence_ids` and `confidence` are tensors built from the knowledge
# graph: evidence node IDs and their confidence scores.
graph_cond = model.graph_encoder(
    evidence_ids=evidence_ids,
    evidence_confidence=confidence,
)

# Generate with anchored decoding (2-3 refinement steps)
result = model.sample(graph_cond, method="anchored", n_steps=3)
tokens = model.embeddings_to_tokens(result)
```
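The flow-matching decoder is the velocity-based alternative in the output pipeline: instead of predicting noise (DDPM), it learns a velocity field v(x, t) and integrates it from noise (t = 0) to data (t = 1). The loop below is a generic sketch of that sampling scheme, assuming a velocity network with signature `velocity_net(x, t)`; it is not AAM's decoder API.

```python
import torch

@torch.no_grad()
def flow_matching_sample(velocity_net, noise: torch.Tensor, n_steps: int = 3) -> torch.Tensor:
    """Euler integration of dx/dt = v(x, t) from t = 0 (noise) to t = 1 (data)."""
    x = noise
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.size(0),), i * dt)  # current time, one scalar per batch item
        x = x + dt * velocity_net(x, t)
    return x

# Toy usage with a trivial velocity field that pulls samples toward zero:
v = lambda x, t: -x
x1 = flow_matching_sample(v, torch.randn(2, 16, 128), n_steps=3)
```

Because the velocity field defines a deterministic ODE, a handful of Euler steps can already land near the data distribution, which matches the 2-3 step budget quoted for this decoder.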