---
language:
- id
- en
license: mit
library_name: pytorch
tags:
- diffusion
- llm
- aam
- graph-conditioned
- sentence-arrangement
- evoformer
- anchored-decoding
- flow-matching
- dual-memory
- matryoshka
- swiglu
- rope
- mcts
- thinking-toggle
---
| |
# AAM Diffusion LLM v2.0: Upgraded from Losion

## Overview
AphantasicAbstractionModel (AAM) is a **specialized sentence composer**, not a general-purpose LLM. It takes structured graph data (evidence, anomalies, reasoning chains) as input and produces coherent, evidence-backed narrative output through iterative denoising.
**AAM = 1 Mind + 1 Body**
- **Mind** = the RSVS Knowledge Graph (structural, relational memory)
- **Body** = this Diffusion LLM (generates natural language FROM the graph)
|
## v2.0 Upgrade from Losion

This version incorporates 14 modules extracted from the [Losion](https://github.com/Wolfvin/Losion) architecture:
### Tier 1: Critical Upgrades

| Module | Description | Impact |
|--------|-------------|--------|
| **Anchored Diffusion Decoder** | 2-3 step refinement instead of 50+ steps from pure noise | 10-20x speedup |
| **Flow Matching Decoder** | Velocity-based alternative to DDPM/DDIM | Faster, better inference |
| **Evoformer Feedback** | 4-level bidirectional feedback (layer/token/decoder/prediction) | Quality leap |
| **Dual Memory System** | Working memory + long-term memory for coherent generation | Persistent context |
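The Flow Matching Decoder's idea can be pictured as integrating a learned velocity field from noise toward data with a handful of Euler steps. A minimal sketch, where `velocity_fn` is a toy stand-in for the trained diffusion transformer, not the repo's actual network:

```python
import torch

def flow_matching_sample(velocity_fn, shape, n_steps=3):
    """Integrate a velocity field from noise (t=0) toward data (t=1)
    with a few Euler steps -- the core of a flow-matching decoder."""
    torch.manual_seed(0)                      # deterministic toy run
    x = torch.randn(shape)                    # start from Gaussian noise
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((shape[0],), i * dt)   # current time, per sample
        x = x + dt * velocity_fn(x, t)        # Euler step: x += v(x, t) * dt
    return x

# Toy velocity field pulling every sample toward a fixed target embedding;
# a trained model would predict v(x, t) instead.
target = torch.ones(1, 8)
out = flow_matching_sample(lambda x, t: target - x, (4, 8), n_steps=3)
```

With only 2-3 steps, the quality hinges entirely on how straight the learned velocity field's trajectories are, which is what flow-matching training optimizes for.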
### Tier 2: Training & Reasoning

| Module | Description |
|--------|-------------|
| **MCTS Reasoning Engine** | AlphaZero-style tree search for narrative arrangement |
| **Thinking Toggle** | Adaptive compute: simple queries take 2 steps, complex queries take 5+ |
| **Matryoshka Elastic** | One training run → multiple deployment sizes |
| **GRPO Training** | Group Relative Policy Optimization (no value function) |
| **DAPO Training** | Decoupled clip + dynamic sampling + token-level loss |
| **Curriculum Learning** | 4-phase: single-evidence → multi-evidence → reasoning → RL |
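GRPO's key move is replacing the learned value function with a group-relative baseline: each sampled completion's reward is standardized against the other completions in its group. A minimal sketch of the advantage computation (the reward values are made up for illustration):

```python
def grpo_advantages(rewards):
    """Group Relative Policy Optimization advantage: each completion's
    reward is standardized against the group mean -- no critic network."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    if std == 0:                       # identical rewards: no learning signal
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Four candidate narratives sampled for the same graph, scored by a reward model:
adv = grpo_advantages([0.2, 0.8, 0.5, 0.5])
```

These advantages then plug into a clipped policy-gradient objective in place of critic-estimated advantages, which is where DAPO's decoupled clipping and token-level loss apply.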
### Tier 3: Architecture Improvements

| Module | Description |
|--------|-------------|
| **SwiGLU FFN** | Replaced GELU with SwiGLU (proven in LLaMA/Mistral) |
| **RoPE** | Rotary Position Embedding for length generalization |
| **Speculative Decoder** | Draft model (graph encoder) + verify (diffusion model) |
| **Quantization** | BitNet 1-bit + FP8 weight-only quantization stubs |
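The SwiGLU FFN follows the standard LLaMA-style formulation: gate the up-projection with a SiLU-activated parallel projection. A self-contained PyTorch sketch (the layer names are illustrative, not this repo's):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    """SwiGLU feed-forward block: out = W_down( SiLU(W_gate x) * W_up x )."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x):
        # Element-wise gating of the up-projection by the SiLU-activated gate
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

ffn = SwiGLUFFN(d_model=128, d_hidden=342)   # ~ (8/3) * d_model, the usual SwiGLU sizing
y = ffn(torch.randn(2, 16, 128))             # (batch, seq, d_model) in and out
```

Because SwiGLU uses three weight matrices instead of GELU's two, the hidden width is conventionally shrunk to ~8/3 of `d_model` to keep the parameter count comparable.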
## Architecture

```
INPUT: Graph Conditioning (RSVS Knowledge Graph)
        ↓
Graph Encoder (+ Dual Memory) → cross-attention keys/values
        ↓
Diffusion Transformer (SwiGLU + RoPE + Matryoshka)
  ├─ N × TransformerBlock: AdaLN + Self-Attn + Cross-Attn + SwiGLU FFN
  └─ Evoformer Feedback: Layer + Token + Decoder + Prediction recycling
        ↓
OUTPUT PIPELINE:
  ├─ Anchored Diffusion Decoder (2-3 steps, default)
  ├─ Flow Matching Decoder (2-3 steps, alternative)
  └─ Legacy DDPM/DDIM (backward compatible)
        ↓
INFERENCE CONTROLLER:
  ├─ Thinking Toggle (adaptive compute)
  ├─ MCTS Reasoning (complex queries)
  └─ Matryoshka (select submodel size)
```
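The inference controller's Thinking Toggle can be pictured as a complexity gate in front of the decoders. The heuristic below is purely illustrative: the complexity measure and thresholds are assumptions, not this repo's actual policy.

```python
def choose_refinement_steps(n_evidence_nodes, n_anomalies, thinking=True):
    """Thinking Toggle sketch: easy graphs take the fast 2-step anchored
    path; complex or anomalous graphs buy extra denoising steps.
    (Hypothetical complexity score and thresholds, for illustration only.)"""
    if not thinking:
        return 2                                  # toggle off: always fast path
    complexity = n_evidence_nodes + 2 * n_anomalies
    if complexity <= 5:
        return 2                                  # simple query: minimum refinement
    return min(2 + complexity // 5, 8)            # scale up, but cap the compute
```

The same gate could also route genuinely hard queries to the MCTS reasoning engine instead of just adding steps.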
## Training Pipeline

1. **Phase 1**: Single-evidence simple narratives (25% of budget)
2. **Phase 2**: Multi-evidence narratives (30% of budget)
3. **Phase 3**: Complex reasoning + anomaly resolution (30% of budget)
4. **Phase 4**: GRPO/DAPO RL fine-tuning (15% of budget)
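The phase budgets above translate directly into step counts once a total budget is fixed. A small helper, assuming a step-based budget (the phase names are illustrative):

```python
def curriculum_schedule(total_steps):
    """Split the training budget across the 4 curriculum phases
    using the 25/30/30/15 split listed above."""
    fractions = [
        ("phase1_single_evidence", 0.25),
        ("phase2_multi_evidence", 0.30),
        ("phase3_reasoning", 0.30),
        ("phase4_rl_finetune", 0.15),
    ]
    return {name: round(total_steps * f) for name, f in fractions}

schedule = curriculum_schedule(100_000)
```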
## Model Details

| Attribute | Value |
|-----------|-------|
| Parameters | ~5.5M (demo) |
| d_model | 128 |
| n_layers | 4 |
| n_heads | 4 |
| Vocab size | 2000 |
| Diffusion steps | 200 (train) / 20 (inference) |
| Anchored refinement | 2-3 steps |
## Usage

```python
from diffusion_llm import AamDiffusionModel, AamDiffusionConfig

# Load config and model weights
config = AamDiffusionConfig.from_json("config.json")
model = AamDiffusionModel.load("pytorch_model.bin")

# Encode the RSVS graph into conditioning vectors
# (evidence_ids and confidence come from your knowledge graph)
graph_cond = model.graph_encoder(
    evidence_ids=evidence_ids,
    evidence_confidence=confidence,
)

# Generate with anchored decoding (2-3 refinement steps)
result = model.sample(graph_cond, method="anchored", n_steps=3)
tokens = model.embeddings_to_tokens(result)
```