---
language:
- id
- en
license: mit
library_name: pytorch
tags:
- diffusion
- llm
- aam
- graph-conditioned
- sentence-arrangement
- evoformer
- anchored-decoding
- flow-matching
- dual-memory
- matryoshka
- swiglu
- rope
- mcts
- thinking-toggle
---

# AAM Diffusion LLM v2.0: Upgraded from Losion

## Overview

AphantasicAbstractionModel (AAM) is a **specialized sentence composer**, NOT a general-purpose LLM. It takes structured graph data (evidence, anomalies, reasoning chains) as input and produces a coherent, evidence-backed narrative through iterative denoising.

**AAM = 1 Mind + 1 Body**
- **Mind** = RSVS Knowledge Graph (structural, relational memory)
- **Body** = This Diffusion LLM (generates natural language FROM the graph)

## v2.0 Upgrade from Losion

This version incorporates 14 modules extracted from the [Losion](https://github.com/Wolfvin/Losion) architecture:

### Tier 1: Critical Upgrades
| Module | Description | Impact |
|--------|-------------|--------|
| **Anchored Diffusion Decoder** | 2-3 step refinement instead of 50+ from noise | 10-20x speedup |
| **Flow Matching Decoder** | Velocity-based alternative to DDPM/DDIM | Faster + better inference |
| **Evoformer Feedback** | 4-level bidirectional feedback (layer/token/decoder/prediction) | Quality leap |
| **Dual Memory System** | Working memory + Long-term memory for coherent generation | Persistent context |
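
To make the "velocity-based" idea concrete, here is a minimal sketch of few-step Euler sampling for a flow matching decoder. It is not the model's actual decoder: `euler_sample` and `make_oracle_velocity` are hypothetical names, and the oracle velocity (which already knows the target) stands in for the trained network that would normally predict velocity from the graph conditioning.

```python
def euler_sample(velocity, x_start, n_steps=3):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x_start)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_oracle_velocity(target):
    """Toy stand-in (assumption) for the trained velocity network.

    For a straight-line path x_t = (1 - t) * x0 + t * x1, the velocity at
    any point on the path is (x1 - x_t) / (1 - t).
    """
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


# Start from an "anchor" embedding rather than pure noise, then refine.
anchor = [0.0, 1.0, -2.0]
target = [0.5, 0.8, -1.0]
sample = euler_sample(make_oracle_velocity(target), anchor, n_steps=3)
```

With an exact straight-line velocity field, Euler integration reaches the target in any number of steps, which is the intuition behind using only 2-3 steps. A learned velocity field is only approximately straight, so real sample quality depends on training.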

### Tier 2: Training & Reasoning
| Module | Description |
|--------|-------------|
| **MCTS Reasoning Engine** | AlphaZero-style tree search for narrative arrangement |
| **Thinking Toggle** | Adaptive compute: 2 steps for simple inputs, 5+ steps for complex ones |
| **Matryoshka Elastic** | One training run → multiple deployment sizes |
| **GRPO Training** | Group Relative Policy Optimization (no value function) |
| **DAPO Training** | Decoupled clip + dynamic sampling + token-level loss |
| **Curriculum Learning** | 4-phase: single-evidence → multi-evidence → reasoning → RL |
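
GRPO avoids a value function by scoring each sample relative to the other samples drawn for the same input. The sketch below shows that advantage computation in isolation. The function name, the use of the population standard deviation, and the `eps` value are assumptions for illustration, not details taken from this repository.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in its group.

    The group mean plays the role a learned value baseline would
    otherwise play. eps guards against a zero-variance group.
    """
    if len(rewards) < 2:
        raise ValueError("need at least two samples per group")
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four candidate narratives for the same graph, scored by some reward function.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Samples that beat the group average get a positive advantage and the rest get a negative one. A group where every sample scores the same contributes no learning signal.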

### Tier 3 β€” Architecture Improvements
| Module | Description |
|--------|-------------|
| **SwiGLU FFN** | Replaced GELU with SwiGLU (proven in LLaMA/Mistral) |
| **RoPE** | Rotary Position Encoding for length generalization |
| **Speculative Decoder** | Draft model (graph encoder) + verify (diffusion model) |
| **Quantization** | BitNet 1-bit + FP8 weight-only quantization stubs |
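
As an illustration of why RoPE helps with length generalization, the following dependency-free sketch rotates consecutive pairs of dimensions by a position-dependent angle. It uses the interleaved-pair convention and a base of 10000, both assumptions; the repository's implementation may pair dimensions differently and will operate on tensors rather than lists.

```python
import math


def rope(x, pos, base=10000.0):
    """Apply rotary position encoding to one vector at integer position pos."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("dimension must be even")
    out = []
    for i in range(0, d, 2):
        # Pair (i, i+1) rotates with frequency base^(-i/d).
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same relative offset (2) at different absolute positions.
score_near = dot(rope(q, 3), rope(k, 1))
score_far = dot(rope(q, 103), rope(k, 101))
```

The two scores agree up to floating-point error because the attention score between rotated vectors depends only on the position difference, not on the absolute positions.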

## Architecture

```
INPUT: Graph Conditioning (RSVS Knowledge Graph)
       ↓
Graph Encoder (+ Dual Memory) → cross-attention keys/values
       ↓
Diffusion Transformer (SwiGLU + RoPE + Matryoshka)
  ├─ N × TransformerBlock: AdaLN + Self-Attn + Cross-Attn + SwiGLU FFN
  └─ Evoformer Feedback: Layer + Token + Decoder + Prediction recycling
       ↓
OUTPUT PIPELINE:
  ├─ Anchored Diffusion Decoder (2-3 steps, default)
  ├─ Flow Matching Decoder (2-3 steps, alternative)
  └─ Legacy DDPM/DDIM (backward compatible)
       ↓
INFERENCE CONTROLLER:
  ├─ Thinking Toggle (adaptive compute)
  ├─ MCTS Reasoning (complex queries)
  └─ Matryoshka (select submodel size)
```
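
The Thinking Toggle in the inference controller can be pictured as a function from input complexity to a refinement step count. The sketch below is purely hypothetical: the complexity score, the thresholds, and the `max_steps` cap are invented for illustration, and only the "2 steps for simple, 5+ for complex" endpoints come from this card.

```python
def choose_refinement_steps(n_evidence, n_anomalies,
                            simple_steps=2, complex_steps=5, max_steps=8):
    """Pick a refinement step count from a crude graph-complexity score.

    Hypothetical heuristic: anomalies count double because resolving
    them is assumed to need more refinement than plain evidence.
    """
    complexity = n_evidence + 2 * n_anomalies
    if complexity <= 3:
        return simple_steps
    # One extra step for every 4 complexity points beyond the threshold.
    extra = (complexity - 4) // 4
    return min(complex_steps + extra, max_steps)


steps_simple = choose_refinement_steps(n_evidence=1, n_anomalies=0)
steps_complex = choose_refinement_steps(n_evidence=2, n_anomalies=1)
steps_capped = choose_refinement_steps(n_evidence=10, n_anomalies=5)
```

A real controller would more likely learn this mapping or derive it from model confidence, but the interface (graph statistics in, step count out) conveys the idea of adaptive compute.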

## Training Pipeline

1. **Phase 1**: Single-evidence simple narratives (25% budget)
2. **Phase 2**: Multi-evidence narratives (30% budget)
3. **Phase 3**: Complex reasoning + anomaly resolution (30% budget)
4. **Phase 4**: GRPO/DAPO RL fine-tuning (15% budget)
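
The budget split above can be turned into step boundaries with integer arithmetic. This is a minimal sketch: the phase labels and function names are illustrative, and it assumes "budget" means a share of total training steps, which this card does not state explicitly.

```python
# (label, percent of total budget), in training order. Labels are illustrative.
PHASES = [
    ("single-evidence", 25),
    ("multi-evidence", 30),
    ("reasoning", 30),
    ("rl-finetune", 15),
]


def phase_boundaries(total_steps):
    """Return (label, exclusive end step) for each phase."""
    bounds, cumulative = [], 0
    for label, percent in PHASES:
        cumulative += percent
        bounds.append((label, total_steps * cumulative // 100))
    return bounds


def phase_for_step(step, total_steps):
    """Map a global training step to its curriculum phase."""
    if not 0 <= step < total_steps:
        raise ValueError("step out of range")
    for label, end in phase_boundaries(total_steps):
        if step < end:
            return label
    raise AssertionError("unreachable: percentages sum to 100")


bounds = phase_boundaries(1000)
```

For a 1000-step run this places the phase boundaries at steps 250, 550, 850, and 1000, so the RL fine-tuning phase covers the final 150 steps.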

## Model Details

| Attribute | Value |
|-----------|-------|
| Parameters | ~5.5M (demo) |
| d_model | 128 |
| n_layers | 4 |
| n_heads | 4 |
| Vocab size | 2000 |
| Diffusion steps | 200 (train) / 20 (inference) |
| Anchored refinement | 2-3 steps |

## Usage

```python
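import torch

from diffusion_llm import AamDiffusionModel, AamDiffusionConfig

# Load config and model
config = AamDiffusionConfig.from_json("config.json")
model = AamDiffusionModel.load("pytorch_model.bin")

# Placeholder graph inputs. The shapes and dtypes here are assumptions;
# substitute the tensors produced by your RSVS graph pipeline.
evidence_ids = torch.tensor([[1, 2, 3]])        # (batch, n_evidence) IDs
confidence = torch.tensor([[0.9, 0.7, 0.5]])    # (batch, n_evidence) scores

# Encode the graph, then generate with anchored decoding (2-3 steps)
graph_cond = model.graph_encoder(
    evidence_ids=evidence_ids,
    evidence_confidence=confidence,
)
result = model.sample(graph_cond, method="anchored", n_steps=3)
tokens = model.embeddings_to_tokens(result)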
```