AAM Diffusion LLM v2.0 – Upgraded from Losion
Overview
AphantasicAbstractionModel (AAM) is a specialized sentence composer – NOT a general-purpose LLM. It takes structured graph data (evidence, anomalies, reasoning chains) as input and produces coherent, evidence-backed narrative output through iterative denoising.
AAM = 1 Mind + 1 Body
- Mind = RSVS Knowledge Graph (structural, relational memory)
- Body = This Diffusion LLM (generates natural language FROM the graph)
v2.0 Upgrade from Losion
This version incorporates 14 modules extracted from the Losion architecture:
Tier 1 – Critical Upgrades
| Module | Description | Impact |
|---|---|---|
| Anchored Diffusion Decoder | 2-3 step refinement instead of 50+ steps from noise (sketch below) | 10-20x speedup |
| Flow Matching Decoder | Velocity-based alternative to DDPM/DDIM | Faster + better inference |
| Evoformer Feedback | 4-level bidirectional feedback (layer/token/decoder/prediction) | Quality leap |
| Dual Memory System | Working memory + Long-term memory for coherent generation | Persistent context |
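The anchored decoder's speedup comes from starting near a graph-conditioned anchor rather than pure noise. A minimal sketch of that idea, assuming a generic `denoise_fn(x, t)` callable (the function and argument names here are illustrative, not this repo's API):

```python
import torch

def anchored_refine(denoise_fn, anchor_emb, n_steps=3, noise_scale=0.1):
    """Sketch of anchored refinement: perturb a graph-conditioned anchor
    embedding slightly, then run only a few denoising passes over it."""
    x = anchor_emb + noise_scale * torch.randn_like(anchor_emb)  # start near the anchor, not N(0, I)
    for step in reversed(range(n_steps)):
        t = torch.full((x.shape[0],), step, dtype=torch.long)  # timestep index per batch item
        x = denoise_fn(x, t)  # one conditioned refinement pass
    return x
```

Because `x` already sits close to a plausible output, 2-3 passes can replace the 50+ steps a from-noise DDPM schedule would need.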
Tier 2 – Training & Reasoning
| Module | Description |
|---|---|
| MCTS Reasoning Engine | AlphaZero-style tree search for narrative arrangement |
| Thinking Toggle | Adaptive compute: simple = 2 steps, complex = 5+ steps |
| Matryoshka Elastic | One training → multiple deployment sizes |
| GRPO Training | Group Relative Policy Optimization (no value function; sketch below) |
| DAPO Training | Decoupled clip + dynamic sampling + token-level loss |
| Curriculum Learning | 4-phase: single-evidence → multi-evidence → reasoning → RL |
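The GRPO row above refers to computing advantages from a group of sampled outputs rather than from a learned value function. A minimal sketch of that group-relative advantage, with an illustrative function name:

```python
import torch

def grpo_advantages(group_rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize each sample's reward against the other samples drawn for the
    same prompt/graph conditioning; no critic or value network is involved."""
    mean = group_rewards.mean(dim=-1, keepdim=True)
    std = group_rewards.std(dim=-1, keepdim=True)
    return (group_rewards - mean) / (std + eps)

# Four narratives sampled for one graph query, scored by some reward model
rewards = torch.tensor([[0.2, 0.9, 0.5, 0.4]])
print(grpo_advantages(rewards))  # above-average samples get positive advantage
```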
Tier 3 – Architecture Improvements
| Module | Description |
|---|---|
| SwiGLU FFN | Replaced GELU with SwiGLU (proven in LLaMA/Mistral; sketch below) |
| RoPE | Rotary Position Encoding for length generalization |
| Speculative Decoder | Draft model (graph encoder) + verify (diffusion model) |
| Quantization | BitNet 1-bit + FP8 weight-only quantization stubs |
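The SwiGLU row above swaps the FFN's GELU activation for a gated SiLU unit. A generic LLaMA-style version (a sketch, not this model's exact layer):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    """FFN(x) = W2( SiLU(W1 x) * W3 x ), the gated feed-forward block used in
    LLaMA/Mistral-style transformers."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_hidden, bias=False)  # gate projection
        self.w3 = nn.Linear(d_model, d_hidden, bias=False)  # value projection
        self.w2 = nn.Linear(d_hidden, d_model, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```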
Architecture
```
INPUT: Graph Conditioning (RSVS Knowledge Graph)
    ↓
Graph Encoder (+ Dual Memory) → cross-attention keys/values
    ↓
Diffusion Transformer (SwiGLU + RoPE + Matryoshka)
 ├─ N × TransformerBlock: AdaLN + Self-Attn + Cross-Attn + SwiGLU FFN
 └─ Evoformer Feedback: Layer + Token + Decoder + Prediction recycling
    ↓
OUTPUT PIPELINE:
 ├─ Anchored Diffusion Decoder (2-3 steps, default)
 ├─ Flow Matching Decoder (2-3 steps, alternative)
 └─ Legacy DDPM/DDIM (backward compatible)
    ↓
INFERENCE CONTROLLER:
 ├─ Thinking Toggle (adaptive compute)
 ├─ MCTS Reasoning (complex queries)
 └─ Matryoshka (select submodel size)
```
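The inference controller's Thinking Toggle trades compute for query difficulty. A heuristic sketch of that routing (the thresholds and names below are assumptions, not the repo's actual logic):

```python
def choose_decode_plan(n_evidence: int, n_anomalies: int) -> dict:
    """Pick a decoding budget from the complexity of the graph query:
    simple queries take the cheap anchored path, complex ones get more
    refinement steps plus MCTS narrative arrangement."""
    if n_evidence > 3 or n_anomalies > 0:
        return {"method": "anchored", "n_steps": 5, "use_mcts": True}
    return {"method": "anchored", "n_steps": 2, "use_mcts": False}
```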
Training Pipeline
- Phase 1: Single-evidence simple narratives (25% budget)
- Phase 2: Multi-evidence narratives (30% budget)
- Phase 3: Complex reasoning + anomaly resolution (30% budget)
- Phase 4: GRPO/DAPO RL fine-tuning (15% budget)
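The four phases might be encoded as a schedule like the one below; the keys and dataset names are illustrative, not the repo's actual training config:

```python
# Hypothetical curriculum schedule mirroring the phase budgets above
CURRICULUM = [
    {"phase": 1, "data": "single_evidence_simple", "budget": 0.25},
    {"phase": 2, "data": "multi_evidence",         "budget": 0.30},
    {"phase": 3, "data": "reasoning_anomaly",      "budget": 0.30},
    {"phase": 4, "data": "grpo_dapo_rl",           "budget": 0.15},
]
assert abs(sum(p["budget"] for p in CURRICULUM) - 1.0) < 1e-9  # budgets cover the full run
```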
Model Details
| Attribute | Value |
|---|---|
| Parameters | ~5.5M (demo) |
| d_model | 128 |
| n_layers | 4 |
| n_heads | 4 |
| Vocab size | 2000 |
| Diffusion steps | 200 (train) / 20 (inference) |
| Anchored refinement | 2-3 steps |
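For orientation, the demo hyperparameters above correspond roughly to the dict below; the field names are assumptions, and config.json in the repo is the authoritative schema:

```python
# Illustrative mapping of the demo model's hyperparameters
demo_config = {
    "d_model": 128,
    "n_layers": 4,
    "n_heads": 4,
    "vocab_size": 2000,
    "diffusion_steps_train": 200,
    "diffusion_steps_inference": 20,
    "anchored_refinement_steps": 3,
}
```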
Usage
```python
import torch
from diffusion_llm import AamDiffusionModel, AamDiffusionConfig

# Load config and model
config = AamDiffusionConfig.from_json("config.json")
model = AamDiffusionModel.load("pytorch_model.bin")

# Graph conditioning from evidence node IDs and their confidence scores
# (placeholder tensors; in practice these come from the RSVS knowledge graph)
evidence_ids = torch.tensor([[12, 48, 301]])
confidence = torch.tensor([[0.9, 0.7, 0.8]])
graph_cond = model.graph_encoder(
    evidence_ids=evidence_ids,
    evidence_confidence=confidence,
)

# Generate with anchored decoding (2-3 steps)
result = model.sample(graph_cond, method="anchored", n_steps=3)
tokens = model.embeddings_to_tokens(result)
```