AAM Diffusion LLM v2.0 - Upgraded from Losion

Overview

AphantasicAbstractionModel (AAM) is a specialized sentence composer, NOT a general-purpose LLM. It takes structured graph data (evidence, anomalies, reasoning chains) as input and produces coherent, evidence-backed narrative output through iterative denoising.

AAM = 1 Mind + 1 Body

  • Mind = RSVS Knowledge Graph (structural, relational memory)
  • Body = This Diffusion LLM (generates natural language FROM the graph)
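As a rough illustration of what "structured graph data" means here, a conditioning record bundles evidence nodes, anomalies, and a reasoning chain. A minimal sketch follows; the field names mirror the Usage section where possible but are illustrative, not the actual RSVS schema:

from dataclasses import dataclass, field
from typing import List

@dataclass
class GraphConditioning:  # illustrative only, not the real RSVS schema
    evidence_ids: List[int]                          # tokenized evidence spans
    evidence_confidence: List[float]                 # confidence per evidence item
    anomaly_ids: List[int] = field(default_factory=list)
    reasoning_chain: List[int] = field(default_factory=list)  # ordered node ids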

v2.0 Upgrade from Losion

This version incorporates 14 modules extracted from the Losion architecture:

Tier 1 - Critical Upgrades

Module | Description | Impact
--- | --- | ---
Anchored Diffusion Decoder | 2-3 step refinement instead of 50+ steps from noise (sketched below) | 10-20x speedup
Flow Matching Decoder | Velocity-based alternative to DDPM/DDIM | Faster + better inference
Evoformer Feedback | 4-level bidirectional feedback (layer/token/decoder/prediction) | Quality leap
Dual Memory System | Working memory + long-term memory for coherent generation | Persistent context
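To illustrate the difference between anchored refinement and denoising from pure noise, here is a minimal sketch; it assumes access to a single-step denoising callable, and names such as denoise_step and graph_anchor are illustrative, not the model's actual API:

import torch

def anchored_refine(denoise_step, graph_anchor, n_steps=3, noise_scale=0.1):
    """Start from the graph-derived anchor embedding (plus a little noise)
    and apply only a few denoising steps, instead of 50+ steps from pure noise."""
    x = graph_anchor + noise_scale * torch.randn_like(graph_anchor)
    for t in reversed(range(n_steps)):
        x = denoise_step(x, t)  # one refinement step toward clean embeddings
    return x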

Tier 2 - Training & Reasoning

Module | Description
--- | ---
MCTS Reasoning Engine | AlphaZero-style tree search for narrative arrangement
Thinking Toggle | Adaptive compute: simple = 2 steps, complex = 5+ steps (sketched below)
Matryoshka Elastic | One training run → multiple deployment sizes
GRPO Training | Group Relative Policy Optimization (no value function)
DAPO Training | Decoupled clip + dynamic sampling + token-level loss
Curriculum Learning | 4-phase: single-evidence → multi-evidence → reasoning → RL
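The Thinking Toggle trades compute for quality by picking the number of refinement steps from query complexity. A minimal sketch, assuming a scalar complexity score derived from the graph; the scoring heuristic and thresholds below are illustrative:

def choose_n_steps(n_evidence: int, n_anomalies: int) -> int:
    """Adaptive compute: simple queries get 2 refinement steps, complex ones 5+."""
    complexity = n_evidence + 2 * n_anomalies   # illustrative heuristic
    if complexity <= 2:
        return 2                                # simple: fast path
    return min(5 + complexity // 4, 10)         # complex: more refinement steps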

Tier 3 - Architecture Improvements

Module | Description
--- | ---
SwiGLU FFN | Replaces GELU with SwiGLU (as in LLaMA/Mistral; sketched below)
RoPE | Rotary Position Embedding for length generalization
Speculative Decoder | Draft model (graph encoder) + verify model (diffusion model)
Quantization | BitNet 1-bit + FP8 weight-only quantization stubs
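The SwiGLU feed-forward block follows the LLaMA/Mistral formulation (a SiLU-gated linear unit). A minimal PyTorch sketch; the hidden width is an assumption, not the model's actual setting:

import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    def __init__(self, d_model=128, hidden=256):  # hidden width is an assumption
        super().__init__()
        self.w_gate = nn.Linear(d_model, hidden, bias=False)
        self.w_up = nn.Linear(d_model, hidden, bias=False)
        self.w_down = nn.Linear(hidden, d_model, bias=False)

    def forward(self, x):
        # SwiGLU: SiLU(x W_gate) * (x W_up), then project back to d_model
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))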

Architecture

INPUT: Graph Conditioning (RSVS Knowledge Graph)
       ↓
Graph Encoder (+ Dual Memory) → cross-attention keys/values
       ↓
Diffusion Transformer (SwiGLU + RoPE + Matryoshka)
  ├─ N × TransformerBlock: AdaLN + Self-Attn + Cross-Attn + SwiGLU FFN
  └─ Evoformer Feedback: Layer + Token + Decoder + Prediction recycling
       ↓
OUTPUT PIPELINE:
  ├─ Anchored Diffusion Decoder (2-3 steps, default)
  ├─ Flow Matching Decoder (2-3 steps, alternative)
  └─ Legacy DDPM/DDIM (backward compatible)
       ↓
INFERENCE CONTROLLER:
  ├─ Thinking Toggle (adaptive compute)
  ├─ MCTS Reasoning (complex queries)
  └─ Matryoshka (select submodel size)
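Put together, inference follows the pipeline above. A minimal end-to-end sketch, assuming the method names shown in the Usage section (graph_encoder, sample, embeddings_to_tokens); the complexity flag and step counts are illustrative:

def generate(model, evidence_ids, confidence, complex_query=False):
    # 1. Mind -> conditioning: encode the RSVS graph into cross-attention keys/values
    graph_cond = model.graph_encoder(evidence_ids=evidence_ids,
                                     evidence_confidence=confidence)
    # 2. Thinking Toggle: pick the compute budget for this query
    n_steps = 5 if complex_query else 2
    # 3. Anchored diffusion decoding (default output pipeline)
    embeddings = model.sample(graph_cond, method="anchored", n_steps=n_steps)
    # 4. Map refined embeddings back to tokens
    return model.embeddings_to_tokens(embeddings)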

Training Pipeline

  1. Phase 1: Single-evidence simple narratives (25% budget)
  2. Phase 2: Multi-evidence narratives (30% budget)
  3. Phase 3: Complex reasoning + anomaly resolution (30% budget)
  4. Phase 4: GRPO/DAPO RL fine-tuning (15% budget)
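As a sketch of how the 4-phase budget split translates into optimizer steps (the total step count below is an arbitrary example, not the actual training budget):

def curriculum_schedule(total_steps=100_000):
    """Split the training budget across the four curriculum phases."""
    phases = [
        ("single_evidence", 0.25),
        ("multi_evidence", 0.30),
        ("reasoning_anomaly", 0.30),
        ("grpo_dapo_rl", 0.15),
    ]
    return [(name, int(total_steps * frac)) for name, frac in phases]

# e.g. [('single_evidence', 25000), ('multi_evidence', 30000), ...]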

Model Details

Attribute | Value
--- | ---
Parameters | ~5.5M (demo)
d_model | 128
n_layers | 4
n_heads | 4
Vocab size | 2000
Diffusion steps | 200 (train) / 20 (inference)
Anchored refinement | 2-3 steps
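The table maps directly onto the model configuration. A minimal sketch of the corresponding fields; the field names are assumptions for illustration, not the verified AamDiffusionConfig schema:

from dataclasses import dataclass

@dataclass
class DemoConfig:  # illustrative mirror of the values above
    d_model: int = 128
    n_layers: int = 4
    n_heads: int = 4
    vocab_size: int = 2000
    train_diffusion_steps: int = 200
    inference_diffusion_steps: int = 20
    anchored_refinement_steps: int = 3  # 2-3 step refinement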

Usage

from diffusion_llm import AamDiffusionModel, AamDiffusionConfig

# Load config and model
config = AamDiffusionConfig.from_json("config.json")
model = AamDiffusionModel.load("pytorch_model.bin")

# Generate with anchored decoding (2-3 steps);
# evidence_ids / confidence are graph tensors prepared from the RSVS knowledge graph
graph_cond = model.graph_encoder(
    evidence_ids=evidence_ids,
    evidence_confidence=confidence,
)
result = model.sample(graph_cond, method="anchored", n_steps=3)
tokens = model.embeddings_to_tokens(result)
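The same conditioning can be pointed at the alternative decoders listed in the output pipeline. The method strings below mirror the decoder names above but are assumptions about the exact values the API accepts:

# Flow matching decoder (velocity-based alternative, also 2-3 steps)
result_fm = model.sample(graph_cond, method="flow_matching", n_steps=3)

# Legacy DDPM/DDIM path for backward compatibility (more steps from noise)
result_ddim = model.sample(graph_cond, method="ddim", n_steps=20)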