---
language:
- id
- en
license: mit
library_name: pytorch
tags:
- diffusion
- llm
- aam
- graph-conditioned
- sentence-arrangement
- evoformer
- anchored-decoding
- flow-matching
- dual-memory
- matryoshka
- swiglu
- rope
- mcts
- thinking-toggle
---

# AAM Diffusion LLM v2.0: Upgraded from Losion

## Overview

AphantasicAbstractionModel (AAM) is a **specialized sentence composer**, **not** a general-purpose LLM. It takes structured graph data (evidence, anomalies, reasoning chains) as input and produces coherent, evidence-backed narrative output through iterative denoising.

**AAM = 1 Mind + 1 Body**
- **Mind** = RSVS Knowledge Graph (structural, relational memory)
- **Body** = this diffusion LLM (generates natural language *from* the graph)

## v2.0 Upgrade from Losion

This version incorporates 14 modules extracted from the [Losion](https://github.com/Wolfvin/Losion) architecture:

### Tier 1: Critical Upgrades
| Module | Description | Impact |
|--------|-------------|--------|
| **Anchored Diffusion Decoder** | 2-3 refinement steps instead of 50+ denoising steps from pure noise | 10-20x speedup |
| **Flow Matching Decoder** | Velocity-based alternative to DDPM/DDIM | Faster, higher-quality inference |
| **Evoformer Feedback** | 4-level bidirectional feedback (layer / token / decoder / prediction) | Higher output quality |
| **Dual Memory System** | Working memory + long-term memory for coherent generation | Persistent context |

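The Flow Matching Decoder row can be illustrated with a toy, pure-Python sketch. It assumes the common rectified-flow setup (a straight path x_t = (1 - t)·x0 + t·x1 from noise x0 at t = 0 to data x1 at t = 1) integrated with plain Euler steps; `euler_flow_sample` and `oracle_velocity` are hypothetical names, not this project's API. A trained network would only approximate the velocity, so here a known target stands in for it to make the result checkable: with an exact straight-path velocity, 2 or 3 steps already land on the data point, which is the intuition behind the small step budget.

```python
import random
from typing import Callable, List

Vector = List[float]


def euler_flow_sample(
    x0: Vector,
    velocity_fn: Callable[[Vector, float], Vector],
    n_steps: int = 3,
) -> Vector:
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt  # never reaches 1.0, so the oracle below never divides by zero
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical stand-in for a trained velocity network. On the straight path
# x_t = (1 - t) * x0 + t * x1, the ideal velocity at (x, t) is (x1 - x) / (1 - t).
target = [0.5, -1.0, 2.0]


def oracle_velocity(x: Vector, t: float) -> Vector:
    return [(x1 - xi) / (1.0 - t) for x1, xi in zip(target, x)]


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in target]
sample_3 = euler_flow_sample(noise, oracle_velocity, n_steps=3)
sample_2 = euler_flow_sample(noise, oracle_velocity, n_steps=2)
print(sample_3, sample_2)  # both land on `target` up to float rounding
```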
### Tier 2: Training & Reasoning
| Module | Description |
|--------|-------------|
| **MCTS Reasoning Engine** | AlphaZero-style tree search for narrative arrangement |
| **Thinking Toggle** | Adaptive compute: simple inputs use 2 steps, complex ones 5+ steps |
| **Matryoshka Elastic** | One training run yields multiple deployment sizes |
| **GRPO Training** | Group Relative Policy Optimization (no value function) |
| **DAPO Training** | Decoupled clip + dynamic sampling + token-level loss |
| **Curriculum Learning** | 4 phases: single-evidence → multi-evidence → reasoning → RL |

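The MCTS Reasoning Engine is described as AlphaZero-style. In AlphaZero, each simulation descends the tree by picking the child that maximizes the PUCT score Q(s, a) + c_puct · P(s, a) · sqrt(N(s)) / (1 + N(s, a)), then backs the leaf evaluation up along the visited path. The sketch below shows only those two pieces, applied to choosing which sentence to place next. The `Child` fields, the sentence labels, and `c_puct = 1.5` are illustrative assumptions, not values taken from this project.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Child:
    action: str             # hypothetical: which sentence to place next
    prior: float            # P(s, a) from a policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # W(s, a), sum of backed-up values

    @property
    def q(self) -> float:
        """Mean value Q(s, a); 0 for unvisited children."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(children: List[Child], c_puct: float = 1.5) -> Child:
    """Return argmax_a Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
    parent_visits = sum(c.visits for c in children)

    def score(c: Child) -> float:
        u = c_puct * c.prior * math.sqrt(parent_visits) / (1 + c.visits)
        return c.q + u

    return max(children, key=score)


def backup(path: List[Child], value: float) -> None:
    """Single-agent backup: add the leaf value to every edge on the path.
    (Two-player AlphaZero would flip the sign at alternate levels.)"""
    for c in path:
        c.visits += 1
        c.value_sum += value


children = [
    Child("evidence_A", prior=0.6, visits=10, value_sum=5.0),  # Q = 0.5
    Child("anomaly_B", prior=0.3, visits=2, value_sum=1.6),    # Q = 0.8
    Child("conclusion_C", prior=0.1),                          # unvisited
]
best = puct_select(children)
backup([best], value=1.0)
print(best.action, best.visits)
```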
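The GRPO and DAPO rows differ mainly in how the policy-gradient objective is assembled. GRPO samples a group of outputs for the same input and uses each output's reward, normalized by the group's mean and standard deviation, as its advantage, so no value network is needed. DAPO keeps that advantage but clips the probability ratio asymmetrically (separate ε_low and ε_high, "clip-higher") and averages the loss over all tokens in the group rather than per sequence. The sketch below computes both pieces on plain Python lists; ε_low = 0.2 and ε_high = 0.28 are the values reported in the DAPO paper, while the function names, the use of the population standard deviation, and the example numbers are assumptions. KL terms are omitted, and DAPO's dynamic sampling appears only as a comment.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO: A_i = (r_i - mean(r)) / (std(r) + eps) within one input's group.

    If every reward in the group is equal, all advantages are 0 and the group
    gives no gradient; DAPO's dynamic sampling drops such groups and resamples.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def dapo_token_loss(ratios: List[List[float]], advantages: List[float],
                    eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """Clipped surrogate with a decoupled clip range, averaged over ALL tokens.

    ratios[i][k] = pi_new(token k of output i) / pi_old(token k of output i);
    every token of output i shares the sequence-level advantage advantages[i].
    """
    total, n_tokens = 0.0, 0
    for seq_ratios, adv in zip(ratios, advantages):
        for r in seq_ratios:
            clipped = min(max(r, 1.0 - eps_low), 1.0 + eps_high)
            total += -min(r * adv, clipped * adv)
            n_tokens += 1
    return total / n_tokens


rewards = [1.0, 0.0, 1.0, 0.0]  # e.g. 4 sampled narratives for one graph
adv = group_relative_advantages(rewards)  # approximately [1, -1, 1, -1]
ratios = [[1.5, 1.0], [0.5], [1.1, 1.2, 1.3], [1.0]]  # outputs of different lengths
loss = dapo_token_loss(ratios, adv)
print(adv, loss)
```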
### Tier 3: Architecture Improvements
| Module | Description |
|--------|-------------|
| **SwiGLU FFN** | Replaces GELU with SwiGLU (as used in LLaMA/Mistral) |
| **RoPE** | Rotary Position Embedding for length generalization |
| **Speculative Decoder** | Draft model (graph encoder) + verifier (diffusion model) |
| **Quantization** | BitNet 1-bit + FP8 weight-only quantization stubs |

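The SwiGLU FFN row replaces the usual GELU feed-forward layer with a gated one: FFN(x) = W_down · (SiLU(W_gate · x) ⊙ (W_up · x)), where SiLU(z) = z · sigmoid(z) and ⊙ is the elementwise product. LLaMA sizes the hidden dimension at about 2/3 of 4·d_model so that the three matrices cost roughly as much as a 4·d_model GELU FFN. Below is a minimal pure-Python sketch on tiny hand-written weights; the matrix names and sizes are illustrative assumptions, and a real model would use bias-free framework linear layers.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product, one output per row of w."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def silu(z: float) -> float:
    """SiLU / swish: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def swiglu_ffn(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Vector:
    """FFN(x) = W_down (SiLU(W_gate x) * (W_up x)), gated in the hidden dimension."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# d_model = 2, hidden = 3 (toy sizes)
x = [1.0, -1.0]
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_down = [[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]]
out = swiglu_ffn(x, w_gate, w_up, w_down)
print(out)  # output has d_model = 2 entries
```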
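RoPE rotates each consecutive pair of query/key dimensions (x_{2i}, x_{2i+1}) by the angle m·θ_i, where m is the token position, θ_i = base^(-2i/d), and base = 10000 as in the original RoFormer formulation. Because rotations compose, the dot product between a query rotated for position m and a key rotated for position n depends only on the offset m - n, which is why RoPE is described as encoding relative position. The pure-Python sketch below checks that property; pairing adjacent dimensions is one of two common layouts (some implementations instead pair dimension i with i + d/2), and the function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def rope(x: Vector, pos: int, base: float = 10000.0) -> Vector:
    """Rotate each pair (x[2i], x[2i+1]) by angle pos * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even head dimension")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s      # standard 2-D rotation
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.5, 0.9]
k = [1.0, 0.4, -0.7, 0.2]
s1 = dot(rope(q, 5), rope(k, 2))    # offset 3
s2 = dot(rope(q, 13), rope(k, 10))  # offset 3 again
print(s1, s2)  # equal up to float rounding
```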
## Architecture

```
INPUT: Graph Conditioning (RSVS Knowledge Graph)
        ↓
Graph Encoder (+ Dual Memory) → cross-attention keys/values
        ↓
Diffusion Transformer (SwiGLU + RoPE + Matryoshka)
  ├─ N × TransformerBlock: AdaLN + Self-Attn + Cross-Attn + SwiGLU FFN
  └─ Evoformer Feedback: Layer + Token + Decoder + Prediction recycling
        ↓
OUTPUT PIPELINE:
  ├─ Anchored Diffusion Decoder (2-3 steps, default)
  ├─ Flow Matching Decoder (2-3 steps, alternative)
  └─ Legacy DDPM/DDIM (backward compatible)
        ↓
INFERENCE CONTROLLER:
  ├─ Thinking Toggle (adaptive compute)
  ├─ MCTS Reasoning (complex queries)
  └─ Matryoshka (select submodel size)
```

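The README does not spell out how the default Anchored Diffusion Decoder works. One plausible reading, stated here as an assumption rather than the project's documented algorithm, is SDEdit-style partial denoising: take a draft embedding (the "anchor"), noise it only up to an intermediate timestep, and run a few deterministic DDIM updates back to a clean sample instead of starting from pure noise at the last training step. The sketch uses the 200 training steps from Model Details with a linear beta schedule (1e-4 to 0.02, a common DDPM default); `anchored_refine`, `start_t = 60`, and the oracle denoiser are hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def linear_alpha_bars(n_train: int = 200, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(n_train):
        beta = beta_start + (beta_end - beta_start) * i / (n_train - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def anchored_refine(anchor: Vector,
                    denoise_fn: Callable[[Vector, int], Vector],
                    alpha_bars: List[float],
                    start_t: int = 60, n_steps: int = 3, seed: int = 0) -> Vector:
    """Noise `anchor` only up to `start_t`, then run `n_steps` DDIM (eta=0) updates."""
    rng = random.Random(seed)
    ab = alpha_bars[start_t]
    x = [math.sqrt(ab) * a + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for a in anchor]
    # e.g. start_t=60, n_steps=3 -> timesteps [60, 40, 20], then jump to clean
    ts = [round(start_t * (n_steps - k) / n_steps) for k in range(n_steps)]
    for i, t in enumerate(ts):
        ab_t = alpha_bars[t]
        x0_hat = denoise_fn(x, t)  # network predicts the clean embedding
        eps_hat = [(xi - math.sqrt(ab_t) * x0i) / math.sqrt(1.0 - ab_t)
                   for xi, x0i in zip(x, x0_hat)]
        if i + 1 < len(ts):
            ab_prev = alpha_bars[ts[i + 1]]
            x = [math.sqrt(ab_prev) * x0i + math.sqrt(1.0 - ab_prev) * ei
                 for x0i, ei in zip(x0_hat, eps_hat)]
        else:
            x = x0_hat  # final step returns the clean prediction
    return x


calls: List[int] = []
clean = [1.0, 0.0, -1.0]


def oracle_denoiser(x: Vector, t: int) -> Vector:
    """Hypothetical stand-in for the trained diffusion transformer."""
    calls.append(t)
    return list(clean)


draft = [0.8, 0.1, -0.9]  # hypothetical anchor, e.g. a rough draft embedding
out = anchored_refine(draft, oracle_denoiser, linear_alpha_bars())
print(calls, out)  # 3 network calls instead of a full reverse chain
```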
## Training Pipeline

1. **Phase 1**: Single-evidence simple narratives (25% budget)
2. **Phase 2**: Multi-evidence narratives (30% budget)
3. **Phase 3**: Complex reasoning + anomaly resolution (30% budget)
4. **Phase 4**: GRPO/DAPO RL fine-tuning (15% budget)

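The four phase budgets (25% / 30% / 30% / 15%) sum to 100% of training, so a step counter maps directly to a phase. A minimal sketch, assuming the budget is measured in optimizer steps (the README does not say whether it counts steps, tokens, or compute); `PHASES`, `phase_boundaries`, and `phase_for_step` are hypothetical names.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

# Hypothetical names; fractions taken from the Training Pipeline list.
PHASES: List[Tuple[str, float]] = [
    ("single-evidence", 0.25),
    ("multi-evidence", 0.30),
    ("reasoning + anomaly resolution", 0.30),
    ("GRPO/DAPO RL", 0.15),
]


def phase_boundaries(total_steps: int) -> List[int]:
    """Step index at which each phase ends (exclusive)."""
    fracs = [f for _, f in PHASES]
    if abs(sum(fracs) - 1.0) > 1e-9:
        raise ValueError("phase budgets must sum to 100%")
    return [round(total_steps * c) for c in accumulate(fracs)]


def phase_for_step(step: int, total_steps: int) -> int:
    """1-based phase number for a 0-based training step."""
    bounds = phase_boundaries(total_steps)
    return min(bisect_right(bounds, step), len(PHASES) - 1) + 1


print(phase_boundaries(1000))
print([phase_for_step(s, 1000) for s in (0, 249, 250, 550, 999)])
```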
## Model Details

| Attribute | Value |
|-----------|-------|
| Parameters | ~5.5M (demo) |
| d_model | 128 |
| n_layers | 4 |
| n_heads | 4 |
| Vocab size | 2000 |
| Diffusion steps | 200 (train) / 20 (inference) |
| Anchored refinement | 2-3 steps |

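The table lists 200 diffusion steps for training but only 20 for inference. DDIM-style samplers make that possible by visiting a subsequence of the training timesteps. The sketch below shows one common choice, evenly strided timesteps ordered from noisiest to cleanest; whether this model uses uniform striding, and the helper name `ddim_timesteps`, are assumptions.

```python
from typing import List, Tuple


def ddim_timesteps(n_train: int = 200, n_infer: int = 20) -> List[int]:
    """Evenly strided training timesteps, ordered from noisiest to cleanest."""
    if n_train % n_infer:
        raise ValueError("this sketch expects n_train to be divisible by n_infer")
    stride = n_train // n_infer
    return list(range(0, n_train, stride))[::-1]


ts = ddim_timesteps(200, 20)
# (t, t_prev) for each update; -1 marks the final jump to the clean sample
pairs: List[Tuple[int, int]] = list(zip(ts, ts[1:] + [-1]))
print(ts)  # 190, 180, ..., 10, 0
```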
## Usage

```python
import torch

from diffusion_llm import AamDiffusionModel, AamDiffusionConfig

# Load config and model
config = AamDiffusionConfig.from_json("config.json")
model = AamDiffusionModel.load("pytorch_model.bin")

# Example inputs (hypothetical shapes and values): one graph with 8 evidence
# token ids from the 2000-token vocabulary, each with a confidence score.
evidence_ids = torch.randint(0, 2000, (1, 8))
confidence = torch.ones(1, 8)

# Generate with anchored decoding (2-3 steps); gradients are not needed here
with torch.no_grad():
    graph_cond = model.graph_encoder(
        evidence_ids=evidence_ids,
        evidence_confidence=confidence,
    )
    result = model.sample(graph_cond, method="anchored", n_steps=3)
    tokens = model.embeddings_to_tokens(result)
```