Wolfvin committed
Commit 46b1302 · verified · 1 Parent(s): abae272

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +121 -0
README.md ADDED
@@ -0,0 +1,121 @@
---
language:
- id
- en
license: mit
library_name: pytorch
tags:
- diffusion
- llm
- aam
- graph-conditioned
- sentence-arrangement
- evoformer
- anchored-decoding
- flow-matching
- dual-memory
- matryoshka
- swiglu
- rope
- mcts
- thinking-toggle
---

# AAM Diffusion LLM v2.0: Upgraded from Losion

## Overview

AphantasicAbstractionModel (AAM) is a **specialized sentence composer**, NOT a general-purpose LLM. It takes structured graph data (evidence, anomalies, reasoning chains) as input and produces coherent, evidence-backed narrative output through iterative denoising.

**AAM = 1 Mind + 1 Body**
- **Mind** = RSVS Knowledge Graph (structural, relational memory)
- **Body** = This Diffusion LLM (generates natural language FROM the graph)

## v2.0 Upgrade from Losion

This version incorporates 14 modules extracted from the [Losion](https://github.com/Wolfvin/Losion) architecture:

### Tier 1: Critical Upgrades
| Module | Description | Impact |
|--------|-------------|--------|
| **Anchored Diffusion Decoder** | 2-3 step refinement instead of 50+ steps from noise | 10-20x speedup |
| **Flow Matching Decoder** | Velocity-based alternative to DDPM/DDIM | Faster + better inference |
| **Evoformer Feedback** | 4-level bidirectional feedback (layer/token/decoder/prediction) | Quality leap |
| **Dual Memory System** | Working memory + Long-term memory for coherent generation | Persistent context |

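
To make the few-step claim concrete, the sketch below integrates a velocity field with Euler steps, which is the core loop of a flow-matching sampler. It is a toy illustration, not this model's decoder: `toy_velocity` is a hypothetical stand-in for the learned network and simply points along a straight path toward a known target, and `euler_sample` is an illustrative name.

```python
from typing import Callable, List

Vector = List[float]


def toy_velocity(x: Vector, t: float, target: Vector) -> Vector:
    """Hypothetical stand-in for a learned velocity network.

    For a straight path x_t = (1 - t) * x_0 + t * x_1, the velocity that
    carries x_t to x_1 by t = 1 is (x_1 - x_t) / (1 - t).
    """
    return [(xt - xi) / (1.0 - t) for xi, xt in zip(x, target)]


def euler_sample(
    x_start: Vector,
    velocity: Callable[[Vector, float], Vector],
    n_steps: int = 3,
) -> Vector:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with n_steps Euler steps."""
    x = list(x_start)
    dt = 1.0 / n_steps
    for step in range(n_steps):
        t = step * dt  # t stays below 1, so the toy field never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


target = [1.0, -2.0, 0.5]
start = [0.3, 0.1, -0.7]  # an anchor or a noise sample; values are arbitrary
result = euler_sample(start, lambda x, t: toy_velocity(x, t, target), n_steps=3)
print(result)
```

In this toy case the field is exact, so three steps land on the target up to floating-point error. With a learned network the field is only approximate, and the step count trades speed against accuracy.
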
### Tier 2: Training & Reasoning
| Module | Description |
|--------|-------------|
| **MCTS Reasoning Engine** | AlphaZero-style tree search for narrative arrangement |
| **Thinking Toggle** | Adaptive compute: simple = 2 steps, complex = 5+ steps |
| **Matryoshka Elastic** | One training → multiple deployment sizes |
| **GRPO Training** | Group Relative Policy Optimization (no value function) |
| **DAPO Training** | Decoupled clip + dynamic sampling + token-level loss |
| **Curriculum Learning** | 4-phase: single-evidence → multi-evidence → reasoning → RL |

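
The GRPO and DAPO rows can be illustrated with two small functions. This is a minimal sketch under stated assumptions, not the repository's training code: the function names are hypothetical, the population standard deviation is one possible normalization choice, and the clip bounds `eps_low` and `eps_high` are illustrative values.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own sampling group (GRPO-style).

    The group mean acts as the baseline, so no learned value function is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def decoupled_clip_term(
    ratio: float,
    advantage: float,
    eps_low: float = 0.2,   # assumed lower clip range
    eps_high: float = 0.28,  # assumed upper clip range
) -> float:
    """Clipped surrogate for one token with separate lower and upper clip ranges."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


rewards = [0.2, 0.9, 0.5, 0.4]  # rewards for four samples of the same prompt
advantages = group_relative_advantages(rewards)
print(advantages)
print(decoupled_clip_term(ratio=1.5, advantage=1.0))
```

A token-level loss would average `decoupled_clip_term` over every token in the group rather than first averaging per sequence.
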
### Tier 3: Architecture Improvements
| Module | Description |
|--------|-------------|
| **SwiGLU FFN** | Replaced GELU with SwiGLU (proven in LLaMA/Mistral) |
| **RoPE** | Rotary Position Encoding for length generalization |
| **Speculative Decoder** | Draft model (graph encoder) + verify (diffusion model) |
| **Quantization** | BitNet 1-bit + FP8 weight-only quantization stubs |

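
As a short illustration of the RoPE row, the sketch below rotates feature pairs by a position-dependent angle and checks the property that makes RoPE attractive: the query-key dot product depends only on the relative offset between positions. It uses plain Python lists for clarity; `apply_rope` is an illustrative name and the base of 10000 is the commonly used default, assumed here rather than taken from this model's config.

```python
import math
from typing import List


def apply_rope(x: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each consecutive feature pair by an angle proportional to position.

    The pair starting at index i uses frequency base ** (-i / d).
    """
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even feature dimension"
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        out.append(x[i] * cos_t - x[i + 1] * sin_t)
        out.append(x[i] * sin_t + x[i + 1] * cos_t)
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


q = [0.5, -1.0, 0.25, 2.0]
k = [1.5, 0.3, -0.7, 0.9]

# Same relative offset (3) at two different absolute positions
score_a = dot(apply_rope(q, 5), apply_rope(k, 2))
score_b = dot(apply_rope(q, 105), apply_rope(k, 102))
print(score_a, score_b)
```
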
## Architecture

```
INPUT: Graph Conditioning (RSVS Knowledge Graph)
  ↓
Graph Encoder (+ Dual Memory) → cross-attention keys/values
  ↓
Diffusion Transformer (SwiGLU + RoPE + Matryoshka)
  ├─ N × TransformerBlock: AdaLN + Self-Attn + Cross-Attn + SwiGLU FFN
  └─ Evoformer Feedback: Layer + Token + Decoder + Prediction recycling
  ↓
OUTPUT PIPELINE:
  ├─ Anchored Diffusion Decoder (2-3 steps, default)
  ├─ Flow Matching Decoder (2-3 steps, alternative)
  └─ Legacy DDPM/DDIM (backward compatible)
  ↓
INFERENCE CONTROLLER:
  ├─ Thinking Toggle (adaptive compute)
  ├─ MCTS Reasoning (complex queries)
  └─ Matryoshka (select submodel size)
```

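
One way to picture the Thinking Toggle in the inference controller is a function that maps query complexity to a step count. The sketch below is purely illustrative: `QueryStats`, its fields, the scoring heuristic, and the `max_steps` cap are all assumptions, and only the "2 steps for simple, 5+ for complex" targets come from the module table.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    """Hypothetical summary of a graph query; field names are illustrative."""
    n_evidence: int
    n_anomalies: int
    chain_depth: int


def choose_refinement_steps(stats: QueryStats, max_steps: int = 8) -> int:
    """Map a crude complexity check to a step count (2 if simple, 5+ if complex)."""
    is_simple = (
        stats.n_evidence <= 1
        and stats.n_anomalies == 0
        and stats.chain_depth <= 1
    )
    if is_simple:
        return 2
    # Assumed heuristic: one extra step per anomaly and per extra reasoning hop
    extra = stats.n_anomalies + max(0, stats.chain_depth - 1)
    return min(5 + extra, max_steps)


print(choose_refinement_steps(QueryStats(n_evidence=1, n_anomalies=0, chain_depth=1)))
print(choose_refinement_steps(QueryStats(n_evidence=4, n_anomalies=2, chain_depth=3)))
```
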
## Training Pipeline

1. **Phase 1**: Single-evidence simple narratives (25% budget)
2. **Phase 2**: Multi-evidence narratives (30% budget)
3. **Phase 3**: Complex reasoning + anomaly resolution (30% budget)
4. **Phase 4**: GRPO/DAPO RL fine-tuning (15% budget)

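
The budget split above can be turned into a per-phase step schedule. The following is a small sketch, assuming the budget is measured in optimizer steps; the phase keys, `allocate_steps`, and the 10,000-step total are hypothetical and chosen only for illustration.

```python
from typing import Dict

# Percentages taken from the four phases listed above
PHASE_BUDGET_PCT: Dict[str, int] = {
    "phase1_single_evidence": 25,
    "phase2_multi_evidence": 30,
    "phase3_reasoning": 30,
    "phase4_rl_finetune": 15,
}


def allocate_steps(total_steps: int, budget_pct: Dict[str, int]) -> Dict[str, int]:
    """Split total_steps by percentage; the last phase absorbs any rounding remainder."""
    assert sum(budget_pct.values()) == 100, "percentages must sum to 100"
    names = list(budget_pct)
    steps = {name: total_steps * budget_pct[name] // 100 for name in names[:-1]}
    steps[names[-1]] = total_steps - sum(steps.values())
    return steps


schedule = allocate_steps(10_000, PHASE_BUDGET_PCT)
print(schedule)
```
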
## Model Details

| Attribute | Value |
|-----------|-------|
| Parameters | ~5.5M (demo) |
| d_model | 128 |
| n_layers | 4 |
| n_heads | 4 |
| Vocab size | 2000 |
| Diffusion steps | 200 (train) / 20 (inference) |
| Anchored refinement | 2-3 steps |

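
The table lists 200 diffusion steps for training and 20 for inference. One common way to bridge the two, used by DDIM-style samplers, is to run the reverse process on an evenly strided subset of the training timesteps. Whether this model selects timesteps exactly this way is an assumption; `strided_timesteps` is a hypothetical helper showing one possible convention.

```python
from typing import List


def strided_timesteps(n_train: int = 200, n_infer: int = 20) -> List[int]:
    """Pick n_infer evenly spaced timesteps from n_train, ordered high noise to low."""
    assert n_train % n_infer == 0, "this sketch assumes an exact stride"
    stride = n_train // n_infer
    return list(range(n_train - 1, -1, -stride))


timesteps = strided_timesteps()
print(timesteps)
```
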
## Usage

```python
import torch

from diffusion_llm import AamDiffusionModel, AamDiffusionConfig

# Load config and model weights
config = AamDiffusionConfig.from_json("config.json")
model = AamDiffusionModel.load("pytorch_model.bin")

# Placeholder graph inputs. The shapes and dtypes here are assumptions for
# illustration; use the evidence IDs and confidences from your own RSVS graph.
evidence_ids = torch.tensor([[1, 2, 3]])
confidence = torch.tensor([[0.9, 0.7, 0.8]])

# Encode the graph, then generate with anchored decoding (2-3 steps)
graph_cond = model.graph_encoder(
    evidence_ids=evidence_ids,
    evidence_confidence=confidence,
)
result = model.sample(graph_cond, method="anchored", n_steps=3)

# Map the denoised embeddings back to token IDs
tokens = model.embeddings_to_tokens(result)
```