Upload README.md with huggingface_hub

README.md +121 -0

README.md ADDED

	@@ -0,0 +1,121 @@

+---
+language:
+- id
+- en
+license: mit
+library_name: pytorch
+tags:
+- diffusion
+- llm
+- aam
+- graph-conditioned
+- sentence-arrangement
+- evoformer
+- anchored-decoding
+- flow-matching
+- dual-memory
+- matryoshka
+- swiglu
+- rope
+- mcts
+- thinking-toggle
+---
+# AAM Diffusion LLM v2.0 — Upgraded from Losion
+## Overview
+AphantasicAbstractionModel (AAM) is a **specialized sentence composer** — NOT a general-purpose LLM. It takes structured graph data (evidence, anomalies, reasoning chains) as input and produces coherent, evidence-backed narrative output through iterative denoising.
+**AAM = 1 Mind + 1 Body**
+- **Mind** = RSVS Knowledge Graph (structural, relational memory)
+- **Body** = This Diffusion LLM (generates natural language FROM the graph)
+## v2.0 Upgrade from Losion
+This version incorporates 14 modules extracted from the [Losion](https://github.com/Wolfvin/Losion) architecture:
+### Tier 1 — Critical Upgrades
+| Module | Description | Impact |
+|--------|-------------|--------|
+| **Anchored Diffusion Decoder** | 2-3 refinement steps instead of 50+ denoising steps from pure noise | 10-20x speedup |
+| **Flow Matching Decoder** | Velocity-based alternative to DDPM/DDIM | Faster + better inference |
+| **Evoformer Feedback** | 4-level bidirectional feedback (layer/token/decoder/prediction) | Quality leap |
+| **Dual Memory System** | Working memory + Long-term memory for coherent generation | Persistent context |
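
The velocity-based decoding named in the table can be illustrated with a minimal Euler sampler. This is only a sketch under stated assumptions: `euler_sample`, `make_toy_velocity`, and the example vectors are hypothetical names and values, and the toy velocity field stands in for a learned network. It is not the repository's Flow Matching Decoder.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(
    x_start: Vector,
    velocity: Callable[[Vector, float], Vector],
    n_steps: int = 3,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 towards t=1 with fixed Euler steps."""
    x = list(x_start)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t stays strictly below 1, so the toy field never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Toy stand-in for a learned model: the exact velocity of a straight
    path that ends at `target`, v(x, t) = (target - x) / (1 - t)."""
    def velocity(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


anchor = [0.5, -1.0, 2.0]  # hypothetical starting (anchor) embedding
target = [1.0, 0.0, 1.5]   # hypothetical clean embedding
refined = euler_sample(anchor, make_toy_velocity(target), n_steps=3)
```

With an exact straight-line velocity, a handful of Euler steps already lands on the target, which is the intuition behind few-step sampling. A learned velocity field is only approximate, so real results depend on model quality.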
+### Tier 2 — Training & Reasoning
+| Module | Description |
+|--------|-------------|
+| **MCTS Reasoning Engine** | AlphaZero-style tree search for narrative arrangement |
+| **Thinking Toggle** | Adaptive compute: 2 steps for simple inputs, 5+ steps for complex ones |
+| **Matryoshka Elastic** | One training run → multiple deployment sizes |
+| **GRPO Training** | Group Relative Policy Optimization (no value function) |
+| **DAPO Training** | Decoupled clip + dynamic sampling + token-level loss |
+| **Curriculum Learning** | 4-phase: single-evidence → multi-evidence → reasoning → RL |
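
The "no value function" property of GRPO comes from scoring each sample against the other samples in its own group. A minimal sketch of that normalization follows. The function name, the example rewards, and the use of the population standard deviation are assumptions for illustration, not the repository's training code.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by the mean and standard deviation of its group.

    The group statistics play the role of the baseline, so no learned
    value function is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps guards against sigma == 0
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four narratives sampled from the same graph input
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
```

Samples above the group mean receive positive advantages and those below receive negative ones. A group with identical rewards yields all-zero advantages and therefore no gradient signal.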
+### Tier 3 — Architecture Improvements
+| Module | Description |
+|--------|-------------|
+| **SwiGLU FFN** | Replaces the GELU FFN with SwiGLU (as used in LLaMA and Mistral) |
+| **RoPE** | Rotary Position Encoding for length generalization |
+| **Speculative Decoder** | Graph encoder drafts, diffusion model verifies |
+| **Quantization** | BitNet 1-bit + FP8 weight-only quantization stubs |
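
As a rough illustration of the RoPE row, the sketch below rotates consecutive pairs of a single head vector by position-dependent angles. It uses plain Python lists so that it runs without PyTorch. `apply_rope` and `dot` are hypothetical helper names, and the base of 10000 is the commonly used default rather than a value taken from this model.

```python
import math
from typing import List


def apply_rope(x: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[i], x[i+1]), i even, by the angle position * base**(-i/d)."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even head dimension")
    out: List[float] = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        out.append(x[i] * cos_t - x[i + 1] * sin_t)
        out.append(x[i] * sin_t + x[i + 1] * cos_t)
    return out


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product, standing in for one attention score."""
    return sum(ai * bi for ai, bi in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]   # hypothetical query vector for one head
k = [0.5, -1.0, 2.0, 0.0]  # hypothetical key vector for one head

# Both pairs of positions are 2 apart, so the scores should match
score_near = dot(apply_rope(q, 5), apply_rope(k, 3))
score_far = dot(apply_rope(q, 105), apply_rope(k, 103))
```

Because the rotation depends only on position, the attention score between a query and a key depends on their relative offset, not on their absolute positions. This property is usually cited as the reason RoPE helps with length generalization.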
+## Architecture
+```
+INPUT: Graph Conditioning (RSVS Knowledge Graph)
+       ↓
+Graph Encoder (+ Dual Memory) → cross-attention keys/values
+       ↓
+Diffusion Transformer (SwiGLU + RoPE + Matryoshka)
+  ├─ N × TransformerBlock: AdaLN + Self-Attn + Cross-Attn + SwiGLU FFN
+  └─ Evoformer Feedback: Layer + Token + Decoder + Prediction recycling
+       ↓
+OUTPUT PIPELINE:
+  ├─ Anchored Diffusion Decoder (2-3 steps, default)
+  ├─ Flow Matching Decoder (2-3 steps, alternative)
+  └─ Legacy DDPM/DDIM (backward compatible)
+       ↓
+INFERENCE CONTROLLER:
+  ├─ Thinking Toggle (adaptive compute)
+  ├─ MCTS Reasoning (complex queries)
+  └─ Matryoshka (select submodel size)
+```
+## Training Pipeline
+1. **Phase 1**: Single-evidence simple narratives (25% budget)
+2. **Phase 2**: Multi-evidence narratives (30% budget)
+3. **Phase 3**: Complex reasoning + anomaly resolution (30% budget)
+4. **Phase 4**: GRPO/DAPO RL fine-tuning (15% budget)
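
The four budget shares above sum to 100%. A small sketch of turning them into a step schedule is given below. It assumes "budget" means a share of total training steps, which the README does not state, and the phase keys and `split_steps` are hypothetical names.

```python
from typing import Dict

# Budget shares copied from the phase list; the key names are illustrative
PHASE_BUDGET: Dict[str, float] = {
    "phase1_single_evidence": 0.25,
    "phase2_multi_evidence": 0.30,
    "phase3_reasoning": 0.30,
    "phase4_rl_finetune": 0.15,
}


def split_steps(total_steps: int, budget: Dict[str, float] = PHASE_BUDGET) -> Dict[str, int]:
    """Allocate integer step counts per phase; the last phase absorbs rounding."""
    names = list(budget)
    steps = {name: round(total_steps * budget[name]) for name in names[:-1]}
    steps[names[-1]] = total_steps - sum(steps.values())
    return steps


schedule = split_steps(10_000)
```

Letting the final phase absorb the rounding remainder keeps the schedule's total equal to the requested number of steps.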
+## Model Details
+| Attribute | Value |
+|-----------|-------|
+| Parameters | ~5.5M (demo) |
+| d_model | 128 |
+| n_layers | 4 |
+| n_heads | 4 |
+| Vocab size | 2000 |
+| Diffusion steps | 200 (train) / 20 (inference) |
+| Anchored refinement | 2-3 steps |
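
The table values can be sanity-checked with a small container like the one below. `DemoConfig` is a hypothetical stand-in written for this example and is not the repository's `AamDiffusionConfig`. Field names other than `d_model`, `n_layers`, and `n_heads` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoConfig:
    """Hypothetical mirror of the Model Details table."""
    d_model: int = 128
    n_layers: int = 4
    n_heads: int = 4
    vocab_size: int = 2000
    train_steps: int = 200       # diffusion steps during training
    inference_steps: int = 20    # diffusion steps at inference
    anchored_steps: int = 3      # upper end of the 2-3 step anchored refinement

    @property
    def head_dim(self) -> int:
        """Per-head width; d_model must split evenly across the heads."""
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads


cfg = DemoConfig()
```

With the demo values each attention head is 32 dimensions wide, which is even and therefore compatible with pairwise rotary position encoding.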
+## Usage
+```python
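+import torch
+
+from diffusion_llm import AamDiffusionModel, AamDiffusionConfig
+
+# Load the config and the trained weights
+config = AamDiffusionConfig.from_json("config.json")
+model = AamDiffusionModel.load("pytorch_model.bin")
+
+# Placeholder inputs: the shapes and dtypes below are assumptions, not a documented API
+evidence_ids = torch.tensor([[1, 2, 3]])        # (batch, n_evidence) evidence IDs
+confidence = torch.tensor([[0.9, 0.7, 0.5]])    # one confidence score per evidence item
+
+# Encode the graph, then generate with anchored decoding (2-3 steps)
+graph_cond = model.graph_encoder(
+    evidence_ids=evidence_ids,
+    evidence_confidence=confidence,
+)
+result = model.sample(graph_cond, method="anchored", n_steps=3)
+tokens = model.embeddings_to_tokens(result)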
+```