---
license: apache-2.0
language:
- en
base_model:
- darwinkernelpanic/DiffReaper-3
---

# DiffReaper-Talk

A 1.5B-parameter Discrete Diffusion Language Model (dLLM) optimized for parallel token prediction. Trained during the foundational pre-training phase on general text corpora.

## Summary

DiffReaper-Talk uses a Transformer-based discrete diffusion architecture to predict multiple tokens in parallel, avoiding the sequential bottleneck of standard autoregressive generation.

## Technical Details

- **Architecture:** 24-layer Transformer encoder
- **Embedding Dim:** 2048
- **Heads:** 16
- **Parameters:** ~1.5 billion
- **Hardware:** 1x NVIDIA A100 (80 GB VRAM)
- **Objective:** Markovian discrete denoising (continuous embedding space)
- **Precision:** Mixed BF16
- **Context Window:** 1024 tokens

## Current Status

Phase 2 (Logic) is complete. Domain-specific training (Code) will be applied post-convergence.
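
## Decoding Sketch

To make the parallel-prediction idea above concrete, here is a minimal toy sketch of iterative parallel denoising, the decoding style commonly used by discrete diffusion LMs: start from a fully masked sequence and, at each step, commit the model's highest-confidence predictions for several positions at once. All names (`toy_model`, `denoise`, the vocabulary) are illustrative stand-ins, not DiffReaper-Talk's actual inference API.

```python
# Toy sketch of parallel iterative denoising for a discrete diffusion LM.
# The real model is a 24-layer Transformer encoder; `toy_model` below is
# a random stand-in used only to show the decoding loop structure.
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "mat"]

def toy_model(tokens):
    """Stand-in for the denoiser: returns a (prediction, confidence)
    pair for every position in the sequence."""
    rng = random.Random(0)
    return [(rng.choice(VOCAB), rng.random()) for _ in tokens]

def denoise(seq_len=8, steps=4):
    """Start fully masked; each step commits the highest-confidence
    predictions in parallel rather than decoding one token at a time."""
    tokens = [MASK] * seq_len
    per_step = seq_len // steps  # positions to unmask per iteration
    for _ in range(steps):
        preds = toy_model(tokens)
        # Rank the still-masked positions by model confidence.
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:per_step]:  # commit several tokens at once
            tokens[i] = preds[i][0]
    return tokens

print(denoise())
```

Because several positions are filled per iteration, a length-`n` sequence is produced in far fewer than `n` forward passes, which is the advantage over autoregressive decoding noted in the Summary.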