---
license: apache-2.0
language:
- en
base_model:
- darwinkernelpanic/DiffReaper-3
---
# DiffReaper-Talk
A 1.5B-parameter Discrete Diffusion Language Model (dLLM) optimized for parallel token prediction, trained on general text corpora during the foundational pre-training phase.
## Summary
DiffReaper-Talk uses a Transformer-based discrete diffusion architecture to predict multiple tokens in parallel. This approach avoids the sequential bottleneck of standard autoregressive generation.
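Parallel prediction in a discrete diffusion model typically works by starting from a fully masked sequence and unmasking several positions per denoising step. A minimal sketch of that loop is below; the `toy_denoiser`, the mask id, and the confidence-based unmasking schedule are illustrative assumptions, not DiffReaper-Talk's actual implementation.

```python
import numpy as np

MASK = -1        # placeholder id for a still-masked position (assumed)
VOCAB = 8        # toy vocabulary size
SEQ_LEN = 8
STEPS = 4        # denoising steps; SEQ_LEN // STEPS positions fill in per step

rng = np.random.default_rng(0)

def toy_denoiser(tokens):
    # Stand-in for the trained Transformer: per-position logits over the vocab.
    return rng.normal(size=(len(tokens), VOCAB))

def diffusion_sample(seq_len=SEQ_LEN, steps=STEPS):
    tokens = np.full(seq_len, MASK)
    for _ in range(steps):
        logits = toy_denoiser(tokens)
        probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
        preds = probs.argmax(axis=-1)   # all positions predicted in parallel
        conf = probs.max(axis=-1)
        conf[tokens != MASK] = -np.inf  # only still-masked positions compete
        # Commit the most confident predictions; re-predict the rest next step.
        for i in np.argsort(conf)[-(seq_len // steps):]:
            tokens[i] = preds[i]
    return tokens
```

Each step predicts every masked position at once, which is what removes the one-token-at-a-time bottleneck of autoregressive decoding; the number of denoising steps, not the sequence length, bounds the number of model calls.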
## Technical Details
- **Architecture:** 24-Layer Transformer Encoder
- **Embedding Dim:** 2048
- **Heads:** 16
- **Parameters:** ~1.5 Billion
- **Hardware:** 1x NVIDIA A100 (80GB VRAM)
- **Objective:** Markovian Discrete Denoising (Continuous Embedding Space)
- **Precision:** Mixed BF16
- **Context Window:** 1024 Tokens
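The stated configuration is roughly consistent with the ~1.5B parameter count, as a back-of-envelope check shows. The vocabulary size is not given in this card, so 50,000 below is an assumption, and the 12·d² per-block estimate assumes a standard 4x feed-forward expansion.

```python
# Back-of-envelope parameter count for the listed configuration.
n_layers, d_model = 24, 2048
vocab = 50_000  # NOT stated in the card; assumed for illustration

# Per Transformer block: ~4*d^2 for attention (Q, K, V, output projections)
# plus ~8*d^2 for a 4x-expanded feed-forward network.
block = 12 * d_model ** 2
embeddings = vocab * d_model  # doubled below, assuming an untied output head

total = n_layers * block + 2 * embeddings
print(f"~{total / 1e9:.2f}B parameters")  # lands near the stated ~1.5B
```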
## Current Status
Phase 2 (Logic) complete. Domain-specific training (Code) to be applied post-convergence.