---
license: apache-2.0
language:
- en
base_model:
- darwinkernelpanic/DiffReaper-3
---
|
|
# DiffReaper-Talk

A 1.5B-parameter Discrete Diffusion Language Model (dLLM) optimized for parallel token prediction, trained on general text corpora during its foundational pre-training phase.
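Discrete diffusion pre-training corrupts text with a Markovian noising process and trains the network to reverse it. Below is a minimal sketch of an absorbing-state (masking) forward process; the `MASK` id and the linear mask schedule are illustrative assumptions, not details taken from this card.

```python
import random

MASK = -1  # hypothetical mask-token id (assumption, not from the card)

def forward_mask(tokens, t, T, rng):
    """One step of a Markovian absorbing-state forward process: each
    still-visible token is independently replaced by MASK with a per-step
    probability chosen so the marginal mask rate at step t is t/T."""
    beta_t = 1.0 / (T - t + 1)
    return [MASK if tok != MASK and rng.random() < beta_t else tok
            for tok in tokens]

rng = random.Random(0)
x = list(range(10))          # a toy "sentence" of token ids
trajectory = [x]
T = 5
for t in range(1, T + 1):
    trajectory.append(forward_mask(trajectory[-1], t, T, rng))

# under this schedule, beta_T = 1, so every token is masked by step T
print(trajectory[-1])
```

The denoiser would then be trained to recover the original tokens at a randomly sampled timestep; because masking is independent per position, all masked positions can be predicted in a single parallel pass.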
|
|
|
|
|
## Summary

DiffReaper-Talk uses a Transformer-based discrete diffusion architecture to predict multiple tokens in parallel. This avoids the sequential bottleneck of standard autoregressive generation.
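At generation time, parallel prediction is typically realized as iterative demasking: start from an all-mask sequence, predict every position at once, and commit the most confident guesses each step. The toy sketch below uses a stand-in scorer; the real model's confidence rule and unmasking schedule are not specified in this card, so `toy_denoiser` and the per-step fraction are hypothetical.

```python
import math
import random

MASK = -1  # hypothetical mask-token id (assumption, not from the card)

def toy_denoiser(tokens, vocab_size=16, seed=0):
    """Stand-in for the real Transformer encoder: emits a (token, confidence)
    guess for every position in one parallel pass. Purely illustrative."""
    rng = random.Random(seed * 1000 + sum(t for t in tokens if t != MASK))
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]

def diffusion_decode(length=8, steps=4, vocab_size=16):
    """Iterative parallel decoding: begin fully masked, then at each step
    commit the highest-confidence predictions among the remaining masks."""
    tokens = [MASK] * length
    for step in range(steps):
        preds = toy_denoiser(tokens, vocab_size, seed=step)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # unmask roughly 1/(steps - step) of the remaining positions
        k = max(1, math.ceil(len(masked) / (steps - step)))
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens

print(diffusion_decode())
```

Each loop iteration is one full forward pass over the whole sequence, so the number of model calls is the number of diffusion steps rather than the number of tokens.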
|
|
|
|
|
## Technical Details

- **Architecture:** 24-Layer Transformer Encoder
- **Embedding Dim:** 2048
- **Heads:** 16
- **Parameters:** ~1.5 Billion
- **Hardware:** 1x NVIDIA A100 (80GB VRAM)
- **Objective:** Markovian Discrete Denoising (Continuous Embedding Space)
- **Precision:** Mixed BF16
- **Context Window:** 1024 Tokens
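A rough sanity check of the listed hyperparameters against the ~1.5B figure; the vocabulary size and the 4x FFN expansion are assumptions (the card does not state them), and biases and layer norms are ignored.

```python
def transformer_param_estimate(layers=24, d_model=2048, vocab=50_000,
                               ffn_mult=4, context=1024):
    """Back-of-the-envelope parameter count from the card's hyperparameters.
    vocab and ffn_mult are assumed values, not taken from the card."""
    attn = 4 * d_model * d_model              # Q, K, V, output projections
    ffn = 2 * d_model * (ffn_mult * d_model)  # up- and down-projections
    per_layer = attn + ffn                    # ignoring biases / layer norms
    embed = vocab * d_model                   # token embeddings (tied output)
    pos = context * d_model                   # learned positional embeddings
    return layers * per_layer + embed + pos

print(f"{transformer_param_estimate() / 1e9:.2f}B")  # → 1.31B with these assumptions
```

With a ~50k vocabulary this lands near 1.31B; the gap to the stated ~1.5B would be closed by a larger vocabulary, a wider FFN, or the terms ignored above.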
|
|
|
|
|
## Current Status

Phase 2 (Logic) is complete. Domain-specific training (Code) will be applied post-convergence.