---
license: apache-2.0
language:
- en
base_model:
- darwinkernelpanic/DiffReaper-3
---

# DiffReaper-Talk

A 1.5B-parameter Discrete Diffusion Language Model (dLLM) optimized for parallel token prediction. Trained during the foundational pre-training phase on general text corpora.

## Summary

DiffReaper-Talk uses a Transformer-based discrete diffusion architecture to predict multiple tokens in parallel, avoiding the sequential bottleneck of standard autoregressive generation.

## Technical Details

- **Architecture:** 24-layer Transformer encoder
- **Embedding Dim:** 2048
- **Heads:** 16
- **Parameters:** ~1.5 billion
- **Hardware:** 1x NVIDIA A100 (80 GB VRAM)
- **Objective:** Markovian discrete denoising (continuous embedding space)
- **Precision:** Mixed BF16
- **Context Window:** 1024 tokens

## Current Status

Phase 2 (Logic) is complete. Domain-specific training (Code) will be applied post-convergence.
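
## Decoding Sketch

To make the parallel-prediction idea above concrete, here is a minimal toy sketch of iterative parallel denoising, the decoding style commonly used by discrete diffusion LMs: start from a fully masked sequence and, at each step, commit the model's highest-confidence predictions for several positions at once. All names (`toy_model`, `denoise`, the vocabulary) are illustrative stand-ins, not DiffReaper-Talk's actual inference API.

```python
# Toy sketch of parallel iterative denoising for a discrete diffusion LM.
# The real model is a 24-layer Transformer encoder; `toy_model` below is
# a random stand-in used only to show the decoding loop structure.
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "mat"]

def toy_model(tokens):
    """Stand-in for the denoiser: returns a (prediction, confidence)
    pair for every position in the sequence."""
    rng = random.Random(0)
    return [(rng.choice(VOCAB), rng.random()) for _ in tokens]

def denoise(seq_len=8, steps=4):
    """Start fully masked; each step commits the highest-confidence
    predictions in parallel rather than decoding one token at a time."""
    tokens = [MASK] * seq_len
    per_step = seq_len // steps  # positions to unmask per iteration
    for _ in range(steps):
        preds = toy_model(tokens)
        # Rank the still-masked positions by model confidence.
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:per_step]:  # commit several tokens at once
            tokens[i] = preds[i][0]
    return tokens

print(denoise())
```

Because several positions are filled per iteration, a length-`n` sequence is produced in far fewer than `n` forward passes, which is the advantage over autoregressive decoding noted in the Summary.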