---
license: apache-2.0
pipeline_tag: text-generation
datasets:
- codeparrot/github-code-clean
- bigcode/starcoderdata
- bigcode/the-stack-smol
tags:
- diffusion
- llm
- diffreaper
- dllm
- mercury
language:
- en
---

# DiffReaper 3

DiffReaper 3 is the third revision of DiffReaper, an experimental 1.5B-parameter Discrete Diffusion Language Model (dLLM) designed for high-throughput parallel token prediction.

Unlike traditional autoregressive models, which generate one token at a time from left to right, DiffReaper is optimized for non-linear sequence refinement over a mix of Python code and natural-language text.
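
To make "parallel token prediction" concrete, here is a toy sketch of how masked-diffusion decoding is commonly organized: start from a fully masked sequence, score every position at once, and commit the most confident predictions at each step. This is not DiffReaper's actual sampler. `toy_denoiser`, `parallel_unmask`, the `<mask>` string, and the confidence-based commit schedule are all assumptions made for illustration, and the "model" here simply reads the answer from a known target.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, not the model's real token


def toy_denoiser(tokens, target, rng):
    """Stand-in for the network: one (token, confidence) proposal per position.

    A real model would predict tokens from context; this toy version copies
    the target and attaches a random confidence to each masked position.
    """
    proposals = []
    for i, tok in enumerate(tokens):
        if tok == MASK:
            proposals.append((target[i], rng.random()))
        else:
            proposals.append((tok, 1.0))
    return proposals


def parallel_unmask(target, steps=4, seed=0):
    """Fill a fully masked sequence in `steps` parallel refinement passes."""
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    history = []
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        proposals = toy_denoiser(tokens, target, rng)
        # Commit an even share of the remaining masked positions (ceil division),
        # choosing the positions with the highest confidence, in any order.
        k = -(-len(masked) // (steps - step))
        best = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = proposals[i][0]
        history.append(list(tokens))
    return tokens, history


if __name__ == "__main__":
    target = "def add ( a , b ) : return a + b".split()
    _, history = parallel_unmask(target, steps=4)
    for state in history:
        print(" ".join(state))
```

Each pass fills several positions anywhere in the sequence, which is the contrast with left-to-right autoregressive decoding: the number of passes is fixed by `steps` rather than by the sequence length.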

## Model Details

- **Architecture:** 24-Layer Transformer Encoder
- **Hidden Dimension:** 2048
- **Attention Heads:** 16
- **Objective:** Discrete Masked Diffusion (Mercury-style)
- **Training Precision:** BF16
- **Context Window:** 1024 tokens
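
As a rough sanity check on the figures above, the sketch below estimates the per-head dimension and the parameter count implied by the listed layer count and hidden size. The 4x feed-forward expansion, the omission of biases, layer norms, and position embeddings, and the vocabulary size passed in the example are assumptions; none of them is stated in this card, so treat the totals as ballpark values only.

```python
def estimate_params(vocab_size, layers=24, hidden=2048, heads=16,
                    ffn_mult=4, tied_embeddings=True):
    """Ballpark parameter count for a standard Transformer encoder stack.

    Assumes (not confirmed by the model card): a feed-forward width of
    ffn_mult * hidden, no bias terms, and no layer-norm or positional
    parameters in the count.
    """
    head_dim = hidden // heads
    attention = 4 * hidden * hidden               # Q, K, V and output projections
    feed_forward = 2 * ffn_mult * hidden * hidden  # up and down projections
    blocks = layers * (attention + feed_forward)
    # One embedding matrix if input and output embeddings are tied, else two.
    embeddings = vocab_size * hidden * (1 if tied_embeddings else 2)
    return {
        "head_dim": head_dim,
        "blocks": blocks,
        "embeddings": embeddings,
        "total": blocks + embeddings,
    }


if __name__ == "__main__":
    # vocab_size=50_000 is a hypothetical placeholder, not the real tokenizer size.
    est = estimate_params(vocab_size=50_000)
    print(est["head_dim"])  # 128
    print(est["blocks"])    # 1207959552
    print(est["total"])
```

Under these assumptions the Transformer blocks alone account for about 1.21B parameters; the remainder of the stated 1.5B would depend on the vocabulary size, embedding tying, and feed-forward width actually used.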