darwinkernelpanic committed ad73c22 (verified; parent 351338d): Step 5000: Initial Foundational Pre-training Weight Drop

# DiffReaper-Talk

A 1.5B-parameter Discrete Diffusion Language Model (dLLM) optimized for parallel token prediction, currently in its foundational pre-training phase on general text corpora.

## Summary

DiffReaper-Talk uses a Transformer-based discrete diffusion architecture to predict multiple tokens in parallel. This approach avoids the sequential bottleneck of standard autoregressive generation.
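As a rough illustration of this style of parallel decoding, the toy sketch below iteratively unmasks a sequence: every masked position is predicted in a single pass per step, and only the most confident predictions are committed. The `toy_model`, its confidence scores, and the commit schedule are invented stand-ins for illustration, not DiffReaper-Talk internals.

```python
import random

MASK = -1
TARGET = [7, 3, 9, 1, 4, 8, 2, 6]  # stand-in for the model's preferred tokens


def toy_model(seq):
    """Hypothetical denoiser: returns (token, confidence) for each masked slot."""
    random.seed(sum(1 for t in seq if t == MASK))  # deterministic demo only
    return {i: (TARGET[i], random.random())
            for i, t in enumerate(seq) if t == MASK}


def diffusion_decode(length, steps=4):
    seq = [MASK] * length
    for _ in range(steps):
        preds = toy_model(seq)              # all masked positions, in parallel
        if not preds:
            break
        k = max(1, len(preds) // 2)         # commit the most confident half
        committed = sorted(preds.items(), key=lambda kv: -kv[1][1])[:k]
        for i, (tok, _conf) in committed:
            seq[i] = tok
    for i, (tok, _conf) in toy_model(seq).items():
        seq[i] = tok                        # fill any remaining masks
    return seq


print(diffusion_decode(8))  # → [7, 3, 9, 1, 4, 8, 2, 6]
```

Unlike autoregressive decoding, each step here touches every remaining masked position at once, so the number of model calls depends on the step schedule rather than the sequence length.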

## Technical Details

- **Architecture:** 24-layer Transformer encoder
- **Embedding dim:** 2048
- **Attention heads:** 16
- **Parameters:** ~1.5 billion
- **Hardware:** 1x NVIDIA A100 (80 GB VRAM)
- **Objective:** Markovian discrete denoising (continuous embedding space)
- **Precision:** mixed BF16
- **Context window:** 1024 tokens
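As a sanity check, the hyperparameters above land near the stated ~1.5B parameter count. The vocabulary size, FFN expansion factor, and untied embeddings in this sketch are assumptions not given in the card.

```python
# Back-of-the-envelope parameter count from the hyperparameters above.
# Assumed (not stated in the card): vocab=50k, 4x FFN, untied embeddings.
def transformer_params(layers=24, d_model=2048, vocab=50_000,
                       ffn_mult=4, tied_embeddings=False):
    attn = 4 * d_model * d_model            # Q, K, V, and output projections
    ffn = 2 * ffn_mult * d_model * d_model  # up- and down-projection
    emb = vocab * d_model * (1 if tied_embeddings else 2)
    return layers * (attn + ffn) + emb


total = transformer_params()
print(f"~{total / 1e9:.2f}B parameters")  # → ~1.41B parameters
```

Biases, layer norms, and positional parameters add the remaining few tens of millions, which is consistent with the quoted ~1.5B figure.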

## Current Status

Foundational pre-training is in progress. Domain-specific training for logic and code will be applied after convergence.