---
license: apache-2.0
language:
- en
base_model:
- darwinkernelpanic/DiffReaper-3
---

# DiffReaper-Talk
A 1.5B-parameter Discrete Diffusion Language Model (dLLM) optimized for parallel token prediction, trained on general text corpora during its foundational pre-training phase.
## Summary
DiffReaper-Talk uses a Transformer-based discrete diffusion architecture to predict multiple tokens in parallel. This approach avoids the sequential bottleneck of standard autoregressive generation.
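To make the contrast with autoregressive decoding concrete, here is a minimal, self-contained sketch of iterative parallel unmasking, the decoding style used by discrete diffusion LMs. It is illustrative only: `dummy_denoiser`, the mask id, and the confidence-based commit schedule are assumptions for the sketch, not DiffReaper-Talk's actual implementation.

```python
import numpy as np

MASK = -1  # hypothetical mask-token id for this sketch

def dummy_denoiser(tokens, vocab_size=32, rng=None):
    # Stand-in for the trained Transformer encoder: one forward pass
    # returns logits for EVERY position in parallel.
    rng = rng or np.random.default_rng(0)
    return rng.standard_normal((len(tokens), vocab_size))

def parallel_decode(seq_len=16, steps=4, vocab_size=32, seed=0):
    """Iteratively unmask a fully masked sequence.

    Each step predicts all masked positions at once, then commits the
    most confident fraction -- unlike autoregressive decoding, which
    commits exactly one token per forward pass.
    """
    rng = np.random.default_rng(seed)
    tokens = np.full(seq_len, MASK)
    for step in range(steps):
        logits = dummy_denoiser(tokens, vocab_size, rng)
        conf = logits.max(axis=1)       # per-position confidence
        preds = logits.argmax(axis=1)   # per-position best token
        masked = np.flatnonzero(tokens == MASK)
        # Commit the highest-confidence fraction of the masked slots,
        # spreading the remaining work over the remaining steps.
        k = max(1, int(np.ceil(len(masked) / (steps - step))))
        commit = masked[np.argsort(-conf[masked])[:k]]
        tokens[commit] = preds[commit]
    return tokens
```

With `seq_len=16` and `steps=4`, the whole sequence is produced in 4 forward passes instead of 16, which is the source of the speedup the parallel-prediction objective targets.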
## Technical Details
- Architecture: 24-Layer Transformer Encoder
- Embedding Dim: 2048
- Heads: 16
- Parameters: ~1.5 Billion
- Hardware: 1x NVIDIA A100 (80GB VRAM)
- Objective: Markovian Discrete Denoising (Continuous Embedding Space)
- Precision: Mixed BF16
- Context Window: 1024 Tokens
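As a sanity check on the listed hyperparameters, a back-of-envelope parameter count can be derived from the layer count and embedding dimension. The vocabulary size and the 4x FFN expansion are assumptions (the card does not state them), so this is a rough estimate rather than the exact figure.

```python
# Back-of-envelope parameter count from the listed hyperparameters.
layers, d = 24, 2048
vocab = 50_000              # hypothetical vocab size (not stated in the card)

attn = 4 * d * d            # Q, K, V, and output projections
ffn = 2 * d * (4 * d)       # two linear layers, assuming a 4x hidden dim
per_layer = attn + ffn
embed = vocab * d           # token embedding table

total = layers * per_layer + embed
print(f"~{total / 1e9:.2f}B parameters")
```

The estimate lands in the low-1-billions; biases, layer norms, positional embeddings, and the true vocabulary size account for the gap to the stated ~1.5B.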
## Current Status
Phase 2 (Logic) is complete. Domain-specific training (Code) will be applied post-convergence.