---
language:
  - en
license: openrail
library_name: diffusers
tags:
  - diffusion-llm
  - parallel-generation
  - custom-transformer
  - cropmark
datasets:
  - OpenAssistant/oasst1
metrics:
  - cosine_similarity
---

πŸͺ DiffReaper-5 (Cropmark v2)

DiffReaper-5 is a Conditioned Diffusion Large Language Model (DLLM) designed for high-throughput, parallel conversational text generation. Unlike standard autoregressive (GPT-style) models, which emit tokens one at a time, DiffReaper-5 operates in a continuous latent embedding space, denoising an entire response sequence in parallel.

## πŸ”¬ Model Details

- **Architecture:** Custom 12-layer Mercury-inspired Transformer.
- **Task:** Conditioned Text Diffusion (Prompt-Response).
- **Latent Space:** 1024-dimensional continuous embeddings.
- **Training Objective:** Cosine Similarity Regression (Directional Loss).
- **Sampling:** 10-step iterative parallel denoising.
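The sampling procedure above can be sketched as a short loop. This is a minimal illustration only: the `model` interface, the linear noise schedule, and the blending rule are assumptions for the sketch, not DiffReaper-5's actual API.

```python
import torch

def parallel_denoise(model, prompt_emb, seq_len=32, dim=1024, steps=10):
    """Iteratively denoise a full response sequence in parallel.

    `model` is assumed to map (noisy_latents, prompt_emb, step) to
    predicted clean latents; the linear schedule below is illustrative.
    """
    x = torch.randn(seq_len, dim)  # start from pure Gaussian noise
    for t in reversed(range(steps)):
        pred = model(x, prompt_emb, t)      # predict clean latents for all positions at once
        alpha = t / steps                   # assumed linear noise schedule
        x = alpha * x + (1 - alpha) * pred  # blend current latents toward the prediction
    return x  # final latents; token decoding happens elsewhere

# Toy stand-in denoiser: always predicts all-zero latents.
toy = lambda x, cond, t: torch.zeros_like(x)
out = parallel_denoise(toy, torch.zeros(32, 1024))
```

Note that every position in the 32-token window is updated at each of the 10 steps, which is what makes the generation parallel rather than autoregressive.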

## πŸš€ Autonomous Training State

This model is currently in Autonomous Growth Mode. It is training on an RTX 3090 cluster with the following parameters:

- **Conditioning:** Hard-prompt conditioning (32 tokens).
- **Generation Window:** 32 tokens (parallel).
- **Optimizer:** AdamW with a learning rate of 1e-4.
- **Sync:** Auto-checkpointing every 2,500 steps to this repository.
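A minimal sketch of the training setup described above, combining the AdamW configuration with the cosine-similarity ("directional") objective from the Model Details. The stand-in `Linear` network and the random latents are placeholders, not the real 12-layer transformer or data pipeline.

```python
import torch
import torch.nn.functional as F

# Stand-in network; the real model is a custom 12-layer transformer.
model = torch.nn.Linear(1024, 1024)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

CHECKPOINT_EVERY = 2500  # auto-checkpoint cadence from this README

def directional_loss(pred, target):
    """1 - mean cosine similarity: penalizes directional mismatch
    between predicted and target latent vectors."""
    return 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()

# One illustrative optimization step on random latents.
latents = torch.randn(4, 1024)
target = torch.randn(4, 1024)
loss = directional_loss(model(latents), target)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Because the loss depends only on the angle between predicted and target embeddings, it regresses the direction of each latent rather than its magnitude.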

πŸ› οΈ Intended Use

DiffReaper-5 is intended for research into Non-Autoregressive Generation. Its primary strengths are:

1. **Speed:** Parallel token generation eliminates the KV-cache bottleneck.
2. **Coherence:** Focuses on global sequence structure rather than next-token probability.

## πŸ“ˆ Diagnostic: Cropmark

The model's progress is monitored via the Cropmark Diagnostic.

- Cropmark tests the model's ability to manifest a response (e.g., "I am good, how are you?") from pure Gaussian noise given a fixed prompt.
- Results are logged in `checkpoint_log.txt` and uploaded periodically.
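One plausible way to score such a diagnostic is mean cosine similarity between the generated latents and a reference encoding of the expected response. This is a hypothetical scoring function for illustration; the actual Cropmark metric and the format of `checkpoint_log.txt` are not specified here.

```python
import torch
import torch.nn.functional as F

def cropmark_score(generated, reference):
    """Mean per-position cosine similarity between generated latents
    and the reference response latents (hypothetical scoring rule)."""
    return F.cosine_similarity(generated, reference, dim=-1).mean().item()

# Sanity check: identical latents should score a perfect 1.0.
ref = torch.randn(32, 1024)
score = cropmark_score(ref, ref)
```

A rising score across checkpoints would indicate the model is learning to recover the target response from noise.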

Created by Darwin (Oscar) & Clawd.