---
metrics:
- cosine_similarity
---

# DiffReaper-5

DiffReaper-5 is a **Conditioned Diffusion Large Language Model (DLLM)** designed for high-throughput, parallel conversational text generation. Unlike standard autoregressive models (GPT-style), DiffReaper-5 operates in the continuous latent embedding space, denoising an entire response sequence in parallel.

## Model Details

- **Architecture:** Custom 12-layer Mercury-inspired Transformer.
- **Task:** Conditioned Text Diffusion (Prompt-Response).
- **Training Objective:** Cosine Similarity Regression (Directional Loss).
- **Sampling:** 10-step iterative parallel denoising.

## Training Status

This model is currently in **Autonomous Growth Mode**. It is training on an RTX 3090 cluster with the following parameters:
- **Conditioning:** Hard-prompt conditioning (32 tokens).
- **Generation Window:** 32 tokens (parallel).
- **Optimizer:** AdamW with a learning rate of 1e-4.
- **Sync:** Auto-checkpointing every 2,500 steps to this repository.
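
As a rough illustration of this schedule, the checkpoint cadence can be sketched as follows (the tiny model, data, and loss below are stand-ins for illustration only, not the actual training script):

```python
import torch
import torch.nn.functional as F

# Stand-in model and data; the real run trains the DiffReaper-5 transformer.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # lr from this README

CHECKPOINT_EVERY = 2_500
checkpoint_steps = []

for step in range(1, 5_001):
    batch = torch.randn(4, 8)
    # Directional (cosine) objective, matching the stated training loss.
    loss = (1.0 - F.cosine_similarity(model(batch), batch, dim=-1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % CHECKPOINT_EVERY == 0:
        # In the real setup, the weights are pushed to this repository here.
        checkpoint_steps.append(step)
```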
## 🛠️ Usage (Inference)
Unlike autoregressive models, DiffReaper-5 generates the entire response in parallel through iterative denoising. Use the following logic to run inference:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, tokenizer, prompt, steps=10):
    # The original script body is elided in this diff; the loop below is a
    # sketch of the described procedure. Attribute names such as `embed_dim`
    # and `token_embeddings` are assumptions, not the project's actual code.
    cond_ids = tokenizer(prompt, return_tensors="pt").input_ids  # 32-token prompt conditioning
    x = torch.randn(1, 32, model.embed_dim)  # start from pure Gaussian noise
    for t in reversed(range(steps)):
        # One denoising step over the whole 32-token window, in parallel.
        x = model(x, cond_ids, torch.tensor([t]))
    # Decode each position to its nearest vocabulary embedding (cosine similarity).
    vocab = model.token_embeddings.weight                      # (V, embed_dim)
    sims = F.cosine_similarity(x.unsqueeze(2), vocab, dim=-1)  # (1, 32, V)
    return tokenizer.decode(sims.argmax(dim=-1).squeeze(0))

# model.load_state_dict(torch.load("cropmark_latest.pt"))
```

## Fine-tuning

To fine-tune DiffReaper-5 on a custom dataset:
1. **Objective:** Use `1 - F.cosine_similarity` between predicted and target embeddings.
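
Step 1 can be written as a loss function. This is a minimal sketch, and `cosine_loss` is an illustrative name rather than the project's actual code:

```python
import torch
import torch.nn.functional as F

def cosine_loss(pred, target):
    # 1 - cosine similarity per token position, averaged over the batch:
    # zero when predicted and target embeddings point in the same direction.
    return (1.0 - F.cosine_similarity(pred, target, dim=-1)).mean()
```

Here `pred` and `target` are `(batch, seq, dim)` embedding tensors.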
The model's progress is monitored via the **Cropmark Diagnostic**.
- **Cropmark** tests the model's ability to manifest a response (e.g., "I am good, how are you?") from pure Gaussian noise given a fixed prompt.
- Results are logged in `checkpoint_log.txt` and uploaded periodically.
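
A minimal sketch of how such a diagnostic line could be appended to the log (the helper name and line format are assumptions; only the filename comes from the text above):

```python
def log_cropmark(step, sample, path="checkpoint_log.txt"):
    # Append one line per checkpoint; the file is uploaded with the weights.
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"step {step}: {sample}\n")
```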

---

*Created by Darwin & Clawd.*