Upload README.md with huggingface_hub
Browse files
README.md
ADDED
|
@@ -0,0 +1,52 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
language:
|
| 3 |
+
- en
|
| 4 |
+
license: openrail
|
| 5 |
+
library_name: diffusers
|
| 6 |
+
tags:
|
| 7 |
+
- diffusion-llm
|
| 8 |
+
- parallel-generation
|
| 9 |
+
- custom-transformer
|
| 10 |
+
- cropmark
|
| 11 |
+
datasets:
|
| 12 |
+
- OpenAssistant/oasst1
|
| 13 |
+
metrics:
|
| 14 |
+
- cosine_similarity
|
| 15 |
+
---
|
| 16 |
+
|
| 17 |
+
# 🪐 DiffReaper-5L
|
| 18 |
+
|
| 19 |
+
DiffReaper-5L is a **larger** version of DiffReaper-5, with **2048-dim embeddings** and a **24-layer Transformer**. This model is under **active autonomous training** on an H100.
|
| 20 |
+
|
| 21 |
+
## 🔬 Model Details
|
| 22 |
+
|
| 23 |
+
- **Architecture:** 24-layer Custom Transformer with Time Embedding.
|
| 24 |
+
- **Task:** Conditioned Text Diffusion (Prompt-Response).
|
| 25 |
+
- **Training Objective:** Cosine Similarity Regression.
|
| 26 |
+
- **Sampling:** 10-step iterative parallel denoising.
|
| 27 |
+
|
| 28 |
+
## 🚀 Autonomous Training State
|
| 29 |
+
|
| 30 |
+
The model is training autonomously on an H100 with the following configuration:
|
| 31 |
+
- **Batch Size:** 16.
|
| 32 |
+
- **Learning Rate:** 1e-4.
|
| 33 |
+
- **Checkpointing:** Saves `diffreaper5l_{step}.pt` every 2,500 steps to [darwinkernelpanic/DiffReaper-5L](https://huggingface.co/darwinkernelpanic/DiffReaper-5L).
|
| 34 |
+
|
| 35 |
+
## 🛠️ Usage (Inference)
|
| 36 |
+
|
| 37 |
+
To run inference:
|
| 38 |
+
|
| 39 |
+
```python
|
| 40 |
+
import torch
|
| 41 |
+
# Assuming DiffReaperModel is defined as in train_diffreaper_5l.py
|
| 42 |
+
|
| 43 |
+
model = DiffReaperModel(vocab_size=50257, n_embd=2048, n_head=32, n_layer=24).to("cuda")
|
| 44 |
+
model.load_state_dict(torch.load("diffreaper5l_latest.pt"))
|
| 45 |
+
model.eval()
|
| 46 |
+
```
|
| 47 |
+
|
| 48 |
+
## 🎯 Fine-tuning
|
| 49 |
+
|
| 50 |
+
To fine-tune on a custom dataset, ensure your data loader provides **Prompt** + **Response** pairs. Use the same Cosine Similarity loss.
|
| 51 |
+
|
| 52 |
+
*Created by Darwin & Clawd.*
|