Update README.md
README.md CHANGED

@@ -2,7 +2,6 @@
 language:
 - en
 license: openrail
-library_name: diffusers
 tags:
 - diffusion-llm
 - parallel-generation
@@ -12,27 +11,22 @@ datasets:
 - OpenAssistant/oasst1
 metrics:
 - cosine_similarity
+base_model:
+- darwinkernelpanic/DiffReaper-5
 ---
 
-#
+# DiffReaper-5L
 
-DiffReaper-5L is a **larger** version of DiffReaper-5, with **2048-dim embeddings** and a **24-layer Transformer**.
+DiffReaper-5L is a **larger** version of DiffReaper-5, with **2048-dim embeddings** and a **24-layer Transformer**.
 
-##
+## Model Details
 
 - **Architecture:** 24-layer Custom Transformer with Time Embedding.
 - **Task:** Conditioned Text Diffusion (Prompt-Response).
 - **Training Objective:** Cosine Similarity Regression.
 - **Sampling:** 10-step iterative parallel denoising.
 
-##
-
-The model is training autonomously on an H100 with the following configuration:
-- **Batch Size:** 16.
-- **Learning Rate:** 1e-4.
-- **Checkpointing:** Saves `diffreaper5l_{step}.pt` every 2,500 steps to [darwinkernelpanic/DiffReaper-5L](https://huggingface.co/darwinkernelpanic/DiffReaper-5L).
-
-## 🛠️ Usage (Inference)
+## Usage (Inference)
 
 To run inference:
 
@@ -45,8 +39,6 @@ model.load_state_dict(torch.load("diffreaper5l_latest.pt"))
 model.eval()
 ```
 
-##
+## Fine-tuning
 
-To fine-tune on a custom dataset, ensure your data loader provides **Prompt** + **Response** pairs. Use the same Cosine Similarity loss.
-
-
+To fine-tune on a custom dataset, ensure your data loader provides **Prompt** + **Response** pairs. Use the same Cosine Similarity loss.
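The Model Details above list a "Custom Transformer with Time Embedding", but the embedding itself is not shown in this diff. A common choice in diffusion models is a sinusoidal timestep embedding; the sketch below illustrates that pattern only — the function name, scaling, and frequency range are assumptions, not taken from the DiffReaper repository.

```python
import numpy as np

def time_embedding(t, dim):
    """Sinusoidal embedding of a scalar diffusion timestep t.

    The first half of the vector uses sin, the second half cos,
    at log-spaced frequencies, as in standard diffusion models.
    """
    half = dim // 2
    # frequencies from 1.0 down to 1/10000, geometrically spaced
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

# One conditioning vector per noise level, broadcast to all positions.
emb = time_embedding(0.5, 128)
```

In a transformer denoiser this vector is typically projected and added to (or concatenated with) the token embeddings so every layer knows the current noise level.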
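The card describes sampling as "10-step iterative parallel denoising": the model predicts clean embeddings for every position at once, and the estimate is refined over a fixed schedule. A minimal sketch of such a loop, in NumPy with a stand-in `denoise_fn` — the blending rule and all names here are assumptions, not the repository's actual sampler:

```python
import numpy as np

def iterative_parallel_denoise(denoise_fn, shape, steps=10, seed=0):
    """Refine a whole sequence of embeddings in parallel over `steps` iterations.

    denoise_fn(x, t) -> predicted clean embeddings for every position at once;
    t is the normalized noise level in (0, 1], with 1 meaning pure noise.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)      # start from Gaussian noise
    for i in range(steps, 0, -1):
        t = i / steps                   # current noise level
        x0_hat = denoise_fn(x, t)       # predict clean targets for all positions
        t_next = (i - 1) / steps
        # blend toward the prediction as the noise level decreases
        x = t_next * x + (1.0 - t_next) * x0_hat
    return x

# Toy denoiser that always predicts a fixed target, just to exercise the loop:
target = np.ones((4, 8))                # (seq_len, embed_dim) stand-in sizes
out = iterative_parallel_denoise(lambda x, t: target, target.shape, steps=10)
```

Because every position is updated in the same step, generation cost scales with the number of denoising steps (10 here) rather than with sequence length, which is the "parallel-generation" property the tags advertise.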