darwinkernelpanic committed
Commit 0c3e462 · verified · 1 Parent(s): 94a96d5

Update README.md

Files changed (1)
  1. README.md +8 -16
README.md CHANGED
````diff
@@ -2,7 +2,6 @@
 language:
 - en
 license: openrail
-library_name: diffusers
 tags:
 - diffusion-llm
 - parallel-generation
@@ -12,27 +11,22 @@ datasets:
 - OpenAssistant/oasst1
 metrics:
 - cosine_similarity
+base_model:
+- darwinkernelpanic/DiffReaper-5
 ---
 
-# 🪐 DiffReaper-5L
+# DiffReaper-5L
 
-DiffReaper-5L is a **larger** version of DiffReaper-5, with **2048-dim embeddings** and a **24-layer Transformer**. This model is under **active autonomous training** on an H100.
+DiffReaper-5L is a **larger** version of DiffReaper-5, with **2048-dim embeddings** and a **24-layer Transformer**.
 
-## 🔬 Model Details
+## Model Details
 
 - **Architecture:** 24-layer Custom Transformer with Time Embedding.
 - **Task:** Conditioned Text Diffusion (Prompt-Response).
 - **Training Objective:** Cosine Similarity Regression.
 - **Sampling:** 10-step iterative parallel denoising.
 
-## 🚀 Autonomous Training State
-
-The model is training autonomously on an H100 with the following configuration:
-- **Batch Size:** 16.
-- **Learning Rate:** 1e-4.
-- **Checkpointing:** Saves `diffreaper5l_{step}.pt` every 2,500 steps to [darwinkernelpanic/DiffReaper-5L](https://huggingface.co/darwinkernelpanic/DiffReaper-5L).
-
-## 🛠️ Usage (Inference)
+## Usage (Inference)
 
 To run inference:
 
@@ -45,8 +39,6 @@ model.load_state_dict(torch.load("diffreaper5l_latest.pt"))
 model.eval()
 ```
 
-## 🎯 Fine-tuning
-
-To fine-tune on a custom dataset, ensure your data loader provides **Prompt** + **Response** pairs. Use the same Cosine Similarity loss.
+## Fine-tuning
 
-*Created by Darwin & Clawd.*
+To fine-tune on a custom dataset, ensure your data loader provides **Prompt** + **Response** pairs. Use the same Cosine Similarity loss.
````
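For readers of the updated card: the retained **Usage (Inference)** snippet only loads the checkpoint, and the model class itself is not part of this diff. Below is a minimal sketch of what the card's "10-step iterative parallel denoising" could look like, using a hypothetical stub in place of the real 24-layer Transformer with time embedding; the class name, constructor arguments, and forward signature are all illustrative assumptions, not the repo's actual API.

```python
import torch
import torch.nn as nn

# Stand-in for the card's 24-layer, 2048-dim Transformer with time embedding;
# the real class is not in this diff, so this stub only mirrors the stated interface.
class DiffReaper5LStub(nn.Module):
    def __init__(self, dim: int = 2048, layers: int = 24, steps: int = 10):
        super().__init__()
        self.time_emb = nn.Embedding(steps, dim)  # one embedding per denoising step
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, x, t, prompt_emb):
        # Condition on the prompt by prepending it, add the step embedding,
        # and denoise every response position in one parallel pass.
        h = torch.cat([prompt_emb, x + self.time_emb(t)[:, None, :]], dim=1)
        return self.backbone(h)[:, prompt_emb.shape[1]:, :]

@torch.no_grad()
def generate(model, prompt_emb, seq_len=64, steps=10):
    # 10-step iterative parallel denoising, as described in the card:
    # start from noise and refine all positions simultaneously at each step.
    x = torch.randn(prompt_emb.shape[0], seq_len, prompt_emb.shape[-1])
    for step in reversed(range(steps)):
        t = torch.full((x.shape[0],), step, dtype=torch.long)
        x = model(x, t, prompt_emb)
    return x  # final embeddings; decoding back to tokens is model-specific

# Example call (shapes only; prompt embeddings would come from the real pipeline):
# model = DiffReaper5LStub().eval()
# out = generate(model, torch.randn(1, 8, 2048))
```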
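Likewise, the retained **Fine-tuning** note names the Cosine Similarity loss but leaves it implicit. A sketch of that objective under the same assumptions, with the data-loader fields and the noising helper being illustrative rather than the repo's actual code:

```python
import torch
import torch.nn.functional as F

def cosine_regression_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # "Cosine Similarity Regression": drive predicted response embeddings
    # toward the targets by maximizing per-position cosine similarity.
    return 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()

# Illustrative training step; batch layout and add_noise() are assumptions.
# for prompt_emb, response_emb in loader:        # Prompt + Response pairs
#     noisy, t = add_noise(response_emb)         # hypothetical noising helper
#     pred = model(noisy, t, prompt_emb)
#     loss = cosine_regression_loss(pred, response_emb)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```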