darwinkernelpanic commited on
Commit
3551500
·
verified ·
1 Parent(s): 0e9072b

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +52 -0
README.md ADDED
@@ -0,0 +1,52 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ language:
3
+ - en
4
+ license: openrail
5
+ library_name: diffusers
6
+ tags:
7
+ - diffusion-llm
8
+ - parallel-generation
9
+ - custom-transformer
10
+ - cropmark
11
+ datasets:
12
+ - OpenAssistant/oasst1
13
+ metrics:
14
+ - cosine_similarity
15
+ ---
16
+
17
+ # 🪐 DiffReaper-5L
18
+
19
+ DiffReaper-5L is a **larger** version of DiffReaper-5, with **2048-dim embeddings** and a **24-layer Transformer**. This model is under **active autonomous training** on an H100.
20
+
21
+ ## 🔬 Model Details
22
+
23
+ - **Architecture:** 24-layer Custom Transformer with Time Embedding.
24
+ - **Task:** Conditioned Text Diffusion (Prompt-Response).
25
+ - **Training Objective:** Cosine Similarity Regression.
26
+ - **Sampling:** 10-step iterative parallel denoising.
27
+
28
+ ## 🚀 Autonomous Training State
29
+
30
+ The model is training autonomously on an H100 with the following configuration:
31
+ - **Batch Size:** 16.
32
+ - **Learning Rate:** 1e-4.
33
+ - **Checkpointing:** Saves `diffreaper5l_{step}.pt` every 2,500 steps to [darwinkernelpanic/DiffReaper-5L](https://huggingface.co/darwinkernelpanic/DiffReaper-5L).
34
+
35
+ ## 🛠️ Usage (Inference)
36
+
37
+ To run inference:
38
+
39
+ ```python
40
+ import torch
41
+ # Assuming DiffReaperModel is defined as in train_diffreaper_5l.py
42
+
43
+ model = DiffReaperModel(vocab_size=50257, n_embd=2048, n_head=32, n_layer=24).to("cuda")
44
+ model.load_state_dict(torch.load("diffreaper5l_latest.pt"))
45
+ model.eval()
46
+ ```
47
+
48
+ ## 🎯 Fine-tuning
49
+
50
+ To fine-tune on a custom dataset, ensure your data loader provides **Prompt** + **Response** pairs. Use the same Cosine Similarity loss.
51
+
52
+ *Created by Darwin & Clawd.*