darwinkernelpanic committed
Commit 3bce0dd · verified · 1 Parent(s): 5321c54

Upload README.md with huggingface_hub

Files changed (1)
README.md +36 -5
README.md CHANGED
@@ -34,11 +34,42 @@ This model is currently in **Autonomous Growth Mode**. It is training on an RTX
34
  - **Optimizer:** AdamW with a learning rate of 1e-4.
35
  - **Sync:** Auto-checkpointing every 2,500 steps to this repository.
36
 
37
- ## 🛠️ Intended Use
38
 
39
- DiffReaper-5 is intended for research into **Non-Autoregressive Generation**. Its primary strengths are:
40
- 1. **Speed:** Parallel token generation eliminates the KV-cache bottleneck.
41
- 2. **Coherence:** Focuses on global sequence structure rather than next-token probability.
42
 
43
  ## 📈 Diagnostic: Cropmark
44
 
@@ -47,4 +78,4 @@ The model's progress is monitored via the **Cropmark Diagnostic**.
47
  - Results are logged in `checkpoint_log.txt` and uploaded periodically.
48
 
49
  ---
50
- *Created by Darwin (Oscar) & Clawd.*
 
34
  - **Optimizer:** AdamW with a learning rate of 1e-4.
35
  - **Sync:** Auto-checkpointing every 2,500 steps to this repository.
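The optimizer setting above can be sketched in a couple of lines; `model` here is a placeholder `nn.Linear` standing in for DiffReaper-5, not the released architecture:

```python
import torch

# Placeholder module standing in for DiffReaper-5's 1024-dim network
model = torch.nn.Linear(1024, 1024)

# AdamW with the learning rate stated in the training details
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```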
36
 
37
+ ## 🛠️ Usage (Inference)
38
 
39
+ Unlike autoregressive models, DiffReaper-5 generates the entire response in parallel through iterative denoising. Use the following logic to run inference:
40
+
41
+ ```python
42
+ import torch
43
+ import torch.nn.functional as F
44
+
45
+ def generate(model, tokenizer, prompt, steps=10):
46
+     model.eval()
47
+     with torch.no_grad():
48
+         p_tokens = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
49
+         p_emb = model.token_embedding(p_tokens[:, :32])  # Hard conditioning on the prompt
50
+
51
+         # Start from pure noise
52
+         r_noise = torch.randn(1, 32, 1024).to("cuda")
53
+
54
+         for i in range(steps):
55
+             t = torch.tensor([1000 - (i * (1000 // steps)) - 1], device="cuda").long()
56
+             pred = model(torch.cat([p_emb, r_noise], dim=1), t)
57
+             r_0_pred = pred[:, 32:, :]  # Extract the response half
58
+             r_noise = 0.4 * r_noise + 0.6 * r_0_pred  # Iterative refinement
59
+
60
+         # Map latents to vocab tokens using cosine similarity
61
+         norm_weights = F.normalize(model.token_embedding.weight, dim=-1)
62
+         norm_r = F.normalize(r_noise, dim=-1)
63
+         logits = torch.matmul(norm_r, norm_weights.T)
64
+         return tokenizer.decode(torch.argmax(logits, dim=-1)[0])
65
+ ```
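As a quick sanity check, the timestep schedule computed inside the denoising loop above can be reproduced standalone (pure Python, no model or GPU needed):

```python
# Reproduces the `t` values from the loop: with steps=10 the schedule
# descends from 999 to 99 in equal strides of 100.
steps = 10
schedule = [1000 - i * (1000 // steps) - 1 for i in range(steps)]
print(schedule[0], schedule[-1])  # 999 99
```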
66
+
67
+ ## 🎯 Fine-tuning
68
+
69
+ To fine-tune DiffReaper-5 on a custom dataset:
70
+ 1. **Objective:** Use `1 - F.cosine_similarity` between predicted and target embeddings.
71
+ 2. **Conditioning:** Ensure your data loader provides a fixed-length prompt prefix followed by the target response.
72
+ 3. **Architecture:** Maintain the 1024-dimensional latent space to stay compatible with the weights.
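A minimal sketch of the objective in step 1, assuming the model predicts response embeddings of shape `(batch, seq, 1024)`; the helper name `embedding_loss` and the random tensors are illustrative, not part of the released code:

```python
import torch
import torch.nn.functional as F

def embedding_loss(pred_emb, target_emb):
    # 1 - cosine similarity per token position, averaged over batch and sequence
    return (1 - F.cosine_similarity(pred_emb, target_emb, dim=-1)).mean()

# Dummy embeddings in the model's 1024-dimensional latent space
pred = torch.randn(2, 32, 1024)
target = torch.randn(2, 32, 1024)
loss = embedding_loss(pred, target)  # scalar; 0 when embeddings align exactly
```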
73
 
74
  ## 📈 Diagnostic: Cropmark
75
 
 
78
  - Results are logged in `checkpoint_log.txt` and uploaded periodically.
79
 
80
  ---
81
+ *Created by Darwin & Clawd.*