|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: openrail |
|
|
tags: |
|
|
- diffusion-llm |
|
|
- parallel-generation |
|
|
- custom-transformer |
|
|
- cropmark |
|
|
datasets: |
|
|
- OpenAssistant/oasst1 |
|
|
metrics: |
|
|
- cosine_similarity |
|
|
base_model: |
|
|
- darwinkernelpanic/DiffReaper-5 |
|
|
--- |
|
|
|
|
|
# DiffReaper-5L |
|
|
|
|
|
DiffReaper-5L is a **larger** version of DiffReaper-5, with **2048-dim embeddings** and a **24-layer Transformer**. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Architecture:** 24-layer Custom Transformer with Time Embedding. |
|
|
- **Task:** Conditioned Text Diffusion (Prompt-Response). |
|
|
- **Training Objective:** Cosine Similarity Regression. |
|
|
- **Sampling:** 10-step iterative parallel denoising. |
|
|
|
|
|
## Usage (Inference) |
|
|
|
|
|
To run inference: |
|
|
|
|
|
```python |
|
|
import torch |
|
|
# Assuming DiffReaperModel is defined as in train_diffreaper_5l.py |
|
|
|
|
|
model = DiffReaperModel(vocab_size=50257, n_embd=2048, n_head=32, n_layer=24).to("cuda") |
|
|
model.load_state_dict(torch.load("diffreaper5l_latest.pt")) |
|
|
model.eval() |
|
|
``` |
|
|
|
|
|
## Fine-tuning |
|
|
|
|
|
To fine-tune on a custom dataset, ensure your data loader provides **Prompt** + **Response** pairs. Use the same Cosine Similarity loss. |