Update README.md
README.md
CHANGED
@@ -12,14 +12,14 @@ datasets:

 ## Summary
 `DFM` is a continued-pretraining checkpoint based on Apple's fs-dfm weights. It is trained with Flow Matching code and released for research/non-commercial use only.
-
+This model was continued from a uniform‑noise trained checkpoint to a masked‑diffusion variant.
 Base checkpoint (external, not on HF):
 ```
 https://ml-site.cdn-apple.com/models/fs-dfm/checkpoint.pth
 ```

 ## Training
-- Continued pretraining from Apple's fs-dfm checkpoint
+- Continued pretraining from Apple's fs-dfm checkpoint. Init: uniform‑noise checkpoint → continued training to mask‑diffusion
 - Dataset: SlimPajama-627B
 - Steps: 250,000
 - Global batch size: 256
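
The added lines distinguish a uniform-noise source from a masked one. As a rough illustration of that difference (not the training code behind this checkpoint), the sketch below corrupts a token sequence along a simple mixture path: each position keeps its clean token with probability `t` and is otherwise replaced by a source sample. The function name `corrupt`, the `MASK_ID` value, the linear schedule, and the convention that `t = 1` means clean data are all assumptions made for this example.

```python
import random

# Hypothetical id for the mask token; a real vocabulary reserves its own.
MASK_ID = -1


def corrupt(tokens, t, source, vocab_size, rng):
    """Sample a noised sequence at time t in [0, 1].

    Assumed convention: t = 1 returns the clean tokens, t = 0 returns
    pure source noise. Each position is handled independently.
    """
    noised = []
    for tok in tokens:
        if rng.random() < t:
            # Keep the clean token with probability t.
            noised.append(tok)
        elif source == "uniform":
            # Uniform-noise source: any vocabulary id, drawn uniformly.
            noised.append(rng.randrange(vocab_size))
        elif source == "mask":
            # Masked source: every corrupted position becomes the mask id.
            noised.append(MASK_ID)
        else:
            raise ValueError(f"unknown source: {source!r}")
    return noised


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [5, 7, 9, 11]
    print(corrupt(clean, 0.5, "uniform", 100, rng))
    print(corrupt(clean, 0.5, "mask", 100, rng))
```

With a masked source, a corrupted position is always recognizable as corrupted. With a uniform source it looks like an ordinary token, so a model has to infer which positions are noise. That is one plausible reason a checkpoint initialized under one regime needs continued training to work under the other.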