---
license: other
tags:
- non-commercial
- text-generation
- flow-matching
datasets:
- cerebras/SlimPajama-627B
---

# DFM

## Summary
`DFM` is a continued-pretraining checkpoint based on Apple's fs-dfm weights. It was trained with the Flow Matching codebase and is released for research and non-commercial use only.
Training was initialized from a checkpoint trained with uniform noise as the source distribution and then continued with a masked source, yielding a masked-diffusion variant.
Base checkpoint (external, not hosted on Hugging Face):
```
https://ml-site.cdn-apple.com/models/fs-dfm/checkpoint.pth
```

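To illustrate the switch from uniform noise to masking, here is a minimal sketch that builds a noisy sequence `x_t` under each source distribution. It assumes the usual discrete flow-matching mixture-path convention: `t = 0` is pure noise, `t = 1` is clean data, and each token keeps its data value with probability `kappa(t) = t` (a linear schedule). The names `sample_x_t`, `VOCAB_SIZE`, and `MASK_ID` are hypothetical. The scheduler and token ids actually used in ml-fs-dfm may differ.

```python
import random
from typing import List, Optional

# Hypothetical values for illustration only; the real vocabulary size and
# mask token id are defined by the model's own config/tokenizer.
VOCAB_SIZE = 1000
MASK_ID = VOCAB_SIZE  # assumption: mask id placed just past the regular vocabulary


def sample_x_t(
    x_1: List[int], t: float, source: str, rng: Optional[random.Random] = None
) -> List[int]:
    """Sample x_t on a mixture path from a source x_0 (t = 0) to data x_1 (t = 1).

    Each position keeps its clean token with probability kappa(t) = t
    (assumed linear schedule) and otherwise takes its source token x_0:
      source="uniform": x_0 is a uniformly random vocabulary token
      source="mask":    x_0 is MASK_ID (masked diffusion)
    """
    if source not in ("uniform", "mask"):
        raise ValueError(f"unknown source: {source!r}")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    rng = rng or random.Random()
    x_t = []
    for token in x_1:
        if rng.random() < t:  # keep the clean token with probability t
            x_t.append(token)
        elif source == "uniform":
            x_t.append(rng.randrange(VOCAB_SIZE))  # replace with random noise
        else:
            x_t.append(MASK_ID)  # replace with the mask token
    return x_t


rng = random.Random(0)
tokens = [17, 42, 7, 256, 3]
print(sample_x_t(tokens, 0.5, "uniform", rng))  # uniform-noise regime (base checkpoint)
print(sample_x_t(tokens, 0.5, "mask", rng))     # masked regime (this release)
```

With a mask source, the corrupted positions are always identifiable. With a uniform source, the model must also work out which tokens are noise.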
## Training
- Initialization: Apple's fs-dfm checkpoint, trained with a uniform-noise source distribution
- Objective: continued pretraining as masked diffusion
- Dataset: SlimPajama-627B
- Steps: 250,000
- Global batch size: 256

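For a rough sense of scale, the step count and global batch size above determine how many training sequences the model saw. The sketch below does that arithmetic. It assumes the global batch size counts sequences. The sequence length is a hypothetical parameter because this card does not state it.

```python
from typing import Optional, Tuple


def training_budget(
    steps: int, global_batch_size: int, seq_len: Optional[int] = None
) -> Tuple[int, Optional[int]]:
    """Return (sequences_seen, tokens_seen).

    Assumes global_batch_size counts sequences per optimizer step.
    tokens_seen is None unless a sequence length is supplied.
    """
    sequences = steps * global_batch_size
    tokens = sequences * seq_len if seq_len is not None else None
    return sequences, tokens


# Values from the Training section.
sequences, _ = training_budget(250_000, 256)
print(f"{sequences:,} sequences")  # 64,000,000 sequences

# seq_len=1024 is a hypothetical value, not taken from this card.
_, tokens = training_budget(250_000, 256, seq_len=1024)
print(f"{tokens:,} tokens at seq_len=1024")
```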
## License
Research/non-commercial use only. This repository is governed by the Apple Software License (see `LICENSE`) and includes non-commercial restrictions inherited from Flow Matching (CC BY-NC 4.0). See `ACKNOWLEDGMENTS` for third-party notices.

## Intended Use
Research and non-commercial use only.

## Limitations
Commercial use is not permitted. The licenses of the individual datasets underlying SlimPajama impose additional constraints.

## Usage
### Hugging Face (trust_remote_code)
This repository provides `configuration_dfm.py` and `modeling_dfm.py`, so the model can be loaded with Hugging Face Transformers using `trust_remote_code=True`.

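Example:
```python
from transformers import AutoConfig, AutoModel

# "." assumes a local clone of this repo; trust_remote_code=True lets
# Transformers import configuration_dfm.py and modeling_dfm.py from it.
config = AutoConfig.from_pretrained(".", trust_remote_code=True)
model = AutoModel.from_pretrained(".", config=config, trust_remote_code=True)
model.eval()  # inference mode: disables dropout
```
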
Note:
- The forward pass expects a partially noised token sequence `x_t` and a time value `time` (flow-matching style). It does not take GPT-style autoregressive inputs, i.e. a prefix extended one token at a time.

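Because the model is conditioned on `x_t` and `time` rather than on a prefix, text is generated by iteratively denoising a fully masked sequence. Below is a minimal sketch of a generic Euler sampler for masked discrete flow matching with a linear schedule `kappa(t) = t`. It is not the sampler from ml-fs-dfm. `denoise` is a hypothetical stand-in for a forward pass that returns a probability distribution over the vocabulary for each position, and `MASK_ID` is a placeholder. How to obtain such distributions from `modeling_dfm.py` is not documented here.

```python
import random
from typing import Callable, List, Sequence

MASK_ID = -1  # placeholder mask token id (hypothetical)

# Hypothetical denoiser interface: maps (x_t, t) to one probability vector per
# position, i.e. an estimate of p(x_1 | x_t) for every token.
Denoiser = Callable[[List[int], float], List[Sequence[float]]]


def euler_sample(
    denoise: Denoiser, length: int, num_steps: int, rng: random.Random
) -> List[int]:
    """Generate `length` tokens with a masked discrete flow-matching Euler sampler.

    Starts from an all-mask sequence at t = 0 and takes `num_steps` steps of
    size h = 1 / num_steps. With kappa(t) = t, a still-masked position is
    unmasked during a step with probability h / (1 - t) = 1 / (num_steps - step),
    which equals 1 on the last step, so no masks remain at t = 1.
    """
    x_t = [MASK_ID] * length
    for step in range(num_steps):
        t = step / num_steps
        probs = denoise(x_t, t)  # one model call per step, conditioned on (x_t, t)
        jump = 1.0 / (num_steps - step)
        for i, token in enumerate(x_t):
            if token != MASK_ID:
                continue  # once unmasked, a token is never changed again
            if rng.random() < jump:
                x_t[i] = rng.choices(range(len(probs[i])), weights=probs[i])[0]
    return x_t


def toy_denoise(x_t: List[int], t: float) -> List[Sequence[float]]:
    # Stand-in for a real forward pass: uniform over a 5-token vocabulary.
    return [[0.2] * 5 for _ in x_t]


print(euler_sample(toy_denoise, length=8, num_steps=4, rng=random.Random(0)))
```

For the sampler and settings actually used with this checkpoint, refer to the original ml-fs-dfm repository.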
This release contains only the model weights (`model.safetensors`) for inference and forward passes. The full training, evaluation, and sampling code is available in the original project: `https://github.com/apple/ml-fs-dfm`.

## Acknowledgments
This model is derived from Apple's fs-dfm checkpoint and follows the original Apple license terms. The original project is at `https://github.com/apple/ml-fs-dfm`. See `ACKNOWLEDGMENTS` for third-party attributions and licensing.