---
license: other
tags:
  - non-commercial
  - text-generation
  - flow-matching
datasets:
  - cerebras/SlimPajama-627B
---

# DFM

## Summary

DFM is a continued-pretraining checkpoint based on Apple's fs-dfm weights, trained with the Flow Matching codebase and released for research/non-commercial use only. Training continued from a uniform-noise checkpoint to a masked-diffusion variant. Base checkpoint (external, not hosted on Hugging Face):

https://ml-site.cdn-apple.com/models/fs-dfm/checkpoint.pth

## Training

- Continued pretraining from Apple's fs-dfm checkpoint, initialized from the uniform-noise variant and trained toward masked diffusion
- Dataset: SlimPajama-627B
- Steps: 250,000
- Global batch size: 256
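The masked-diffusion objective corrupts clean tokens by replacing positions with a mask token at a rate tied to the time variable. A toy sketch of that corruption step, assuming a simple linear masking schedule (`MASK_ID` and the schedule are illustrative, not taken from the fs-dfm code):

```python
import torch

MASK_ID = 0  # illustrative sentinel; the real tokenizer's mask id differs

def mask_tokens(x0, t):
    """Corrupt clean tokens x0 by independently masking each position
    with probability t (toy linear schedule; fs-dfm's may differ).
    x0: (batch, seq) token ids; t: scalar in [0, 1]."""
    mask = torch.rand(x0.shape) < t
    return torch.where(mask, torch.full_like(x0, MASK_ID), x0)

x0 = torch.randint(1, 100, (2, 8))
x_half = mask_tokens(x0, 0.5)   # roughly half the positions masked
x_full = mask_tokens(x0, 1.0)   # every position masked
```

At `t = 0` the sequence is untouched; at `t = 1` it is fully masked, which is the regime the continued training moves the uniform-noise checkpoint toward.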

## License

Research/non-commercial use only. This repository is governed by the Apple Software License (see LICENSE) and includes non-commercial restrictions inherited from Flow Matching (CC BY-NC 4.0). See ACKNOWLEDGMENTS for third-party notices.

## Intended Use

Research and non-commercial use only.

## Limitations

Commercial use is not permitted. Dataset-specific licensing constraints apply to SlimPajama's underlying sources.

## Usage

### Hugging Face (`trust_remote_code`)

This repo provides `configuration_dfm.py` and `modeling_dfm.py` for Hugging Face loading with `trust_remote_code=True`.

Example:

```python
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained(".", trust_remote_code=True)
model = AutoModel.from_pretrained(".", trust_remote_code=True)
```

Note:

- This model expects `x_t` and time inputs (flow-matching style), not GPT-style autoregressive inputs.
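To make the call shape concrete, here is a minimal toy sketch of a `(x_t, t) -> logits` interface. `ToyFlowModel` is purely hypothetical stand-in code; the real forward signature lives in `modeling_dfm.py` and may differ:

```python
import torch
import torch.nn as nn

class ToyFlowModel(nn.Module):
    """Hypothetical stand-in with a flow-matching call shape:
    forward(x_t, t) -> per-position token logits. Not the real DFM model."""
    def __init__(self, vocab_size=32, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.time_proj = nn.Linear(1, dim)   # conditions on the time scalar
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x_t, t):
        # x_t: (batch, seq) noised token ids; t: (batch,) times in [0, 1]
        h = self.embed(x_t) + self.time_proj(t[:, None])[:, None, :]
        return self.head(h)  # (batch, seq, vocab_size) logits

model = ToyFlowModel()
x_t = torch.randint(0, 32, (2, 8))   # a noised token batch
t = torch.rand(2)                    # one time value per sequence
logits = model(x_t, t)
```

Unlike a causal LM, each forward pass denoises the whole sequence at a given time rather than predicting the next token.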

This release includes model-only weights (`model.safetensors`) for inference/forward passes. Full training/eval/sampling code is available in the original project: https://github.com/apple/ml-fs-dfm.

## Acknowledgments

This model is derived from Apple's fs-dfm checkpoint and follows the original Apple license terms. The original project is at https://github.com/apple/ml-fs-dfm. See ACKNOWLEDGMENTS for third-party attributions and licensing.