
DFM-1.3B


	---
	license: other
	tags:
	- non-commercial
	- text-generation
	- flow-matching
	datasets:
	- cerebras/SlimPajama-627B
	---

	# DFM

	## Summary
	`DFM` is a continued-pretraining checkpoint based on Apple's fs-dfm weights. It was trained with the Flow Matching code and is released for research and non-commercial use only.
	Training started from a checkpoint trained with uniform noise and continued until the model became a masked-diffusion variant.
	The base checkpoint is hosted externally, not on Hugging Face:
	```
	https://ml-site.cdn-apple.com/models/fs-dfm/checkpoint.pth
	```

	## Training
	- Initialization: Apple's fs-dfm checkpoint (trained with uniform noise), followed by continued pretraining as a masked-diffusion model
	- Dataset: SlimPajama-627B
	- Steps: 250,000
	- Global batch size: 256
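As a rough sense of scale, the step count and global batch size above fix the number of training sequences, but not the number of tokens, because this card does not state the sequence length. The sketch below works through the arithmetic; the `seq_len` value of 1024 is purely a hypothetical placeholder, not a property of this model.

```python
# Figures taken from the Training list above.
steps = 250_000
global_batch_size = 256

# Each step consumes one global batch of sequences.
sequences_seen = steps * global_batch_size


def tokens_seen(seq_len):
    """Token count for a given sequence length (not stated in this card)."""
    return sequences_seen * seq_len


print(f"sequences seen: {sequences_seen:,}")
# 1024 is a hypothetical sequence length used only to show the calculation.
print(f"tokens at seq_len=1024: {tokens_seen(1024):,}")
```

The first figure (64,000,000 sequences) follows directly from the card; any token estimate depends on the actual sequence length used in training.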

	## License
	Research/non-commercial use only. This repository is governed by the Apple Software License (see `LICENSE`) and includes non-commercial restrictions inherited from Flow Matching (CC BY-NC 4.0). See `ACKNOWLEDGMENTS` for third-party notices.

	## Intended Use
	Research and non-commercial use only.

	## Limitations
	Commercial use is not permitted. Dataset-specific licensing constraints apply to SlimPajama's underlying sources.

	## Usage
	### Hugging Face (trust_remote_code)
	This repo provides `configuration_dfm.py` and `modeling_dfm.py` for HF loading with `trust_remote_code=True`.

	Example:
	```python
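	from transformers import AutoConfig, AutoModel

	# "." assumes the script runs from a local clone of this repo.
	# trust_remote_code=True lets transformers import configuration_dfm.py and modeling_dfm.py.
	config = AutoConfig.from_pretrained(".", trust_remote_code=True)
	model = AutoModel.from_pretrained(".", trust_remote_code=True)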
	```

	Note:
	- This model expects `x_t` and `time` inputs (flow-matching style), not GPT-style autoregressive inputs.
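To illustrate the input format, the sketch below builds one `x_t`/`time` pair for a masked-diffusion path. It relies on conventions this card does not state: `t = 0` is fully masked, `t = 1` is clean data, and each token is kept independently with probability `t` (a linear schedule). `MASK_ID` and the function name `sample_masked_x_t` are hypothetical; the real mask token ID and the exact forward signature should be taken from `configuration_dfm.py` and `modeling_dfm.py`.

```python
import random

MASK_ID = 0  # hypothetical mask token ID; check the repo's config for the real one


def sample_masked_x_t(x_1, t, mask_id=MASK_ID, rng=None):
    """Corrupt clean token IDs x_1 into x_t for a time t in [0, 1].

    Assumed convention: each token stays clean with probability t and is
    replaced by mask_id otherwise, so t=0 is all mask and t=1 is clean.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    rng = rng or random.Random()
    return [tok if rng.random() < t else mask_id for tok in x_1]


clean_tokens = [17, 42, 256, 99, 1234]  # made-up token IDs for illustration
time = 0.5
x_t = sample_masked_x_t(clean_tokens, time, rng=random.Random(0))
print(x_t, time)
```

Before a forward pass, `x_t` and `time` would typically be converted to batched tensors; whether the model takes them as positional or keyword arguments is not specified here.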

	This release includes only the model weights (`model.safetensors`), which are sufficient for inference and forward passes. The full training, evaluation, and sampling code is available in the original project: `https://github.com/apple/ml-fs-dfm`.

	## Acknowledgments
	This model is derived from Apple's fs-dfm checkpoint and follows the original Apple license terms. The original project is at `https://github.com/apple/ml-fs-dfm`. See `ACKNOWLEDGMENTS` for third-party attributions and licensing.