philipp-zettl/modernbert-diffusion-refactor

	---
	language: en
	tags:
	- mask-predict
	- diffusion
	- masked-lm
	library_name: transformers
	base_model: philipp-zettl/modernbert-diffusion-universal
	pipeline_tag: fill-mask
	---

	# ./refinebert-refactor

	## Model Summary
	A diffusion-style masked language model fine-tuned from `philipp-zettl/modernbert-diffusion-universal` on a custom dataset of user-provided local text files (`code_refactoring.txt`).

	## Model Details
	- Model ID: ./refinebert-refactor
	- Base model: philipp-zettl/modernbert-diffusion-universal
	- Training mode: Fine-tuning
	- Task type: Masked token denoising / diffusion-style infilling

	## Intended Use
	Intended for masked-token infilling on text similar to the custom training data (`code_refactoring.txt`).

	Example
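	```python
	from refinebert.diffusion_engine import MaskedDiffusionEngine

	engine = MaskedDiffusionEngine("./refinebert-refactor")

	# The card records the prompt and sampling settings as "N/A (See generation logs)".
	# The values below are illustrative placeholders, not the author's settings.
	prompt = "def add(a, b):"
	output = engine.generate(prompt, num_new_tokens=32, steps=8, guidance_scale=1.0)
	print(output)
	```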

	## Training Data
	Single-dataset fine-tuning.

	### Dataset Mix
	| Source | Share | Files |
	| --- | --- | --- |
	| Custom Files | 100% | code_refactoring.txt |

	Fine-tuned on user-provided local text files.

	## Training Procedure
	- Steps: 1731
	- Batch size: 16
	- Sequence length: 256
	- Learning rate: 5e-05
	- CFG dropout probability: N/A
	- Samples loaded into RAM: N/A

	## Training Time & Hardware
	- Duration: 0h 10m 25s
	- Hardware: NVIDIA GeForce RTX 4070 Laptop GPU x1 (CUDA available)

	## Metrics (Training)
	| Metric | Value |
	| --- | --- |
	| Training Loss | 2.0958 |
	| Epochs | 3 |
	| Global Step | 1731 |
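
The figures reported above (steps, batch size, sequence length, epochs, duration) imply a few derived quantities that the card does not state directly. The sketch below works them out. It assumes one optimizer step per batch (no gradient accumulation) on the single GPU listed, and that every batch is full, so the sequence and token counts are upper bounds rather than measured values.

```python
# Figures copied from this card.
steps = 1731
batch_size = 16
seq_len = 256
epochs = 3
duration_s = 10 * 60 + 25  # 0h 10m 25s

# 1731 steps divide evenly across 3 epochs.
steps_per_epoch = steps // epochs

# Upper bounds: assume one optimizer step per batch (no gradient
# accumulation) and that every batch holds a full 16 sequences.
max_seqs_per_epoch = steps_per_epoch * batch_size
max_tokens_per_epoch = max_seqs_per_epoch * seq_len

# Average training throughput over the whole run.
steps_per_second = steps / duration_s

print(steps_per_epoch)             # 577
print(max_seqs_per_epoch)          # 9232
print(max_tokens_per_epoch)        # 2363392
print(round(steps_per_second, 2))  # 2.77
```

Under these assumptions the training file yields at most about 9.2k sequences of 256 tokens per epoch, which is a small corpus for fine-tuning and worth keeping in mind when reading the loss value.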

	## Limitations & Considerations
	- The model is trained with a masked-token diffusion objective and may not behave like an autoregressive LM.
	- Data sources may have licensing or content constraints; review the terms of the source data before deployment.
	- Performance can vary substantially with the training mode (here, fine-tuning) and with prompt structure.
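
To make the first limitation concrete, here is a toy sketch of one common form of masked-token iterative decoding: progressive unmasking ordered by confidence. It is not the `MaskedDiffusionEngine` implementation, whose internals this card does not describe. `mask_predict`, `toy_predict`, and `TARGET` are hypothetical names introduced for illustration, and the predictor is a stand-in for a real masked LM.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"

def mask_predict(
    tokens: List[str],
    predict: Callable[[List[str], int], Tuple[str, float]],
    steps: int,
) -> List[str]:
    """Fill MASK positions over several steps, most confident guesses first."""
    tokens = list(tokens)
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # Predict a (token, confidence) pair for every masked position.
        guesses = {i: predict(tokens, i) for i in masked}
        # Commit an even share of the remaining masks (ceil division),
        # which fills everything that is left on the final step.
        remaining_steps = steps - step
        n_commit = max(1, -(-len(masked) // remaining_steps))
        best = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:n_commit]
        for i in best:
            tokens[i] = guesses[i][0]
    return tokens

# Hypothetical stand-in for a masked LM: it always proposes the target token
# and is more confident when the neighbouring tokens are already known.
TARGET = ["def", "add", "(", "a", ",", "b", ")", ":"]

def toy_predict(tokens: List[str], i: int) -> Tuple[str, float]:
    known_neighbours = sum(
        1 for j in (i - 1, i + 1)
        if 0 <= j < len(tokens) and tokens[j] != MASK
    )
    return TARGET[i], 0.5 + 0.25 * known_neighbours

start = ["def", MASK, "(", MASK, MASK, MASK, ")", ":"]
result = mask_predict(start, toy_predict, steps=3)
print(" ".join(result))  # def add ( a , b ) :
```

Unlike an autoregressive LM, this loop does not generate left to right: positions are filled in whatever order the confidence scores dictate, and each step conditions on context from both sides.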