---
language:
- en
tags:
- dllm
- diffusion-language-model
- text-generation
- diffusion
- language-model
license: apache-2.0
---

# HDLM-Epsilon: Hybrid Diffusion Language Model
|
|
[Paper (arXiv:2504.06416)](https://arxiv.org/abs/2504.06416)
[Code (GitHub)](https://github.com/ServiceNow/hdlm)
|
|
This model card is for the **hdlm-base model with ε = 0.0**.
|
|
## Model Description
|
|
HDLM-Epsilon is a hybrid diffusion language model that unifies autoregressive and diffusion-based sequence generation through epsilon-hybrid noising. This model interpolates evolution operators between absorbing and uniform processes, making it conceptually closer to MDLM (Sahoo et al., 2024) while maintaining the benefits of both paradigms.
|
|
The epsilon parameter (ε) controls the blend between absorbing and uniform processes during training: smaller values emphasize the absorbing (masking) process, while larger values incorporate more uniform noise.
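
As a rough illustration of this idea (a sketch, not the repository's actual implementation), a hybrid corruption step might look like the following; the function name and the exact way ε splits probability mass between masking and random-token replacement are assumptions for exposition:

```python
import torch

def hybrid_corrupt(tokens, noise_level, epsilon, mask_id, vocab_size):
    """Illustrative epsilon-hybrid corruption (hypothetical sketch).

    Each token is corrupted with probability `noise_level`; a corrupted
    token becomes a uniformly random token with probability `epsilon`,
    and the absorbing (mask) token otherwise.
    """
    corrupt = torch.rand_like(tokens, dtype=torch.float) < noise_level
    uniform = torch.rand_like(tokens, dtype=torch.float) < epsilon
    random_tokens = torch.randint_like(tokens, vocab_size)
    masked = torch.full_like(tokens, mask_id)
    noised = torch.where(uniform, random_tokens, masked)
    return torch.where(corrupt, noised, tokens)

# Example: GPT-2 vocabulary (50,257 tokens) plus mask id 50257
x = torch.randint(0, 50257, (1, 8))
x_noisy = hybrid_corrupt(x, noise_level=0.5, epsilon=0.01,
                         mask_id=50257, vocab_size=50257)
```

With ε = 0.01 almost every corrupted position becomes the mask token, which is why small ε stays close to a pure absorbing process.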
|
|
## Model Architecture
|
|
- **Base Model**: Transformer architecture with custom conditioning layers
- **Vocabulary Size**: 50,258 tokens (GPT-2 vocabulary + absorbing token)
- **Context Length**: 1024 tokens
- **Training**: Hybrid loss combining token masking with random token corruption
- **Inference**: Supports multiple sampling algorithms, including ACS (Adaptive Correction Sampler)
|
|
## Usage
|
|
### Quick Start
|
|
```python
from hdlm.hf_utils import smart_model_loader
from hdlm.epsilon_hybrid.sample import full_diff
from transformers import GPT2TokenizerFast
import torch

# Load model using the smart loader (automatically detects the model type)
model, cfg, device, accelerator, metaschedule = smart_model_loader(
    model_path="hdlm-group/hdlm-base-epsilon-0.0",
    model_type="auto",  # automatically detects epsilon_hybrid
    device="cuda"
)

# Load tokenizer
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')

# Generate text
prompt = "The future of artificial intelligence"
prompt_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)

# Full diffusion sampling
generated = full_diff(
    model=model,
    prompt=prompt_ids,
    batch_size=1,
    alg='acs',  # or 'original', 'remask', 'remdm'
    steps=512,
    temperature=1.0,
    context_length=1024,
    device=device
)

# Decode generated text
generated_text = tokenizer.decode(generated[0], skip_special_tokens=True)
print(generated_text)
```
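
As a general rule for diffusion samplers, `steps` sets the number of denoising iterations, so fewer steps run faster at some cost in sample quality, while `temperature` scales the logits before token sampling, with values below 1.0 making generations more conservative.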
|
|
### Evaluation
|
|
```bash
# Text generation evaluation
python hdlm/eval_generation.py \
    --checkpoint_path hdlm-group/hdlm-base-epsilon-0.0 \
    --sampling_method full_diff \
    --algorithm acs \
    --save_samples

# Perplexity evaluation
python hdlm/eval_modeling.py \
    --checkpoint_path hdlm-group/hdlm-base-epsilon-0.0 \
    --work_dir "./logs/eval_modeling_epsilon" \
    --dataset ptb
```
|
|
## Training Details
|
|
- **Dataset**: OpenWebText
- **Batch Size**: 512
- **Learning Rate**: 3e-4 with cosine scheduling
- **Epsilon (ε)**: 0.01 (controls the hybrid noising blend)
- **Lambda (λ)**: 1.0 (weighting factor for unmasked tokens; see the loss sketch below)
- **Loss Type**: Hybrid loss combining masking and random token corruption
- **Training Steps**: 1M iterations
- **Warmup**: 50K steps
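
To make the role of λ concrete, here is a minimal sketch of how a hybrid objective might weight masked versus unmasked (randomly corrupted) positions. This is an illustration under assumptions, not the repository's actual loss; the function name and the exact weighting scheme are hypothetical:

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits, targets, was_masked, lam=1.0):
    """Hypothetical hybrid objective: per-token cross-entropy, with
    unmasked (randomly corrupted) positions weighted by `lam`.

    logits:     (batch, seq, vocab) model predictions
    targets:    (batch, seq) original clean tokens
    was_masked: (batch, seq) bool, True where the input was the absorbing token
    """
    per_token = F.cross_entropy(
        logits.transpose(1, 2), targets, reduction='none'
    )  # (batch, seq)
    # Weight 1.0 on masked positions, lam on unmasked positions
    weights = per_token.new_full(per_token.shape, lam)
    weights = weights.masked_fill(was_masked, 1.0)
    return (weights * per_token).mean()
```

With λ = 1.0, as reported above, masked and unmasked positions would contribute equally under this scheme.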
|
|
## Sampling Algorithms
|
|
The model supports several sampling algorithms, selected via the `alg` argument of `full_diff` (a comparison sketch follows the list):
|
|
- **`original`**: Standard diffusion sampling
- **`acs`**: Adaptive Correction Sampler with error correction
- **`remask`**: Remasking strategy for improved quality
- **`remdm`**: ReMDM-style sampling with probability mixing
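
Building on the Quick Start snippet above, the algorithms can be compared side by side on the same prompt (this assumes `model`, `prompt_ids`, `tokenizer`, and `device` are already set up as shown there):

```python
# Generate with each algorithm and print the decoded outputs
for alg in ['original', 'acs', 'remask', 'remdm']:
    generated = full_diff(
        model=model,
        prompt=prompt_ids,
        batch_size=1,
        alg=alg,
        steps=512,
        temperature=1.0,
        context_length=1024,
        device=device
    )
    print(f"--- {alg} ---")
    print(tokenizer.decode(generated[0], skip_special_tokens=True))
```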
|
|
## Model Variants
|
|
Available epsilon values and their characteristics (a loading sketch follows the list):
|
|
- **ε = 0.01**: Minimal uniform noise, closest to a pure absorbing process
- **ε = 0.1**: Moderate hybrid behavior
- **ε = 0.5**: Balanced absorbing-uniform blend
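
To try a different variant, point `smart_model_loader` at the corresponding repository. The repository id below is an assumption inferred from this card's `hdlm-base-epsilon-<value>` naming pattern; check the hdlm-group organization on the Hub for the exact identifiers:

```python
# Hypothetical repo id for the ε = 0.1 variant, following the naming pattern above
model, cfg, device, accelerator, metaschedule = smart_model_loader(
    model_path="hdlm-group/hdlm-base-epsilon-0.1",
    model_type="auto",
    device="cuda"
)
```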
|
|
## Citation
|
|
```bibtex
@article{fathi2025unifying,
  title={Unifying autoregressive and diffusion-based sequence generation},
  author={Fathi, Nima and Scholak, Torsten and No{\"e}l, Pierre-Andr{\'e}},
  journal={arXiv preprint arXiv:2504.06416},
  year={2025}
}
```
|
|
## License
|
|
This model is released under the Apache 2.0 license, the same license as the original HDLM codebase. Please refer to the [GitHub repository](https://github.com/ServiceNow/hdlm) for details.
|
|