---
language:
- en
tags:
- dllm
- diffusion-language-model
- text-generation
- diffusion
- language-model
license: apache-2.0
---

# HDLM-Gamma: Hybrid Diffusion Language Model

[arXiv:2504.06416](https://arxiv.org/abs/2504.06416)
[GitHub: ServiceNow/hdlm](https://github.com/ServiceNow/hdlm)

This is the model card for **hdlm-group/hdlm-base-gamma-0.01**.

## Model Description

HDLM-Gamma is a hybrid diffusion language model that unifies autoregressive and diffusion-based sequence generation through gamma-hybrid noising. The model interpolates its transition operator between an absorbing and a uniform process, making it conceptually closer to SEDD (Lou et al., 2024) while retaining the benefits of both paradigms.

The gamma parameter (γ) controls the blend between the absorbing and uniform transition matrices, Q_γ = (1 − γ) · Q_absorb + γ · Q_uniform: smaller values emphasize the absorbing process, while larger values incorporate more uniform transitions.
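
To make the blend concrete, here is a toy construction of a gamma-hybrid transition-rate matrix in PyTorch. This is a minimal sketch: the function name, the toy vocabulary size, and the normalization details are illustrative assumptions, not the actual `QGamma` implementation from the hdlm codebase.

```python
import torch

def build_q_gamma(vocab_size: int, gamma: float) -> torch.Tensor:
    # Illustrative only: index `vocab_size` plays the role of the extra
    # absorbing (mask) token appended to the regular vocabulary.
    n = vocab_size + 1
    # Absorbing component: every regular token transitions to the mask token.
    Q_absorb = torch.zeros(n, n)
    Q_absorb[:vocab_size, vocab_size] = 1.0
    # Uniform component: every token transitions uniformly to any other token.
    Q_uniform = torch.full((n, n), 1.0 / (n - 1))
    Q_uniform.fill_diagonal_(0.0)
    # Hybrid blend; the diagonal is then set so each row sums to zero,
    # as required of a continuous-time transition-rate matrix.
    Q = (1.0 - gamma) * Q_absorb + gamma * Q_uniform
    Q -= torch.diag(Q.sum(dim=1))
    return Q

Q = build_q_gamma(vocab_size=8, gamma=0.01)
assert torch.allclose(Q.sum(dim=1), torch.zeros(9), atol=1e-6)
```

At γ = 0.01, almost all probability mass flows toward the mask token, which is why this checkpoint behaves closest to a pure absorbing model.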

## Model Architecture

- **Base Model**: Transformer architecture with staggered score conditioning
- **Vocabulary Size**: 50,258 tokens (GPT-2 vocabulary + absorbing token)
- **Context Length**: Variable (supports up to 2048 tokens)
- **Training**: Continuous-time diffusion with gamma-hybrid graph structure
- **Inference**: Analytic predictor with staggered score computation
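
As a quick sanity check on the vocabulary size: GPT-2's tokenizer has 50,257 tokens, and the absorbing token brings the total to 50,258. The snippet below assumes the mask token takes the next free id, which is a common convention rather than something this card specifies.

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
print(len(tokenizer))            # 50257 regular GPT-2 tokens
mask_token_id = len(tokenizer)   # assumed id of the absorbing (mask) token
print(mask_token_id + 1)         # 50258, matching the model's vocabulary size
```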

## Usage

### Quick Start

```python
from hdlm.hf_utils import smart_model_loader
from hdlm.gamma_hybrid.sampling import get_sa_sampling_fn
from transformers import GPT2TokenizerFast
import torch

# Load the model with the smart loader (automatically detects the model type)
model, cfg, device, accelerator, metaschedule = smart_model_loader(
    model_path="hdlm-group/hdlm-base-gamma-0.01",
    model_type="auto",  # automatically detects gamma_hybrid
    device="cuda",
)

# Load the tokenizer
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# Encode a prompt
prompt = "The future of artificial intelligence"
prompt_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

# Configure the sampling function (set up from the loaded config)
sampling_fn = get_sa_sampling_fn(
    config=cfg,
    graph=None,  # will be created from the config
    noise=None,  # will be created from the config
    meta_schedule=metaschedule,
    batch_dims=(1,),
    eps=1e-4,
    device=device,
)

# Generate samples
generated = sampling_fn(
    model=model,
    prompt=prompt_ids,
    context_length=1024,
)

# Decode the generated text
generated_text = tokenizer.decode(generated[0], skip_special_tokens=True)
print(generated_text)
```

### Evaluation

```bash
# Text generation evaluation
python hdlm/eval_generation.py \
    --checkpoint_path hdlm-group/hdlm-base-gamma-0.01 \
    --sampling_method SAR \
    --save_samples

# Perplexity evaluation
python hdlm/eval_modeling.py \
    --checkpoint_path hdlm-group/hdlm-base-gamma-0.01 \
    --work_dir "./logs/eval_modeling_gamma" \
    --dataset ptb
```

## Training Details

- **Dataset**: OpenWebText
- **Batch Size**: 256
- **Learning Rate**: 3e-4 with lambda scheduling
- **Gamma (γ)**: 0.01 (controls the hybrid transition blend)
- **Graph Type**: QGamma with expanded sigma conditioning
- **Noise Schedule**: Log-linear (σ_min = 1e-4, σ_max = 10.0); see the sketch after this list
- **Training Steps**: 1M iterations
- **Warmup**: 50K steps
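
For intuition, a log-linear schedule interpolates linearly in log-space (i.e., geometrically) between σ_min and σ_max. The sketch below is written under that assumption; the hdlm codebase may parameterize its schedule differently.

```python
import torch

def log_linear_sigma(t: torch.Tensor,
                     sigma_min: float = 1e-4,
                     sigma_max: float = 10.0) -> torch.Tensor:
    # Linear in log-space: log σ(t) = (1 - t) · log σ_min + t · log σ_max
    return sigma_min ** (1.0 - t) * sigma_max ** t

t = torch.linspace(0.0, 1.0, steps=5)
print(log_linear_sigma(t))  # ramps from 1e-4 at t=0 to 10.0 at t=1
```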

## Key Components

### Graph Structure
The QGamma graph combines absorbing and uniform transition matrices:
- **Absorbing component**: transitions to the absorbing state (mask token)
- **Uniform component**: uniform transitions between all tokens
- **Hybrid blend**: controlled by the gamma parameter

### Staggered Score
The model uses a staggered score computation that applies different transformations to the absorbing and uniform branches before combining them, enabling more flexible generation patterns.

### Sampling Strategy
- **Predictor**: Analytic predictor with exact transition computation
- **Strategy**: Direct sampling with a configurable strategy parameter
- **Noise Removal**: Optional final denoising step

## Model Variants

Available gamma values and their characteristics (see the loading sketch after this list):

- **γ = 0.01**: minimal uniform transitions, closest to a pure absorbing process
- **γ = 0.1**: moderate hybrid behavior with increased uniform mixing
- **γ = 0.5**: balanced absorbing-uniform transition blend
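
If the sibling checkpoints follow the same naming pattern as this one, they can be loaded the same way. Only `hdlm-group/hdlm-base-gamma-0.01` is confirmed by this card; the other repository ids below are assumptions.

```python
from hdlm.hf_utils import smart_model_loader

# Assumed repo ids for the other gamma values, mirroring this card's naming.
for gamma in ("0.01", "0.1", "0.5"):
    model, cfg, device, accelerator, metaschedule = smart_model_loader(
        model_path=f"hdlm-group/hdlm-base-gamma-{gamma}",
        model_type="auto",
        device="cuda",
    )
```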
|
|
| ## Citation |
|
|
| ```bibtex |
| @article{fathi2025unifying, |
| title={Unifying autoregressive and diffusion-based sequence generation}, |
| author={Fathi, Nima and Scholak, Torsten and No{\"e}l, Pierre-Andr{\'e}}, |
| journal={arXiv preprint arXiv:2504.06416}, |
| year={2025} |
| } |
| ``` |

## License

This model is released under the Apache 2.0 license, the same license as the original HDLM codebase. Please refer to the [GitHub repository](https://github.com/ServiceNow/hdlm) for details.