Tags: Text Generation · Transformers · Safetensors · llada2_moe · conversational · custom_code
nielsr (HF Staff) committed · Commit f5b5fd6 · verified · 1 Parent(s): 8489383

Improve model card metadata and content


This PR improves the model card by:
- Adding the `pipeline_tag: text-generation` for better discoverability.
- Adding `library_name: transformers` metadata as the model is compatible with the library via `trust_remote_code=True`.
- Linking the model to its associated paper [DMax: Aggressive Parallel Decoding for dLLMs](https://huggingface.co/papers/2604.08302).
- Updating the README with highlights, a "Quick Start" usage example, and the BibTeX citation from the official repository.
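
Taken together, the changes above give the updated README this metadata header (reproduced from the diff for reference):

```yaml
---
base_model:
- inclusionAI/LLaDA2.0-mini
datasets:
- Zigeng/DMax-LLaDA-2.0-Mini-Code-Trajectories
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
```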

Files changed (1):
  1. README.md +29 -14
README.md CHANGED
@@ -1,9 +1,11 @@
 ---
-license: apache-2.0
-datasets:
-- Zigeng/DMax-LLaDA-2.0-Mini-Code-Trajectories
 base_model:
 - inclusionAI/LLaDA2.0-mini
+datasets:
+- Zigeng/DMax-LLaDA-2.0-Mini-Code-Trajectories
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
 
 <div align="center">
@@ -12,7 +14,7 @@ base_model:
   <a href="https://github.com/czg1225/DMax/blob/main/LICENSE">
     <img alt="Apache" src="https://img.shields.io/badge/License-Apache-4E94CE.svg">
   </a>
-  <a href="https://arxiv.org/pdf/2604.08302">
+  <a href="https://huggingface.co/papers/2604.08302">
     <img src="https://img.shields.io/badge/Paper-Arxiv-darkred.svg" alt="Paper">
   </a>
   <a href="https://github.com/czg1225/DMax">
@@ -21,10 +23,7 @@ base_model:
 </div>
 </div>
 
-> **DMax: Aggressive Parallel Decoding for dLLMs**
-> [Zigeng Chen](https://czg1225.github.io/chenzigeng99/), [Gongfan Fang](https://fangggf.github.io/), [Xinyin Ma](https://horseee.github.io/), [Ruonan Yu](https://scholar.google.com/citations?user=UHP95egAAAAJ&hl=en), [Xinchao Wang](https://sites.google.com/site/sitexinchaowang/)
-> [xML Lab](https://sites.google.com/view/xml-nus), National University of Singapore
-
+DMax is a new paradigm for efficient diffusion language models (dLLMs) that enables aggressive decoding parallelism while preserving generation quality. This repository contains the **DMax-Coder-16B** model, specialized for highly parallel code generation.
 
 ## 💪 Highlights
 
@@ -33,7 +32,7 @@ base_model:
 - **Soft Parallel Decoding**: Uses interpolation between mask and token embeddings to propagate confidence priors from previous steps.
 
 <div align="center">
-  <img src="assets/tradeoff.png" width="100%" />
+  <img src="https://github.com/czg1225/DMax/raw/main/assets/tradeoff.png" width="100%" />
   <br>
   <em>Superior Parallelism-Accuracy Trade-off, Increased TPF with Maintained Accuracy.</em>
 </div>
@@ -66,7 +65,14 @@ model = model.to(torch.bfloat16)
 model.eval()
 tokenizer = AutoTokenizer.from_pretrained("Zigeng/DMax-Coder-16B", trust_remote_code=True)
 
-prompt = "Write a python function to find the first repeated character in a given string." + "\n\nPlease enclose your code within delimiters as follows:\n```python\n# YOUR CODE HERE\n```\n\n"
+prompt = (
+    "Write a python function to find the first repeated character"
+    " in a given string."
+    "\n\nPlease enclose your code within delimiters as follows:\n"
+    "```python\n"
+    "# YOUR CODE HERE\n"
+    "```\n\n"
+)
 
 input_ids = tokenizer.apply_chat_template(
     [{"role": "user", "content": prompt}],
@@ -91,7 +97,16 @@ print(generated_answer)
 print("nfe:",nfe,"token length",len(generated_tokens[0]))
 ```
 
-## 📖 Experimental Results
-
-![trade-off](assets/exp.png)
-
+## 📖 Citation
+
+```bibtex
+@misc{chen2026dmaxaggressiveparalleldecoding,
+      title={DMax: Aggressive Parallel Decoding for dLLMs},
+      author={Zigeng Chen and Gongfan Fang and Xinyin Ma and Ruonan Yu and Xinchao Wang},
+      year={2026},
+      eprint={2604.08302},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG},
+      url={https://arxiv.org/abs/2604.08302},
+}
+```
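
The "Soft Parallel Decoding" highlight in the diff above describes interpolating between the mask embedding and predicted token embeddings, weighted by confidence from the previous step. The sketch below is only an illustration of that interpolation idea, not the repository's implementation; all names (`mask_embedding`, `token_embeddings`, `confidences`, the dimensions) are hypothetical.

```python
# Illustrative sketch (NOT the DMax implementation) of soft parallel decoding:
# rather than resetting low-confidence positions to a hard [MASK] embedding,
# blend mask and predicted-token embeddings by per-position confidence, so the
# next denoising step receives the previous step's confidence prior as input.
import numpy as np

rng = np.random.default_rng(0)
embed_dim = 8                                        # hypothetical embedding size
mask_embedding = rng.normal(size=embed_dim)          # embedding of the [MASK] token
token_embeddings = rng.normal(size=(3, embed_dim))   # predicted embeddings, 3 positions
confidences = np.array([0.9, 0.4, 0.1])              # per-position confidence priors

# High confidence -> input is mostly the predicted token embedding;
# low confidence -> input stays close to the mask embedding.
soft_inputs = (confidences[:, None] * token_embeddings
               + (1.0 - confidences[:, None]) * mask_embedding)

print(soft_inputs.shape)  # one soft embedding per position: (3, 8)
```

At confidence 1.0 this reduces to committing the token outright, and at 0.0 to keeping the position fully masked, so hard remasking is a special case of the blend.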