Dusk-8B-INT4

by Qubitron Labs

Dusk-8B-INT4 is a 4-bit quantized version of Dusk-8B β€” a Masked Diffusion Language Model (MDM) capable of high-quality, coherent text generation.

Unlike autoregressive LLMs (GPT, Llama, etc.) that generate text left-to-right one token at a time, Dusk generates all tokens in parallel through iterative denoising. This enables global planning, bidirectional reasoning, and holistic output generation.
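As an illustration only (not Dusk's actual decoder), the denoising loop can be sketched as: start from an all-`[MASK]` sequence and repeatedly commit the model's most confident predictions until nothing is masked. The names `toy_mdm_decode` and `logits_fn` are hypothetical, invented for this sketch:

```python
import torch

def toy_mdm_decode(logits_fn, length, mask_id, steps):
    """Toy masked-diffusion decoding: start fully masked, then at each
    step commit the highest-confidence predictions for masked slots."""
    seq = torch.full((length,), mask_id, dtype=torch.long)
    per_step = max(1, length // steps)  # tokens to unmask per step
    for _ in range(steps):
        masked = (seq == mask_id).nonzero(as_tuple=True)[0]
        if masked.numel() == 0:
            break
        probs = torch.softmax(logits_fn(seq), dim=-1)  # (length, vocab)
        conf, pred = probs.max(dim=-1)
        conf[seq != mask_id] = -1.0  # never re-commit already-decided slots
        top = conf.topk(min(per_step, masked.numel())).indices
        seq[top] = pred[top]  # commit the most confident predictions
    return seq
```

Because every position is predicted at each step, later context can influence earlier tokens — the property the comparison table below calls "global planning".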

This INT4 model was quantized using optimum-quanto, making it device-agnostic β€” it runs on CPU, CUDA, and Apple Silicon MPS without any dequantization spikes.


Model Details

| Property | Value |
|---|---|
| Architecture | Masked Diffusion Language Model |
| Parameters | 8B |
| Quantization | INT4 (via optimum-quanto) |
| Base Model | GSAI-ML/LLaDA-8B-Instruct |
| Memory (INT4) | ~4–5 GB RAM |
| Memory (FP16) | ~16 GB RAM |
| License | Apache 2.0 |

Quick Start

Installation

```shell
pip install transformers accelerate optimum-quanto sentencepiece
```

Load & Generate

```python
import torch
from transformers import AutoTokenizer
from optimum.quanto import quantize, freeze, qint4

# --- Load model (you need the models/ package from our repo) ---
# git clone https://github.com/QubitronLabs/dusk && cd dusk
# pip install -r requirements.txt
import sys
sys.path.insert(0, ".")  # project root
from models import DuskModelLM

MODEL_ID = "QubitronLabs/dusk-8b-int4"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = DuskModelLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="cpu",   # change to "cuda" or "mps" if available
    trust_remote_code=True,
)

# Re-apply quantization at runtime
quantize(model, weights=qint4, exclude=["lm_head"])
freeze(model)
model.eval()
```

Chat Inference

```python
# Import the custom MDM generate function (from generate.py in the repo)
from generate import generate

MASK_ID = 126336  # [MASK] token id

prompt = "Explain the difference between supervised and unsupervised learning."
messages = [{"role": "user", "content": prompt}]
formatted = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
input_ids = tokenizer(formatted, return_tensors="pt")["input_ids"].to(model.device)

with torch.no_grad():
    out = generate(
        model,
        input_ids,
        steps=128,
        gen_length=128,
        block_length=128,
        temperature=0.0,
        cfg_scale=0.0,
        remasking="low_confidence",
        mask_id=MASK_ID,
    )

response = tokenizer.batch_decode(out[:, input_ids.shape[1]:], skip_special_tokens=True)[0]
print(response)
```
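The `remasking="low_confidence"` option decides which predictions survive between denoising steps: keep the most confident ones, re-mask the rest. A minimal sketch of the idea — the helper name and signature are assumptions, not the repo's actual `generate.py`:

```python
import torch

def low_confidence_remask(pred_ids, confidences, num_to_keep, mask_id):
    """Keep the `num_to_keep` highest-confidence predicted tokens and
    re-mask every other position for the next denoising step."""
    keep = confidences.topk(num_to_keep).indices
    out = torch.full_like(pred_ids, mask_id)
    out[keep] = pred_ids[keep]
    return out
```

For example, with confidences `[0.9, 0.1, 0.8, 0.2]` and `num_to_keep=2`, positions 0 and 2 are committed and positions 1 and 3 are re-masked for another pass.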

Why Masked Diffusion?

| Feature | Autoregressive (GPT, Llama) | Dusk (MDM) |
|---|---|---|
| Generation direction | Left → Right | All tokens simultaneously |
| Can revise earlier tokens | ❌ No | ✅ Yes |
| Global planning | Limited | Native |
| Bidirectional context | Partial (decoder-only) | Full |
| Speed (parallel hardware) | Sequential bottleneck | Highly parallelizable |

Why INT4?

Standard FP16 Dusk-8B requires 16 GB of RAM β€” too large for most consumer hardware.
INT4 quantization via optimum-quanto reduces this to **~4–5 GB** while preserving generation quality.
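As a sanity check, the headline numbers follow from per-weight storage alone; the extra ~0.5–1 GB seen in practice comes from activations, quantization scales, and any layers left unquantized (such as `lm_head`):

```python
params = 8e9                   # 8B parameters
fp16_gb = params * 2.0 / 1e9   # FP16: 2 bytes per weight
int4_gb = params * 0.5 / 1e9   # INT4: 4 bits = 0.5 bytes per weight
print(fp16_gb, int4_gb)        # 16.0 4.0
```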

Why optimum-quanto over bitsandbytes/GPTQ/AWQ?

  • quanto is device-agnostic β€” saved INT4 weights load on CPU, CUDA, and Apple MPS
  • bitsandbytes/GPTQ/AWQ are CUDA-only β€” won't run on Mac
  • No FP16 dequantization spikes during loading

Hardware Requirements

| Hardware | Supported | Notes |
|---|---|---|
| NVIDIA GPU (CUDA) | ✅ | Use `device_map="cuda"` |
| Apple Silicon (MPS) | ✅ | Use `device_map="mps"` |
| CPU only | ✅ | Use `device_map="cpu"` — slow but works |
| Min RAM (INT4) | 6 GB | 8 GB+ recommended |
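Assuming a standard PyTorch install, backend selection can be automated instead of hard-coding `device_map`; this helper is a sketch, not part of the repo:

```python
import torch

def pick_device():
    """Pick the best available backend: CUDA > Apple MPS > CPU."""
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()  # pass as device_map=device when loading
```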

Citation

If you use this model in your research, please cite:

@misc{dusk2026,
  title        = {Dusk: A Masked Diffusion Language Model},
  author       = {Qubitron Labs},
  year         = {2026},
  url          = {https://huggingface.co/QubitronLabs/dusk-8b-int4}
}

License

Apache License 2.0 β€” see LICENSE for details.
