metadata
license: apache-2.0
base_model: Qwen/Qwen3-8B
pipeline_tag: text-generation

Qwen3-8B-A2D-untrained-dllm-convert

This repository contains the untrained initialization of Qwen3-8B converted to the A2D architecture (bidirectional attention), as introduced in the paper "Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation".

Model Details

Qwen3-8B converted to the A2D architecture (bidirectional attention) using the dllm convert pipeline.

  • Architecture: A2D-Qwen3 (non-causal attention, same weights as original)
  • Parameters: 8.19B
  • Vocab size: 151936
  • Model type: a2d-qwen3

This model has the original Qwen3-8B weights with bidirectional (non-causal) attention. No diffusion pretraining or SFT has been applied.
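To make the difference between causal and bidirectional (non-causal) attention concrete, here is a toy sketch in plain Python. It is not the A2D-Qwen3 implementation; the function names `causal_mask`, `bidirectional_mask`, and `render` are hypothetical and exist only for illustration. Each mask entry says whether query position `i` may attend to key position `j`.

```python
def causal_mask(n):
    """Autoregressive mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Non-causal mask: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def render(mask):
    """Draw a mask as text: 'x' = attention allowed, '.' = blocked."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


if __name__ == "__main__":
    print("causal:")
    print(render(causal_mask(4)))
    print("bidirectional:")
    print(render(bidirectional_mask(4)))
```

The weights are untouched in both cases; only the set of positions each token can see changes, which is why the converted model can reuse the original Qwen3-8B parameters.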

Mask token registration: The mask token <|MASK|> (ID 151669) is registered in the tokenizer for use with diffusion-based language modeling. The original Qwen3 tokenizer includes <|MASK|> in special_tokens_map.json but does not register it in tokenizer_config.json, so tokenizer.mask_token_id returns None. We fixed this by adding <|MASK|> to the added_tokens_decoder section of tokenizer_config.json, setting the mask_token field in the same file, and adding the full mask_token entry to special_tokens_map.json. With this fix, tokenizer.mask_token_id returns 151669.
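The registration described above can be sketched as a small edit to the two JSON files, shown here on in-memory dictionaries. This is a minimal illustration, not the script used to build this repository: the helper name `register_mask_token` is hypothetical, and the attribute values (`lstrip`, `normalized`, `rstrip`, `single_word`, `special`) are assumed defaults in the usual Hugging Face tokenizer file layout rather than values taken from this repository's files.

```python
import copy

MASK_TOKEN = "<|MASK|>"
MASK_TOKEN_ID = 151669  # ID stated in this model card


def register_mask_token(tokenizer_config, special_tokens_map,
                        token=MASK_TOKEN, token_id=MASK_TOKEN_ID):
    """Return patched copies of tokenizer_config.json and
    special_tokens_map.json contents (as dicts) with the mask token registered.
    The inputs are not modified."""
    config = copy.deepcopy(tokenizer_config)
    tokens_map = copy.deepcopy(special_tokens_map)

    # Assumed attribute values; check them against the actual tokenizer files.
    entry = {
        "content": token,
        "lstrip": False,
        "normalized": False,
        "rstrip": False,
        "single_word": False,
    }

    # 1. Add the token to added_tokens_decoder, keyed by its ID as a string.
    config.setdefault("added_tokens_decoder", {})[str(token_id)] = {
        **entry,
        "special": True,
    }
    # 2. Set the mask_token field in tokenizer_config.json.
    config["mask_token"] = token
    # 3. Add the full mask_token entry to special_tokens_map.json.
    tokens_map["mask_token"] = dict(entry)
    return config, tokens_map


if __name__ == "__main__":
    cfg, tmap = register_mask_token({"added_tokens_decoder": {}}, {})
    print(cfg["mask_token"], sorted(cfg["added_tokens_decoder"]))
```

In practice the dictionaries would be read from and written back to the two files with `json.load` and `json.dump`; after reloading the tokenizer, `tokenizer.mask_token_id` is the value to check.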