---
language: en
license: mit
tags:
- qec
- surface-code
- quantum
- pytorch
- quantum-error-correction
- neural-decoder
- bivariate-bicycle
- ldpc
- transfer-learning
- foundation-decoder
pipeline_tag: other
---

# NTU Neural Decoder Checkpoints

Pre-trained model weights for the neural decoders introduced in **Neural Transfer Unification (NTU)**, an architecture-agnostic transfer-learning framework for scalable quantum error correction.

📄 **Paper**: *Efficient Foundation Decoders for Fault-Tolerant Quantum Computing*

🌐 **Project page**: [https://grahamyan.github.io/ntu-decoder/](https://grahamyan.github.io/ntu-decoder/)

💻 **Source code**: [https://github.com/GrahamYan/ntu-decoder](https://github.com/GrahamYan/ntu-decoder)

---

## Overview

NTU exploits the algebraic scale invariance of structured QEC code families to transfer error knowledge from small codes to large-scale fault-tolerant regimes, eliminating the cold-start optimization barrier. The framework is instantiated with two backbone architectures:

| Backbone | Description | Code families |
|---|---|---|
| **NTU-Transformer** | Interleaved RNN-Transformer with 2D RoPE and cross-attention readout | Surface, BB |
| **NTU-Neural-BP** | Graph-neural belief propagation on the code Tanner graph | BB |

For planar surface codes under circuit-level depolarizing noise, NTU-Transformer surpasses standard PyMatching at *d* = 25 within a ~10³ GPU-hour training budget. For the [[72, 12, 6]] bivariate-bicycle (BB) code, it outperforms BP+OSD across all tested physical error rates and is competitive with multi-stage Relay BP. Transfer from [[72, 12, 6]] to [[144, 12, 12]] reaches 93.1% block accuracy within 2,500 steps (NTU-Transformer) and 95.3% within 500 steps (NTU-Neural-BP).


---

## Repository Structure

```
ntu-surface-code-decoder/
├── README.md
├── surface/                   ← Surface code (NTU-Transformer)
│   ├── d7.pth                 (121 MB, trained from scratch)
│   ├── d11.pth                (121 MB, transferred from d=7)
│   ├── d15.pth                (121 MB, transferred from d=11)
│   ├── d19.pth                (121 MB, transferred from d=15)
│   ├── d23.pth                (121 MB, transferred from d=19)
│   └── d25.pth                (122 MB, transferred from d=23)
└── bb/                        ← BB code checkpoints
    ├── bb72_transformer.pt    (138 MB, NTU-Transformer, [[72,12,6]])
    └── neural_bp_bb72.pt      (1.2 MB, NTU-Neural-BP, [[72,12,6]])
```

Each surface code checkpoint contains `model_state` (OrderedDict of weights), `d` (code distance), `rounds` (syndrome extraction rounds), and `step` (training step). BB Transformer checkpoints additionally include `block_acc` and `output_convention` metadata. NTU-Neural-BP checkpoints store the raw `state_dict` directly.

---

## Usage

The loading snippets below assume that `model` has already been constructed with the official source code; they only download a checkpoint and copy its weights into that model.

### NTU-Transformer: Surface code

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="Dreamworldsmile/ntu-surface-code-decoder",
    filename="surface/d7.pth",
)
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# Strip the key prefixes left by torch.compile ("_orig_mod.") and
# DataParallel/DistributedDataParallel ("module.") before loading.
model.load_state_dict(
    {k.replace("_orig_mod.", "").replace("module.", ""): v
     for k, v in ckpt["model_state"].items()},
    strict=False,
)
```

### NTU-Transformer: BB code

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="Dreamworldsmile/ntu-surface-code-decoder",
    filename="bb/bb72_transformer.pt",
)
ckpt = torch.load(ckpt_path, map_location="cpu")
state_dict = {k.replace("_orig_mod.", "").replace("module.", ""): v
              for k, v in ckpt["model_state"].items()}

# Filter to keys present in the target model (skip logical_readout_bias).
model_sd = model.state_dict()
filtered = {k: v for k, v in state_dict.items()
            if k in model_sd
            and model_sd[k].shape == v.shape
            and k != "logical_readout_bias"}
model.load_state_dict(filtered, strict=False)
```

### NTU-Neural-BP: BB code

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="Dreamworldsmile/ntu-surface-code-decoder",
    filename="bb/neural_bp_bb72.pt",
)
# This checkpoint is a raw state_dict, so it can be loaded with weights_only=True.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=True)
state_dict = {k.replace("module.", ""): v for k, v in ckpt.items()}
model.load_state_dict(state_dict, strict=True)
```

### Inference with the official code

```bash
git clone https://github.com/GrahamYan/ntu-decoder.git
cd ntu-decoder

# Surface code.
bash inference.sh --code surface --d 7 \
    --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000

# BB code, NTU-Transformer.
bash inference.sh --code bb --model transformer --block_size 72 \
    --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000 --p 0.005

# BB code, NTU-Neural-BP.
bash inference.sh --code bb --model neural_bp --block_size 72 \
    --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000 --p 0.005
```

---

## Model Architecture

### NTU-Transformer

The Transformer-based decoder combines standard multi-head self-attention blocks with the following components:

- **Scalable STEM embedding**: Encodes syndrome data from variable-size lattices into a shared latent representation, absorbing the distance dependence into the input encoding (Eq. 2 in the paper).
- **QEC-aware 2D RoPE**: Applies rotary position embeddings to relative algebraic displacements defined by the code's shift set *M*(*x*, *y*, *t*; *d*), preserving detector geometry across code distances.
- **Interleaved RNN-Transformer backbone**: 5 GRU-based recurrent blocks alternating with 6 spatial self-attention blocks.
- **Cross-attention logical readout**: Learnable logical query tokens attend over the encoded detector representations.

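
To make the rotary-embedding idea above concrete, here is a minimal sketch of a generic 2D RoPE, written in plain Python so it runs without any dependencies. It is not the NTU implementation: the function names (`rotate_pairs`, `rope_2d`, `score`), the frequency base of 10000, and the choice to give half of the features to each axis are assumptions for illustration, and the code-specific shift set *M* is not modeled. It only shows the property that matters for transfer across distances: the attention score depends on the relative displacement between two positions, not on their absolute coordinates.

```python
import math


def rotate_pairs(vec, angles):
    """Rotate consecutive (even, odd) feature pairs of `vec` by the given angles."""
    out = []
    for i, theta in enumerate(angles):
        a, b = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([a * c - b * s, a * s + b * c])
    return out


def rope_2d(vec, x, y, base=10000.0):
    """Illustrative 2D RoPE: the first half of the features encodes x, the second half y."""
    half = len(vec) // 2      # features per axis (assumed even)
    n_pairs = half // 2       # rotation planes per axis
    freqs = [base ** (-2.0 * i / half) for i in range(n_pairs)]
    x_part = rotate_pairs(vec[:half], [x * f for f in freqs])
    y_part = rotate_pairs(vec[half:], [y * f for f in freqs])
    return x_part + y_part


def score(q, q_pos, k, k_pos):
    """Unnormalized attention score between a rotated query and a rotated key."""
    rq = rope_2d(q, *q_pos)
    rk = rope_2d(k, *k_pos)
    return sum(a * b for a, b in zip(rq, rk))


# Arbitrary 8-dimensional query and key vectors (hypothetical values).
q = [0.3, -1.2, 0.7, 0.5, -0.4, 0.9, 1.1, -0.6]
k = [1.0, 0.2, -0.8, 0.4, 0.6, -0.3, 0.5, 0.7]

# Both pairs of positions are separated by the same displacement (+2, +3).
s1 = score(q, (1, 2), k, (3, 5))
s2 = score(q, (4, 4), k, (6, 7))
shift_invariant = math.isclose(s1, s2, abs_tol=1e-9)
```

Because each feature pair is rotated by an angle proportional to its coordinate, the rotations of the query and key combine into a single rotation by the coordinate difference, which is why `s1` and `s2` agree up to floating-point error.
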
### NTU-Neural-BP

A graph-neural-network decoder operating on the bipartite Tanner graph:

- **Message passing** between variable and check nodes with gated recurrent units (GRU) for message updates.
- **Syndrome-aware encoding** of check node states and prior LLRs.
- **Focal loss** with syndrome consistency regularization.
- Compact model (~300K parameters for the [[72, 12, 6]] code).

---

## Authors

[Ge Yan](https://grahamyan.github.io)<sup>1</sup>, Shanchuan Li<sup>1,2</sup>, Shiyi Xiao<sup>1,3</sup>, Pengyue Ma<sup>1</sup>, Hanyan Cao<sup>4</sup>, [Feng Pan](https://scholar.google.com/citations?user=Vp6hFhUAAAAJ)<sup>4,\*</sup>, [Yuxuan Du](https://yuxuan-du.github.io)<sup>1,\*</sup>

<sup>1</sup> College of Computing and Data Science, Nanyang Technological University, Singapore<br>
<sup>2</sup> Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, Japan<br>
<sup>3</sup> School of Artificial Intelligence, Shanghai Jiao Tong University, China<br>
<sup>4</sup> Science, Mathematics and Technology Cluster, Singapore University of Technology and Design, Singapore

\* Corresponding authors

---

## Citation

```bibtex
@article{ntu2026,
  title={Efficient Foundation Decoders for Fault-Tolerant Quantum Computing},
  author={Yan, Ge and Li, Shanchuan and Xiao, Shiyi and Ma, Pengyue and Cao, Hanyan and Pan, Feng and Du, Yuxuan},
  year={2026},
}
```

---

## License

This repository is released under the [MIT License](https://opensource.org/licenses/MIT).