---
language: en
license: mit
tags:
- qec
- surface-code
- quantum
- pytorch
- quantum-error-correction
- neural-decoder
- bivariate-bicycle
- ldpc
- transfer-learning
- foundation-decoder
pipeline_tag: other
---

# NTU Neural Decoder Checkpoints

Pre-trained model weights for the neural decoders introduced in **Neural
Transfer Unification (NTU)**, an architecture-agnostic transfer-learning
framework for scalable quantum error correction.

**Paper**: *Efficient Foundation Decoders for Fault-Tolerant Quantum Computing*

**Project page**: [https://grahamyan.github.io/ntu-decoder/](https://grahamyan.github.io/ntu-decoder/)

**Source code**: [https://github.com/GrahamYan/ntu-decoder](https://github.com/GrahamYan/ntu-decoder)

---

## Overview

NTU exploits the algebraic scale invariance of structured QEC code families to
transfer error knowledge from small codes to large-scale fault-tolerant regimes,
eliminating the cold-start optimization barrier. The framework is instantiated
with two backbone architectures:

| Backbone | Description | Code families |
|---|---|---|
| **NTU-Transformer** | Interleaved RNN-Transformer with 2D RoPE and cross-attention readout | Surface, BB |
| **NTU-Neural-BP** | Graph-neural belief propagation on the code Tanner graph | BB |

For planar surface codes under circuit-level depolarizing noise, NTU-Transformer
surpasses standard PyMatching at *d* = 25 within a ~10³ GPU-hour training budget.
For the [[72, 12, 6]] bivariate-bicycle (BB) code, it outperforms BP+OSD across
all tested physical error rates and is competitive with multi-stage Relay BP.
Transfer from [[72, 12, 6]] to [[144, 12, 12]] reaches 93.1% block accuracy
within 2,500 steps (NTU-Transformer) and 95.3% within 500 steps (NTU-Neural-BP).

---

## Repository Structure

```
ntu-surface-code-decoder/
├── README.md
├── surface/                   ← Surface code (NTU-Transformer)
│   ├── d7.pth                 (121 MB, trained from scratch)
│   ├── d11.pth                (121 MB, transferred from d=7)
│   ├── d15.pth                (121 MB, transferred from d=11)
│   ├── d19.pth                (121 MB, transferred from d=15)
│   ├── d23.pth                (121 MB, transferred from d=19)
│   └── d25.pth                (122 MB, transferred from d=23)
└── bb/                        ← BB code checkpoints
    ├── bb72_transformer.pt    (138 MB, NTU-Transformer, [[72,12,6]])
    └── neural_bp_bb72.pt      (1.2 MB, NTU-Neural-BP, [[72,12,6]])
```

Each surface-code checkpoint contains `model_state` (an OrderedDict of weights),
`d` (code distance), `rounds` (syndrome extraction rounds), and `step`
(training step). BB Transformer checkpoints additionally include `block_acc`
and `output_convention` metadata. NTU-Neural-BP checkpoints store the raw
`state_dict` directly.

---

## Usage

### NTU-Transformer: Surface code

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="Dreamworldsmile/ntu-surface-code-decoder",
    filename="surface/d7.pth",
)

# weights_only=False unpickles arbitrary Python objects; use it only with
# checkpoints you trust.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# `model` is an NTU-Transformer instance built with the official source code
# (it is not constructed in this snippet). The key rewrite strips the
# "_orig_mod." (torch.compile) and "module." (DataParallel/DDP) prefixes.
model.load_state_dict(
    {k.replace("_orig_mod.", "").replace("module.", ""): v
     for k, v in ckpt["model_state"].items()},
    strict=False,
)
```
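Before loading weights, it can help to confirm which layout a downloaded file uses, since the Transformer checkpoints wrap their weights in a dictionary while the NTU-Neural-BP file is a bare `state_dict`. The sketch below is a hypothetical helper (`summarize_checkpoint` is not part of the official code) and runs on a toy dictionary whose key names and metadata values are made up for illustration; on a real file, `ckpt` would be the object returned by `torch.load`.

```python
from collections.abc import Mapping


def summarize_checkpoint(ckpt: Mapping) -> dict:
    """Describe a loaded checkpoint without touching any model.

    Hypothetical helper: it relies only on the layout described in this
    README ("model_state" plus metadata, or a bare state_dict).
    """
    if "model_state" in ckpt:
        weights = ckpt["model_state"]
        metadata = {k: v for k, v in ckpt.items() if k != "model_state"}
        layout = "wrapped"
    else:
        weights, metadata, layout = ckpt, {}, "raw_state_dict"
    return {
        "layout": layout,
        "metadata": metadata,
        "num_tensors": len(weights),
        "first_keys": list(weights)[:3],
    }


# Toy stand-in for a surface-code checkpoint. The parameter names and the
# values of `rounds` and `step` are placeholders, not the real contents.
toy_surface = {
    "model_state": {"_orig_mod.stem.weight": None, "_orig_mod.stem.bias": None},
    "d": 7,
    "rounds": 7,
    "step": 1000,
}

summary = summarize_checkpoint(toy_surface)
print(summary["layout"], summary["metadata"], summary["num_tensors"])
```

Printing `first_keys` on a real checkpoint also shows whether the `_orig_mod.` or `module.` prefixes are present, which tells you whether the key rewrite in the loading snippets is needed.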

### NTU-Transformer: BB code

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="Dreamworldsmile/ntu-surface-code-decoder",
    filename="bb/bb72_transformer.pt",
)

# PyTorch 2.6 and later default to weights_only=True. If loading fails on the
# non-tensor metadata and you trust the file, pass weights_only=False.
ckpt = torch.load(ckpt_path, map_location="cpu")
state_dict = {k.replace("_orig_mod.", "").replace("module.", ""): v
              for k, v in ckpt["model_state"].items()}
# Keep only keys that exist in the target model with a matching shape, and
# skip logical_readout_bias. `model` is built with the official source code.
model_sd = model.state_dict()
filtered = {k: v for k, v in state_dict.items()
            if k in model_sd and model_sd[k].shape == v.shape
            and k != "logical_readout_bias"}
model.load_state_dict(filtered, strict=False)
```
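Because the snippet above filters keys and then loads with `strict=False`, any parameter that is dropped stays at its initial value without a warning. The following sketch, which assumes nothing beyond objects exposing a `.shape` attribute, makes that outcome explicit. `plan_partial_load` and the toy parameter names are hypothetical and only illustrate the same filtering rule.

```python
from types import SimpleNamespace


def plan_partial_load(ckpt_state, model_state, skip=()):
    """Sort checkpoint keys by what the shape-and-name filter does to them."""
    plan = {"load": [], "skipped": [], "not_in_model": [], "shape_mismatch": []}
    for key, tensor in ckpt_state.items():
        if key in skip:
            plan["skipped"].append(key)
        elif key not in model_state:
            plan["not_in_model"].append(key)
        elif tuple(model_state[key].shape) != tuple(tensor.shape):
            plan["shape_mismatch"].append(key)
        else:
            plan["load"].append(key)
    # Model parameters that receive no checkpoint value keep their init.
    plan["left_at_init"] = [k for k in model_state if k not in plan["load"]]
    return plan


def fake(*shape):
    """Stand-in for a tensor: only the .shape attribute is used here."""
    return SimpleNamespace(shape=shape)


# Made-up parameter names and shapes, chosen to hit every branch.
ckpt_state = {
    "readout.weight": fake(12, 256),
    "stem.weight": fake(256, 4),
    "logical_readout_bias": fake(12),
    "old_head.weight": fake(3, 3),
}
model_state = {
    "readout.weight": fake(24, 256),
    "stem.weight": fake(256, 4),
    "logical_readout_bias": fake(12),
    "new.bias": fake(5),
}

plan = plan_partial_load(ckpt_state, model_state, skip=("logical_readout_bias",))
for name, keys in plan.items():
    print(f"{name}: {keys}")
```

With real tensors, `ckpt_state` would be the prefix-stripped `state_dict` and `model_state` would be `model.state_dict()`. The value returned by `load_state_dict` (its `missing_keys` and `unexpected_keys` fields) offers a similar check after the fact.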

### NTU-Neural-BP: BB code

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="Dreamworldsmile/ntu-surface-code-decoder",
    filename="bb/neural_bp_bb72.pt",
)

# The file is a raw state_dict, so weights_only=True is sufficient.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=True)
state_dict = {k.replace("module.", ""): v for k, v in ckpt.items()}
# `model` is an NTU-Neural-BP instance built with the official source code.
model.load_state_dict(state_dict, strict=True)
```

### Inference with the official code

```bash
git clone https://github.com/GrahamYan/ntu-decoder.git
cd ntu-decoder

# Surface code.
bash inference.sh --code surface --d 7 \
    --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000

# BB code, NTU-Transformer.
bash inference.sh --code bb --model transformer --block_size 72 \
    --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000 --p 0.005

# BB code, NTU-Neural-BP.
bash inference.sh --code bb --model neural_bp --block_size 72 \
    --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000 --p 0.005
```

---

## Model Architecture

### NTU-Transformer

The Transformer-based decoder combines standard multi-head self-attention
blocks with the following components:

- **Scalable STEM embedding**: Encodes syndrome data from variable-size
  lattices into a shared latent representation, absorbing the distance
  dependence into the input encoding (Eq. 2 in the paper).
- **QEC-aware 2D RoPE**: Applies rotary position embeddings to relative
  algebraic displacements defined by the code's shift set *M*(*x*, *y*, *t*; *d*),
  preserving detector geometry across code distances.
- **Interleaved RNN-Transformer backbone**: 5 GRU-based recurrent blocks
  alternating with 6 spatial self-attention blocks.
- **Cross-attention logical readout**: Learnable logical query tokens attend
  over the encoded detector representations.
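The property that lets rotary embeddings carry over between code distances is that attention scores depend only on the displacement between two positions, not on their absolute coordinates. The sketch below demonstrates this with a generic 2D RoPE in plain Python. It is not the authors' implementation: the function names, the frequency base of 10000, the even split of features between the two axes, and the omission of the time coordinate and the shift set *M* are all assumptions made for illustration.

```python
import math


def rotate_pairs(vec, pos, base=10000.0):
    """1D rotary embedding: rotate feature pair (2i, 2i+1) by pos * theta_i."""
    assert len(vec) % 2 == 0
    n_pairs = len(vec) // 2
    out = []
    for i in range(n_pairs):
        theta = base ** (-i / n_pairs)
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        a, b = vec[2 * i], vec[2 * i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out


def rope_2d(vec, x, y):
    """Generic 2D RoPE: first half of the features encodes x, second half y."""
    assert len(vec) % 4 == 0
    half = len(vec) // 2
    return rotate_pairs(vec[:half], x) + rotate_pairs(vec[half:], y)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Arbitrary query and key vectors (illustrative values only).
q = [0.3, -1.2, 0.7, 0.5, -0.4, 0.9, 1.1, -0.6]
k = [1.0, 0.2, -0.8, 0.4, 0.6, -0.3, 0.5, 0.7]

# Two detector pairs at different absolute positions but with the same
# displacement (dx, dy) = (2, 4) produce the same attention score.
score_a = dot(rope_2d(q, 2, 5), rope_2d(k, 0, 1))
score_b = dot(rope_2d(q, 9, 7), rope_2d(k, 7, 3))
print(abs(score_a - score_b) < 1e-9)
```

Since the score is a function of the displacement alone, a pattern learned between neighbouring detectors on a small lattice applies unchanged to the same relative offset on a larger one.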

### NTU-Neural-BP

A graph-neural-network decoder operating on the bipartite Tanner graph:

- **Message passing** between variable and check nodes with gated recurrent
  units (GRUs) for message updates.
- **Syndrome-aware encoding** of check node states and prior LLRs.
- **Focal loss** with syndrome consistency regularization.
- Compact model (~300K parameters for the [[72, 12, 6]] code).
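The two training terms named above can be illustrated on a toy example. The sketch below uses the standard binary focal loss, -(1 - p_t)^γ log(p_t), and a hard count of violated parity checks as a stand-in for syndrome consistency. The exact loss weighting and the differentiable form of the regularizer used in training are not stated in this card, so the function names, γ = 2, and the 2×3 parity-check matrix (a repetition code, not a BB code) are assumptions.

```python
import math


def binary_focal_loss(p, target, gamma=2.0):
    """Focal loss for one bit: -(1 - p_t)^gamma * log(p_t).

    p is the predicted probability that the bit is 1; p_t is the
    probability assigned to the true value.
    """
    p_t = p if target == 1 else 1.0 - p
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def syndrome(H, e):
    """Syndrome s = H e (mod 2) for a binary parity-check matrix H."""
    return [sum(h * x for h, x in zip(row, e)) % 2 for row in H]


def syndrome_mismatch(H, e_hat, s):
    """Number of checks whose parity under e_hat disagrees with s."""
    return sum(a != b for a, b in zip(syndrome(H, e_hat), s))


# Toy parity checks of a 3-bit repetition code (assumption, for illustration).
H = [[1, 1, 0],
     [0, 1, 1]]
true_error = [0, 1, 0]
s = syndrome(H, true_error)

# Confident correct predictions are down-weighted relative to uncertain ones.
print(binary_focal_loss(0.9, 1), binary_focal_loss(0.6, 1))
# A guess that reproduces the syndrome violates no checks; a wrong one does.
print(syndrome_mismatch(H, [0, 1, 0], s), syndrome_mismatch(H, [1, 0, 0], s))
```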

---

## Authors

[Ge Yan](https://grahamyan.github.io)<sup>1</sup>,
Shanchuan Li<sup>1, 2</sup>,
Shiyi Xiao<sup>1, 3</sup>,
Pengyue Ma<sup>1</sup>,
Hanyan Cao<sup>4</sup>,
[Feng Pan](https://scholar.google.com/citations?user=Vp6hFhUAAAAJ)<sup>4,\*</sup>,
[Yuxuan Du](https://yuxuan-du.github.io)<sup>1,\*</sup>

<sup>1</sup> College of Computing and Data Science, Nanyang Technological University, Singapore<br>
<sup>2</sup> Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, Japan<br>
<sup>3</sup> School of Artificial Intelligence, Shanghai Jiao Tong University, China<br>
<sup>4</sup> Science, Mathematics and Technology Cluster, Singapore University of Technology and Design, Singapore

<small><sup>\*</sup> Corresponding authors</small>

---

## Citation

```bibtex
@article{ntu2026,
  title={Efficient Foundation Decoders for Fault-Tolerant Quantum Computing},
  author={Yan, Ge and Li, Shanchuan and Xiao, Shiyi and Ma, Pengyue and
          Cao, Hanyan and Pan, Feng and Du, Yuxuan},
  year={2026},
}
```

---

## License

This repository is released under the [MIT License](https://opensource.org/licenses/MIT).