Dreamworldsmile/ntu-surface-code-decoder

@@ -10,35 +10,41 @@ tags:
 - neural-decoder
 - bivariate-bicycle
 - ldpc
 pipeline_tag: other
 ---
 # NTU Neural Decoder Checkpoints
-Pre-trained neural decoder model weights for quantum error correction (QEC)
-codes, based on the **Neural Transfer Unification (NTU)** framework introduced
-in the accompanying paper.
-📄 **Paper**: *Transfer Learning is All You Need for Scalable Neural Decoder*
 🌐 **Project page**: [https://grahamyan.github.io/ntu-decoder/](https://grahamyan.github.io/ntu-decoder/)
 ---
 ## Overview
-This repository hosts the official model checkpoints for two families of QEC
-codes:
-| Code family | Architecture | Decoder |
 |---|---|---|
-| Rotated surface code | AlphaQubit V2 (~58M parameters) | Transformer-based |
-| Bivariate-bicycle (BB) code | AlphaQubitV2_BB (~XXM parameters) | Transformer-based |
-| Bivariate-bicycle (BB) code | Neural Belief Propagation | GNN-based message passing |
-All models are implemented in PyTorch and trained with distributed data-parallel
-(DDP) across 8 GPUs. The surface code decoder uses progressive knowledge
-distillation from minimum-weight perfect matching (MWPM) pseudo-labels;
-the BB decoder is trained end-to-end on sampled syndromes.
 ---
@@ -47,59 +53,39 @@ the BB decoder is trained end-to-end on sampled syndromes.
 ```
 ntu-surface-code-decoder/
 ├── README.md
-├── surface/                      ← Surface code checkpoints (AlphaQubit V2)
-│   ├── d7.pth                    (121 MB, trained from scratch)
-│   ├── d11.pth                   (121 MB, transfer learning from d=7)
-│   ├── d15.pth                   (121 MB, transfer learning from d=11)
-│   ├── d19.pth                   (121 MB, transfer learning from d=15)
-│   ├── d23.pth                   (121 MB, transfer learning from d=19)
-│   └── d25.pth                   (122 MB, transfer learning from d=23)
-└── bb/                           ← BB code checkpoints
-    ├── bb72_transformer.pt       (138 MB, AlphaQubitV2_BB, [[72,12,6]] code)
-    └── neural_bp_bb72.pt         (1.2 MB, Neural-BP, [[72,12,6]] code)
 ```
-### Checkpoint format
-**Surface code checkpoints** (`surface/*.pth`):
-| Key | Type | Description |
-|---|---|---|
-| `model_state` | `OrderedDict` | Model weights (strip `_orig_mod.` and `module.` prefixes before loading) |
-| `d` | `int` | Code distance |
-| `rounds` | `int` | Syndrome extraction rounds |
-| `step` | `int` | Training step at which the checkpoint was saved |
-**BB Transformer checkpoints** (`bb/bb*_transformer.pt`):
-| Key | Type | Description |
-|---|---|---|
-| `model_state` | `OrderedDict` | Model weights |
-| `step` | `int` | Training step |
-| `block_acc` | `float` | Block accuracy at save time |
-| `per_log_mean` | `float` | Per-logical average accuracy |
-| `output_convention` | `dict` | Logical observable convention metadata |
-**Neural-BP checkpoints** (`bb/neural_bp_*.pt`):
-| Key | Type | Description |
-|---|---|---|
-| (raw `state_dict`) | `OrderedDict` | Model weights (strip `module.` prefix before loading) |
 ---
 ## Usage
-### Surface code — AlphaQubit V2
 ```python
 import torch
 from huggingface_hub import hf_hub_download
-# Download a surface code checkpoint.
 ckpt_path = hf_hub_download(
     repo_id="Dreamworldsmile/ntu-surface-code-decoder",
     filename="surface/d7.pth",
 )
-# Load into an AlphaQubit V2 model instance.
 ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
 model.load_state_dict(
     {k.replace("_orig_mod.", "").replace("module.", ""): v
@@ -108,22 +94,18 @@ model.load_state_dict(
 )
 ```
-### BB code — AlphaQubitV2_BB (Transformer)
 ```python
-import torch
-from huggingface_hub import hf_hub_download
 ckpt_path = hf_hub_download(
     repo_id="Dreamworldsmile/ntu-surface-code-decoder",
     filename="bb/bb72_transformer.pt",
 )
 ckpt = torch.load(ckpt_path, map_location="cpu")
-state_dict = ckpt["model_state"]
 state_dict = {k.replace("_orig_mod.", "").replace("module.", ""): v
-              for k, v in state_dict.items()}
-# Filter to keys present in the model (skip logical_readout_bias).
 model_sd = model.state_dict()
 filtered = {k: v for k, v in state_dict.items()
             if k in model_sd and model_sd[k].shape == v.shape
@@ -131,7 +113,7 @@ filtered = {k: v for k, v in state_dict.items()
 model.load_state_dict(filtered, strict=False)
 ```
-### BB code — Neural Belief Propagation
 ```python
 ckpt_path = hf_hub_download(
@@ -146,63 +128,69 @@ model.load_state_dict(state_dict, strict=True)
 ### Inference with the official code
-The [official implementation](https://github.com/GrahamYan/ntu-decoder) provides a
-unified inference launcher that automatically downloads the required checkpoint:
 ```bash
-# Surface code inference.
 bash inference.sh --code surface --d 7 \
     --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000
-# BB Transformer inference.
 bash inference.sh --code bb --model transformer --block_size 72 \
     --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000 --p 0.005
-# BB Neural-BP inference.
 bash inference.sh --code bb --model neural_bp --block_size 72 \
     --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000 --p 0.005
 ```
-For training and baseline evaluations, please refer to the shell scripts under
-`codes/Surface/` and `codes/BB/` in the source repository.
 ---
 ## Model Architecture
-### AlphaQubit V2 / AlphaQubitV2_BB
-A high-capacity neural decoder featuring:
-- **Interleaved RNN-Transformer backbone** (5 GRU + 6 self-attention layers)
-- **2D Rotary Position Embedding (RoPE)** based on physical detector coordinates
-- **Joint X+Z stabilizer processing** with spatial hint connections between
-  same-type and cross-type stabilizers
-- **Cross-attention readout** with learnable logical query tokens
-- Trained with **progressive knowledge distillation** from MWPM pseudo-labels
-  (surface code) or end-to-end on sampled syndromes (BB code)
-### Neural Belief Propagation
-A graph-neural-network decoder operating on the Tanner graph of the code:
-- **Bipartite message passing** between variable and check nodes
-- **Gated recurrent units (GRU)** for message updates
-- **Focal loss** with syndrome consistency regularization
-- Compact model size (~300K parameters for BB72)
 ---
 ## Authors
-Ge Yan<sup>1</sup>, Shanchuan Li<sup>1,2</sup>, **Shiyi Xiao**<sup>1,3</sup>,
-Pengyue Ma<sup>1</sup>, Hanyan Cao<sup>4</sup>, Feng Pan<sup>4,\*</sup>,
-Yuxuan Du<sup>1,\*</sup>
-<sup>1</sup> Nanyang Technological University &nbsp;
-<sup>2</sup> Tokyo University of Agriculture and Technology &nbsp;
-<sup>3</sup> Shanghai Jiao Tong University &nbsp;
-<sup>4</sup> Singapore University of Technology and Design
 <small><sup>\*</sup> Corresponding authors</small>
@@ -210,15 +198,11 @@ Yuxuan Du<sup>1,\*</sup>
 ## Citation
-If you use these model weights or the NTU framework in your research, please
-cite the accompanying paper:
 ```bibtex
 @article{ntu2026,
-  title={Transfer Learning is All You Need for Scalable Neural Decoder},
   author={Yan, Ge and Li, Shanchuan and Xiao, Shiyi and Ma, Pengyue and
           Cao, Hanyan and Pan, Feng and Du, Yuxuan},
-  journal={arXiv preprint},
   year={2026},
 }
 ```

 - neural-decoder
 - bivariate-bicycle
 - ldpc
+- transfer-learning
+- foundation-decoder
 pipeline_tag: other
 ---
 # NTU Neural Decoder Checkpoints
+Pre-trained model weights for the neural decoders introduced in **Neural
+Transfer Unification (NTU)**, an architecture-agnostic transfer-learning
+framework for scalable quantum error correction.
+📄 **Paper**: *Efficient Foundation Decoders for Fault-Tolerant Quantum Computing*
 🌐 **Project page**: [https://grahamyan.github.io/ntu-decoder/](https://grahamyan.github.io/ntu-decoder/)
+💻 **Source code**: [https://github.com/GrahamYan/ntu-decoder](https://github.com/GrahamYan/ntu-decoder)
 ---
 ## Overview
+NTU exploits the algebraic scale invariance of structured QEC code families to
+transfer error knowledge from small codes to large-scale fault-tolerant regimes,
+eliminating the cold-start optimization barrier. The framework is instantiated
+with two backbone architectures:
+| Backbone | Description | Code families |
 |---|---|---|
+| **NTU-Transformer** | Interleaved RNN-Transformer with 2D RoPE and cross-attention readout | Surface, BB |
+| **NTU-Neural-BP** | Graph-neural belief propagation on the code Tanner graph | BB |
+For planar surface codes under circuit-level depolarizing noise, NTU-Transformer
+surpasses standard PyMatching at *d* = 25 within a ~10³ GPU-hour training budget.
+For the [[72, 12, 6]] bivariate-bicycle (BB) code, it outperforms BP+OSD across
+all tested physical error rates and is competitive with multi-stage Relay BP.
+Transfer from [[72, 12, 6]] to [[144, 12, 12]] reaches 93.1% block accuracy
+within 2,500 steps (NTU-Transformer) and 95.3% within 500 steps (NTU-Neural-BP).
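The block accuracy figures above can be read against the following minimal sketch. It assumes that "block accuracy" means the fraction of shots in which every logical observable is predicted correctly, and that the per-logical figure is the accuracy of each observable taken separately. Both definitions are assumptions made for illustration, the function names are hypothetical, and the sample data are made up.

```python
def block_accuracy(predictions, targets):
    """Fraction of shots where all logical observables match (assumed definition)."""
    correct = sum(1 for p, t in zip(predictions, targets) if list(p) == list(t))
    return correct / len(targets)


def per_logical_accuracy(predictions, targets):
    """Accuracy of each logical observable, computed independently."""
    num_shots = len(targets)
    num_logicals = len(targets[0])
    return [
        sum(1 for p, t in zip(predictions, targets) if p[j] == t[j]) / num_shots
        for j in range(num_logicals)
    ]


# Made-up example: 4 shots, 2 logical observables per shot.
example_predictions = [[0, 1], [1, 1], [0, 0], [1, 0]]
example_targets = [[0, 1], [1, 0], [0, 0], [1, 0]]

print(block_accuracy(example_predictions, example_targets))
print(per_logical_accuracy(example_predictions, example_targets))
```

Under this reading, block accuracy is never higher than the lowest per-logical accuracy, because a single wrong observable marks the whole shot as failed.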
 ---
 ```
 ntu-surface-code-decoder/
 ├── README.md
+├── surface/                         ← Surface code (NTU-Transformer)
+│   ├── d7.pth                       (121 MB, trained from scratch)
+│   ├── d11.pth                      (121 MB, transferred from d=7)
+│   ├── d15.pth                      (121 MB, transferred from d=11)
+│   ├── d19.pth                      (121 MB, transferred from d=15)
+│   ├── d23.pth                      (121 MB, transferred from d=19)
+│   └── d25.pth                      (122 MB, transferred from d=23)
+└── bb/                              ← BB code checkpoints
+    ├── bb72_transformer.pt          (138 MB, NTU-Transformer, [[72,12,6]])
+    └── neural_bp_bb72.pt            (1.2 MB, NTU-Neural-BP,  [[72,12,6]])
 ```
+Each surface-code checkpoint is a dict containing `model_state` (an
+`OrderedDict` of weights), `d` (code distance), `rounds` (syndrome extraction
+rounds), and `step` (training step). BB Transformer checkpoints additionally
+include `block_acc` and `output_convention` metadata. NTU-Neural-BP
+checkpoints store the raw `state_dict` directly.
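As a minimal sketch of handling both layouts described above, the helpers below accept either a dict with a `model_state` entry or a raw `state_dict`, and strip the `_orig_mod.` and `module.` prefixes that the loading snippets in this README remove. The helper names are hypothetical and not part of the official code, and the example values are stand-ins rather than real checkpoint contents.

```python
from collections import OrderedDict

# "_orig_mod." is added by torch.compile wrappers, "module." by DDP wrappers.
WRAPPER_PREFIXES = ("_orig_mod.", "module.")


def extract_state_dict(ckpt):
    """Return prefix-free weights from either checkpoint layout (hypothetical helper)."""
    # Dict-style checkpoints keep weights under "model_state";
    # raw checkpoints are the state_dict itself.
    raw = ckpt["model_state"] if "model_state" in ckpt else ckpt
    cleaned = OrderedDict()
    for key, value in raw.items():
        for prefix in WRAPPER_PREFIXES:
            key = key.replace(prefix, "")
        cleaned[key] = value
    return cleaned


def extract_metadata(ckpt):
    """Return the non-weight entries (d, rounds, step, ...), if any."""
    if "model_state" not in ckpt:
        return {}
    return {k: v for k, v in ckpt.items() if k != "model_state"}


# Stand-in for the object returned by torch.load(...); values are illustrative.
fake_ckpt = {
    "model_state": {"module._orig_mod.layer.weight": 0.0},
    "d": 7,
    "step": 1000,
}
print(list(extract_state_dict(fake_ckpt)))
print(extract_metadata(fake_ckpt))
```

The result of `extract_state_dict` could then be passed to `model.load_state_dict`, with `strict=False` where the target model omits some keys, as in the BB loading snippet.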
 ---
 ## Usage
+### NTU-Transformer — Surface code
 ```python
 import torch
 from huggingface_hub import hf_hub_download
 ckpt_path = hf_hub_download(
     repo_id="Dreamworldsmile/ntu-surface-code-decoder",
     filename="surface/d7.pth",
 )
 ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
 model.load_state_dict(
     {k.replace("_orig_mod.", "").replace("module.", ""): v
 )
 ```
+### NTU-Transformer — BB code
 ```python
 ckpt_path = hf_hub_download(
     repo_id="Dreamworldsmile/ntu-surface-code-decoder",
     filename="bb/bb72_transformer.pt",
 )
 ckpt = torch.load(ckpt_path, map_location="cpu")
 state_dict = {k.replace("_orig_mod.", "").replace("module.", ""): v
+              for k, v in ckpt["model_state"].items()}
+# Filter to keys present in the target model (skip logical_readout_bias).
 model_sd = model.state_dict()
 filtered = {k: v for k, v in state_dict.items()
             if k in model_sd and model_sd[k].shape == v.shape
 model.load_state_dict(filtered, strict=False)
 ```
+### NTU-Neural-BP — BB code
 ```python
 ckpt_path = hf_hub_download(
 ### Inference with the official code
 ```bash
+git clone https://github.com/GrahamYan/ntu-decoder.git
+cd ntu-decoder
+# Surface code.
 bash inference.sh --code surface --d 7 \
     --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000
+# BB code — NTU-Transformer.
 bash inference.sh --code bb --model transformer --block_size 72 \
     --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000 --p 0.005
+# BB code — NTU-Neural-BP.
 bash inference.sh --code bb --model neural_bp --block_size 72 \
     --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000 --p 0.005
 ```
 ---
 ## Model Architecture
+### NTU-Transformer
+The Transformer-based decoder combines standard multi-head self-attention
+blocks with the four components below, of which the STEM embedding and the
+2D RoPE are QEC-specific:
+- **Scalable STEM embedding** — Encodes syndrome data from variable-size
+  lattices into a shared latent representation, absorbing the distance
+  dependence into the input encoding (Eq. 2 in the paper).
+- **QEC-aware 2D RoPE** — Applies rotary position embeddings to relative
+  algebraic displacements defined by the code's shift set *M*(*x*, *y*, *t*; *d*),
+  preserving detector geometry across code distances.
+- **Interleaved RNN-Transformer backbone** — 5 GRU-based recurrent blocks
+  alternating with 6 spatial self-attention blocks.
+- **Cross-attention logical readout** — Learnable logical query tokens attend
+  over the encoded detector representations.
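To illustrate why rotary embeddings preserve geometry, the sketch below implements a generic axial 2D RoPE in plain Python: the first half of a feature vector is rotated by the x coordinate and the second half by the y coordinate. This is not the paper's QEC-aware variant, which acts on displacements defined by the shift set *M*; the function names, the frequency base, and the axis split are assumptions made for illustration.

```python
import math


def rotate_pairs(vec, position, base=10000.0):
    """Apply a 1D rotary embedding to an even-length vector."""
    half = len(vec) // 2
    out = []
    for i in range(half):
        # Standard RoPE frequency schedule: base ** (-2i / dim).
        theta = position * base ** (-i / half)
        a, b = vec[2 * i], vec[2 * i + 1]
        out.append(a * math.cos(theta) - b * math.sin(theta))
        out.append(a * math.sin(theta) + b * math.cos(theta))
    return out


def rope_2d(vec, x, y, base=10000.0):
    """Rotate the first half of vec by x and the second half by y."""
    if len(vec) % 4 != 0:
        raise ValueError("feature dimension must be a multiple of 4")
    mid = len(vec) // 2
    return rotate_pairs(vec[:mid], x, base) + rotate_pairs(vec[mid:], y, base)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


query = [0.3, -1.2, 0.7, 0.1, 0.9, -0.4, 0.2, 1.5]
key = [1.1, 0.4, -0.6, 0.8, -0.3, 0.5, 1.0, -0.7]

# Both pairs of positions share the displacement (-2, -3).
score_a = dot(rope_2d(query, 1, 2), rope_2d(key, 3, 5))
score_b = dot(rope_2d(query, 4, 0), rope_2d(key, 6, 3))
print(abs(score_a - score_b) < 1e-9)
```

Because the attention score depends only on the displacement between two positions, the same weights see the same local geometry on a larger lattice, which is the property that transfer across code distances relies on.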
+### NTU-Neural-BP
+A graph-neural-network decoder operating on the bipartite Tanner graph:
+- **Message passing** between variable and check nodes with gated recurrent
+  units (GRU) for message updates.
+- **Syndrome-aware encoding** of check node states and prior log-likelihood ratios (LLRs).
+- **Focal loss** with syndrome consistency regularization.
+- Compact model (~300K parameters for the [[72, 12, 6]] code).
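The loss terms named above can be sketched as follows. The focal loss is the standard binary form, `-(1 - p_t) ** gamma * log(p_t)`; the consistency term is shown as a hard parity check of a predicted error against the measured syndrome. The regularizer actually used in training is not specified here and would need a differentiable relaxation, so the function names, the default `gamma`, and the example check matrix are all assumptions.

```python
import math


def binary_focal_loss(prob, target, gamma=2.0, eps=1e-12):
    """Standard binary focal loss for one predicted bit-flip probability."""
    p_t = prob if target == 1 else 1.0 - prob
    return -((1.0 - p_t) ** gamma) * math.log(max(p_t, eps))


def syndrome_mismatch(check_matrix, error_bits, syndrome):
    """Fraction of checks whose parity under error_bits disagrees with syndrome."""
    mismatches = 0
    for row, s in zip(check_matrix, syndrome):
        parity = sum(h * e for h, e in zip(row, error_bits)) % 2
        mismatches += int(parity != s)
    return mismatches / len(syndrome)


# Toy parity-check matrix (2 checks, 3 variable nodes), for illustration only.
toy_checks = [[1, 1, 0], [0, 1, 1]]
toy_error = [1, 0, 0]

print(binary_focal_loss(0.8, 1))
print(syndrome_mismatch(toy_checks, toy_error, [1, 0]))
print(syndrome_mismatch(toy_checks, toy_error, [1, 1]))
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy; larger values down-weight bits the model already predicts confidently, which matters when most bits are error-free.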
 ---
 ## Authors
+[Ge Yan](https://grahamyan.github.io)<sup>1</sup>,
+Shanchuan Li<sup>1, 2</sup>,
+Shiyi Xiao<sup>1, 3</sup>,
+Pengyue Ma<sup>1</sup>,
+Hanyan Cao<sup>4</sup>,
+[Feng Pan](https://scholar.google.com/citations?user=Vp6hFhUAAAAJ)<sup>4,\*</sup>,
+[Yuxuan Du](https://yuxuan-du.github.io)<sup>1,\*</sup>
+<sup>1</sup> College of Computing and Data Science, Nanyang Technological University, Singapore<br>
+<sup>2</sup> Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, Japan<br>
+<sup>3</sup> School of Artificial Intelligence, Shanghai Jiao Tong University, China<br>
+<sup>4</sup> Science, Mathematics and Technology Cluster, Singapore University of Technology and Design, Singapore
 <small><sup>\*</sup> Corresponding authors</small>
 ## Citation
 ```bibtex
 @article{ntu2026,
+  title={Efficient Foundation Decoders for Fault-Tolerant Quantum Computing},
   author={Yan, Ge and Li, Shanchuan and Xiao, Shiyi and Ma, Pengyue and
           Cao, Hanyan and Pan, Feng and Du, Yuxuan},
   year={2026},
 }
 ```