---
language: en
license: mit
tags:
- qec
- surface-code
- quantum
- pytorch
- quantum-error-correction
- neural-decoder
- bivariate-bicycle
- ldpc
- transfer-learning
- foundation-decoder
pipeline_tag: other
---

# NTU Neural Decoder Checkpoints

Pre-trained model weights for the neural decoders introduced in **Neural
Transfer Unification (NTU)**, an architecture-agnostic transfer-learning
framework for scalable quantum error correction.

📄 **Paper**: *Efficient Foundation Decoders for Fault-Tolerant Quantum Computing*

🌐 **Project page**: [https://grahamyan.github.io/ntu-decoder/](https://grahamyan.github.io/ntu-decoder/)

💻 **Source code**: [https://github.com/GrahamYan/ntu-decoder](https://github.com/GrahamYan/ntu-decoder)

---

## Overview

NTU exploits the algebraic scale invariance of structured QEC code families to
transfer error knowledge from small codes to large-scale fault-tolerant regimes,
eliminating the cold-start optimization barrier.  The framework is instantiated
with two backbone architectures:

| Backbone | Description | Code families |
|---|---|---|
| **NTU-Transformer** | Interleaved RNN-Transformer with 2D RoPE and cross-attention readout | Surface, BB |
| **NTU-Neural-BP** | Graph-neural belief propagation on the code Tanner graph | BB |

For planar surface codes under circuit-level depolarizing noise, NTU-Transformer
surpasses standard PyMatching at *d* = 25 within a ~10³ GPU-hour training budget.
For the [[72, 12, 6]] bivariate-bicycle (BB) code, it outperforms BP+OSD across
all tested physical error rates and is competitive with multi-stage Relay BP.
Transfer from [[72, 12, 6]] to [[144, 12, 12]] reaches 93.1% block accuracy
within 2,500 steps (NTU-Transformer) and 95.3% within 500 steps (NTU-Neural-BP).
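
Block accuracy is read here as the fraction of shots in which the decoder gets *every* logical observable of the shot right. That definition is an assumption, and the paper's exact definition may differ. A minimal sketch of the metric:

```python
from typing import Sequence


def block_accuracy(predicted: Sequence[Sequence[int]],
                   actual: Sequence[Sequence[int]]) -> float:
    """Fraction of shots whose logical-bit predictions all match (assumed definition)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same number of shots")
    if not actual:
        raise ValueError("need at least one shot")
    # A shot counts as correct only if every logical observable matches.
    correct = sum(1 for p, a in zip(predicted, actual) if list(p) == list(a))
    return correct / len(actual)


# Example: 4 shots, 3 logical observables each; shot 3 has one wrong bit.
pred = [[0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 0, 1]]
true = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
print(block_accuracy(pred, true))  # 0.75
```

On the same data the per-observable accuracy would be 11/12, so block accuracy is the stricter of the two metrics.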

---

## Repository Structure

```
ntu-surface-code-decoder/
├── README.md
├── surface/                         ← Surface code (NTU-Transformer)
│   ├── d7.pth                       (121 MB, trained from scratch)
│   ├── d11.pth                      (121 MB, transferred from d=7)
│   ├── d15.pth                      (121 MB, transferred from d=11)
│   ├── d19.pth                      (121 MB, transferred from d=15)
│   ├── d23.pth                      (121 MB, transferred from d=19)
│   └── d25.pth                      (122 MB, transferred from d=23)
└── bb/                              ← BB code checkpoints
    ├── bb72_transformer.pt          (138 MB, NTU-Transformer, [[72,12,6]])
    └── neural_bp_bb72.pt            (1.2 MB, NTU-Neural-BP,  [[72,12,6]])
```
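
The surface code checkpoints form one transfer chain (7 → 11 → 15 → 19 → 23 → 25). The sketch below encodes that chain from the directory listing so a script can choose the `filename` to pass to `hf_hub_download` and recover which checkpoint each one was transferred from. The helper names are hypothetical and not part of the official code.

```python
# Hypothetical helpers; the mapping mirrors the surface/ directory listing.
SURFACE_PARENT = {7: None, 11: 7, 15: 11, 19: 15, 23: 19, 25: 23}


def surface_filename(d: int) -> str:
    """Repository path of the surface code checkpoint for distance d."""
    if d not in SURFACE_PARENT:
        raise ValueError(f"no surface checkpoint for d={d}")
    return f"surface/d{d}.pth"


def transfer_lineage(d: int) -> list[int]:
    """Distances visited from the from-scratch d=7 model up to distance d."""
    if d not in SURFACE_PARENT:
        raise ValueError(f"no surface checkpoint for d={d}")
    chain = []
    current = d
    while current is not None:
        chain.append(current)
        current = SURFACE_PARENT[current]
    return chain[::-1]


print(surface_filename(25))   # surface/d25.pth
print(transfer_lineage(19))   # [7, 11, 15, 19]
```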

Each surface code checkpoint contains `model_state` (OrderedDict of weights),
`d` (code distance), `rounds` (syndrome extraction rounds), and `step`
(training step).  BB Transformer checkpoints additionally include `block_acc`
and `output_convention` metadata.  NTU-Neural-BP checkpoints store the raw
`state_dict` directly.
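
Because the Transformer checkpoints wrap their weights in a dictionary with metadata while the NTU-Neural-BP checkpoint is a bare `state_dict`, a generic loader has to tell the two layouts apart. A minimal sketch, assuming the key names listed above are the ones that matter; `split_checkpoint` is a hypothetical helper that takes the object returned by `torch.load`:

```python
from typing import Any, Mapping


def split_checkpoint(ckpt: Mapping[str, Any]) -> tuple[dict, dict]:
    """Return (state_dict, metadata) for either checkpoint layout.

    Wrapped layout (Transformer checkpoints): weights under "model_state",
    other keys (e.g. "d", "rounds", "step", "block_acc") are metadata.
    Bare layout (NTU-Neural-BP): the mapping itself is the state_dict.
    """
    if "model_state" in ckpt:
        state_dict = dict(ckpt["model_state"])
        metadata = {k: v for k, v in ckpt.items() if k != "model_state"}
        return state_dict, metadata
    return dict(ckpt), {}


# Dummy stand-ins: ints instead of tensors, arbitrary metadata values.
wrapped = {"model_state": {"proj.weight": 0}, "d": 7, "rounds": 1, "step": 0}
bare = {"layers.0.weight": 0}
print(split_checkpoint(wrapped)[1])  # {'d': 7, 'rounds': 1, 'step': 0}
print(split_checkpoint(bare)[1])     # {}
```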

---

## Usage

### NTU-Transformer — Surface code

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="Dreamworldsmile/ntu-surface-code-decoder",
    filename="surface/d7.pth",
)

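ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# `model` is an NTU-Transformer built with the official source code (linked
# above); ckpt["d"] and ckpt["rounds"] record the configuration it was trained for.
# Strip the torch.compile ("_orig_mod.") and DataParallel ("module.") prefixes.
model.load_state_dict(
    {k.replace("_orig_mod.", "").replace("module.", ""): v
     for k, v in ckpt["model_state"].items()},
    strict=False,
)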
```

### NTU-Transformer — BB code

```python
ckpt_path = hf_hub_download(
    repo_id="Dreamworldsmile/ntu-surface-code-decoder",
    filename="bb/bb72_transformer.pt",
)

ckpt = torch.load(ckpt_path, map_location="cpu")
state_dict = {k.replace("_orig_mod.", "").replace("module.", ""): v
              for k, v in ckpt["model_state"].items()}
# Filter to keys present in the target model (skip logical_readout_bias).
model_sd = model.state_dict()
filtered = {k: v for k, v in state_dict.items()
            if k in model_sd and model_sd[k].shape == v.shape
            and k != "logical_readout_bias"}
model.load_state_dict(filtered, strict=False)
```
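
The loading snippets remove `_orig_mod.` (added by `torch.compile`) and `module.` (added by `DataParallel`/`DistributedDataParallel`) with `str.replace`, which would also rewrite those substrings if they ever appeared inside a parameter name. A slightly stricter variant strips them only as leading prefixes and reports what the shape filter drops, since `strict=False` otherwise skips those keys silently. The helpers below are hypothetical; they accept any mapping whose values expose `.shape`, such as PyTorch tensors.

```python
from types import SimpleNamespace
from typing import Any, Mapping

WRAPPER_PREFIXES = ("_orig_mod.", "module.")  # torch.compile, (D)DP wrappers


def strip_wrapper_prefixes(key: str) -> str:
    """Remove any leading chain of wrapper prefixes, e.g. 'module._orig_mod.'."""
    changed = True
    while changed:
        changed = False
        for prefix in WRAPPER_PREFIXES:
            if key.startswith(prefix):
                key = key[len(prefix):]
                changed = True
    return key


def compatible_subset(ckpt_sd: Mapping[str, Any], model_sd: Mapping[str, Any],
                      exclude: tuple[str, ...] = ("logical_readout_bias",)):
    """Split checkpoint weights into loadable ones and skipped ones (with a reason)."""
    loadable, skipped = {}, {}
    for raw_key, value in ckpt_sd.items():
        key = strip_wrapper_prefixes(raw_key)
        if key in exclude:
            skipped[key] = "excluded"
        elif key not in model_sd:
            skipped[key] = "not in model"
        elif tuple(model_sd[key].shape) != tuple(value.shape):
            skipped[key] = "shape mismatch"
        else:
            loadable[key] = value
    return loadable, skipped


# Demo with shape-only stand-ins instead of real tensors.
T = SimpleNamespace
ckpt_sd = {"module.enc.weight": T(shape=(4, 2)),
           "_orig_mod.head.weight": T(shape=(3, 4)),
           "logical_readout_bias": T(shape=(12,))}
model_sd = {"enc.weight": T(shape=(4, 2)), "head.weight": T(shape=(2, 4))}
loadable, skipped = compatible_subset(ckpt_sd, model_sd)
print(sorted(loadable))  # ['enc.weight']
print(skipped)  # {'head.weight': 'shape mismatch', 'logical_readout_bias': 'excluded'}
```

With real weights this would be used as `loadable, skipped = compatible_subset(ckpt["model_state"], model.state_dict())` followed by `model.load_state_dict(loadable, strict=False)`, printing `skipped` to confirm that only the expected keys were left out.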

### NTU-Neural-BP — BB code

```python
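import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="Dreamworldsmile/ntu-surface-code-decoder",
    filename="bb/neural_bp_bb72.pt",
)

# The file is a bare state_dict, so the restricted weights_only=True loader works.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=True)
state_dict = {k.replace("module.", ""): v for k, v in ckpt.items()}
# `model` is an NTU-Neural-BP instance for the [[72, 12, 6]] code.
model.load_state_dict(state_dict, strict=True)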
```

### Inference with the official code

```bash
git clone https://github.com/GrahamYan/ntu-decoder.git
cd ntu-decoder

# Surface code.
bash inference.sh --code surface --d 7 \
    --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000

# BB code — NTU-Transformer.
bash inference.sh --code bb --model transformer --block_size 72 \
    --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000 --p 0.005

# BB code — NTU-Neural-BP.
bash inference.sh --code bb --model neural_bp --block_size 72 \
    --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000 --p 0.005
```

---

## Model Architecture

### NTU-Transformer

The Transformer-based decoder is built from the following components:

- **Scalable STEM embedding** — Encodes syndrome data from variable-size
  lattices into a shared latent representation, absorbing the distance
  dependence into the input encoding (Eq. 2 in the paper).
- **QEC-aware 2D RoPE** — Applies rotary position embeddings to relative
  algebraic displacements defined by the code's shift set *M*(*x*, *y*, *t*; *d*),
  preserving detector geometry across code distances.
- **Interleaved RNN-Transformer backbone** — 5 GRU-based recurrent blocks
  alternating with 6 spatial self-attention blocks.
- **Cross-attention logical readout** — Learnable logical query tokens attend
  over the encoded detector representations.
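
For intuition about the rotary component, the sketch below implements a generic axial 2D RoPE: half of each query/key vector is rotated by angles proportional to an *x* coordinate and the other half by angles proportional to *y*. The property that matters is that the query-key dot product then depends only on the relative offset between the two positions. This is a textbook 2D RoPE on lattice coordinates, not the paper's QEC-aware variant: the shift set *M*(*x*, *y*, *t*; *d*) and the time axis are omitted, and `base=10000` is the usual RoPE default, assumed here.

```python
import math


def rotate_1d(vec: list[float], pos: float, base: float = 10000.0) -> list[float]:
    """Standard 1D RoPE: rotate consecutive pairs (a, b) by pos * theta_i."""
    out = []
    n_pairs = len(vec) // 2
    for i in range(n_pairs):
        theta = pos * base ** (-i / n_pairs)
        a, b = vec[2 * i], vec[2 * i + 1]
        out += [a * math.cos(theta) - b * math.sin(theta),
                a * math.sin(theta) + b * math.cos(theta)]
    return out


def rope_2d(vec: list[float], x: float, y: float) -> list[float]:
    """Axial 2D RoPE: first half of the vector encodes x, second half encodes y."""
    if len(vec) % 4 != 0:
        raise ValueError("vector length must be a multiple of 4")
    half = len(vec) // 2
    return rotate_1d(vec[:half], x) + rotate_1d(vec[half:], y)


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.8, 0.5, 1.1, 0.2, -0.4, 0.9]
k = [0.7, 0.1, -0.6, 1.3, 0.4, -0.8, 0.2, 0.5]

# Same relative offset (dx, dy) = (2, -1) at two different absolute positions.
s1 = dot(rope_2d(q, 3, 5), rope_2d(k, 1, 6))
s2 = dot(rope_2d(q, 10, 0), rope_2d(k, 8, 1))
print(abs(s1 - s2) < 1e-9)  # True
```

Because each rotation preserves vector norms and the score depends only on displacements, the same attention pattern can in principle be reused when the lattice grows, which is the behavior a transfer-friendly positional encoding needs.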

### NTU-Neural-BP

A graph-neural-network decoder operating on the bipartite Tanner graph:

- **Message passing** between variable and check nodes with gated recurrent
  units (GRU) for message updates.
- **Syndrome-aware encoding** of check node states and prior LLRs.
- **Focal loss** with syndrome consistency regularization.
- Compact model (~300K parameters for the [[72, 12, 6]] code).
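
As a point of reference for the message-passing structure, the sketch below is plain (normalized) min-sum belief propagation in syndrome form on the Tanner graph of a parity-check matrix `H`. It is the classical skeleton that neural BP decoders build on, not NTU-Neural-BP itself: the actual model uses learned, GRU-based message updates instead of the fixed min-sum rule and scaling `alpha` shown here. The function name and the toy repetition-code example are illustrative.

```python
import math


def min_sum_decode(H, syndrome, p, alpha=1.0, max_iter=20):
    """Syndrome-based (normalized) min-sum BP on the Tanner graph of H.

    H: parity-check matrix as a list of 0/1 rows; syndrome: list of 0/1.
    alpha: fixed message scaling (neural BP variants learn such weights).
    Returns (error_estimate, converged).
    """
    m, n = len(H), len(H[0])
    checks_of = [[c for c in range(m) if H[c][v]] for v in range(n)]
    vars_of = [[v for v in range(n) if H[c][v]] for c in range(m)]
    if any(len(vs) < 2 for vs in vars_of):
        raise ValueError("each check must involve at least two bits")
    prior = math.log((1 - p) / p)  # LLR that a bit is NOT flipped
    c2v = {(c, v): 0.0 for c in range(m) for v in vars_of[c]}
    error = [0] * n
    for _ in range(max_iter):
        # Variable-to-check: prior plus all other incoming check messages.
        v2c = {(c, v): prior + sum(c2v[(c2, v)] for c2 in checks_of[v] if c2 != c)
               for (c, v) in c2v}
        # Check-to-variable: syndrome sign times product of signs, min magnitude.
        for c in range(m):
            for v in vars_of[c]:
                others = [v2c[(c, v2)] for v2 in vars_of[c] if v2 != v]
                sign = -1.0 if syndrome[c] else 1.0
                for x in others:
                    sign *= -1.0 if x < 0 else 1.0
                c2v[(c, v)] = alpha * sign * min(abs(x) for x in others)
        # Hard decision on posterior LLRs, then stop once the syndrome matches.
        posterior = [prior + sum(c2v[(c, v)] for c in checks_of[v]) for v in range(n)]
        error = [1 if llr < 0 else 0 for llr in posterior]
        if all(sum(H[c][v] * error[v] for v in range(n)) % 2 == syndrome[c]
               for c in range(m)):
            return error, True
    return error, False


# Length-3 repetition code: checks compare bits (0, 1) and (1, 2).
H = [[1, 1, 0],
     [0, 1, 1]]
print(min_sum_decode(H, syndrome=[1, 0], p=0.1))  # ([1, 0, 0], True)
```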

---

## Authors

[Ge Yan](https://grahamyan.github.io)<sup>1</sup>,
Shanchuan Li<sup>1, 2</sup>,
Shiyi Xiao<sup>1, 3</sup>,
Pengyue Ma<sup>1</sup>,
Hanyan Cao<sup>4</sup>,
[Feng Pan](https://scholar.google.com/citations?user=Vp6hFhUAAAAJ)<sup>4,\*</sup>,
[Yuxuan Du](https://yuxuan-du.github.io)<sup>1,\*</sup>

<sup>1</sup> College of Computing and Data Science, Nanyang Technological University, Singapore<br>
<sup>2</sup> Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, Japan<br>
<sup>3</sup> School of Artificial Intelligence, Shanghai Jiao Tong University, China<br>
<sup>4</sup> Science, Mathematics and Technology Cluster, Singapore University of Technology and Design, Singapore

<small><sup>\*</sup> Corresponding authors</small>

---

## Citation

```bibtex
@article{ntu2026,
  title={Efficient Foundation Decoders for Fault-Tolerant Quantum Computing},
  author={Yan, Ge and Li, Shanchuan and Xiao, Shiyi and Ma, Pengyue and
          Cao, Hanyan and Pan, Feng and Du, Yuxuan},
  year={2026},
}
```

---

## License

This repository is released under the [MIT License](https://opensource.org/licenses/MIT).