Dreamworldsmile committed on
Commit 1fac425 · verified · 1 Parent(s): 464eec5

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +78 -94
README.md CHANGED
@@ -10,35 +10,41 @@ tags:
10
  - neural-decoder
11
  - bivariate-bicycle
12
  - ldpc
 
 
13
  pipeline_tag: other
14
  ---
15
 
16
  # NTU Neural Decoder Checkpoints
17
 
18
- Pre-trained neural decoder model weights for quantum error correction (QEC)
19
- codes, based on the **Neural Transfer Unification (NTU)** framework introduced
20
- in the accompanying paper.
21
 
22
- 📄 **Paper**: *Transfer Learning is All You Need for Scalable Neural Decoder*
23
  🌐 **Project page**: [https://grahamyan.github.io/ntu-decoder/](https://grahamyan.github.io/ntu-decoder/)
 
24
 
25
  ---
26
 
27
  ## Overview
28
 
29
- This repository hosts the official model checkpoints for two families of QEC
30
- codes:
 
 
31
 
32
- | Code family | Architecture | Decoder |
33
  |---|---|---|
34
- | Rotated surface code | AlphaQubit V2 (~58M parameters) | Transformer-based |
35
- | Bivariate-bicycle (BB) code | AlphaQubitV2_BB (~XXM parameters) | Transformer-based |
36
- | Bivariate-bicycle (BB) code | Neural Belief Propagation | GNN-based message passing |
37
 
38
- All models are implemented in PyTorch and trained with distributed data-parallel
39
- (DDP) across 8 GPUs. The surface code decoder uses progressive knowledge
40
- distillation from minimum-weight perfect matching (MWPM) pseudo-labels;
41
- the BB decoder is trained end-to-end on sampled syndromes.
 
 
42
 
43
  ---
44
 
@@ -47,59 +53,39 @@ the BB decoder is trained end-to-end on sampled syndromes.
47
  ```
48
  ntu-surface-code-decoder/
49
  ├── README.md
50
- ├── surface/ ← Surface code checkpoints (AlphaQubit V2)
51
- │   ├── d7.pth (121 MB, trained from scratch)
52
- │   ├── d11.pth (121 MB, transfer learning from d=7)
53
- │   ├── d15.pth (121 MB, transfer learning from d=11)
54
- │   ├── d19.pth (121 MB, transfer learning from d=15)
55
- │   ├── d23.pth (121 MB, transfer learning from d=19)
56
- │   └── d25.pth (122 MB, transfer learning from d=23)
57
- └── bb/ ← BB code checkpoints
58
-     ├── bb72_transformer.pt (138 MB, AlphaQubitV2_BB, [[72,12,6]] code)
59
-     └── neural_bp_bb72.pt (1.2 MB, Neural-BP, [[72,12,6]] code)
60
  ```
61
 
62
- ### Checkpoint format
63
-
64
- **Surface code checkpoints** (`surface/*.pth`):
65
- | Key | Type | Description |
66
- |---|---|---|
67
- | `model_state` | `OrderedDict` | Model weights (strip `_orig_mod.` and `module.` prefixes before loading) |
68
- | `d` | `int` | Code distance |
69
- | `rounds` | `int` | Syndrome extraction rounds |
70
- | `step` | `int` | Training step at which the checkpoint was saved |
71
-
72
- **BB Transformer checkpoints** (`bb/bb*_transformer.pt`):
73
- | Key | Type | Description |
74
- |---|---|---|
75
- | `model_state` | `OrderedDict` | Model weights |
76
- | `step` | `int` | Training step |
77
- | `block_acc` | `float` | Block accuracy at save time |
78
- | `per_log_mean` | `float` | Per-logical average accuracy |
79
- | `output_convention` | `dict` | Logical observable convention metadata |
80
-
81
- **Neural-BP checkpoints** (`bb/neural_bp_*.pt`):
82
- | Key | Type | Description |
83
- |---|---|---|
84
- | (raw `state_dict`) | `OrderedDict` | Model weights (strip `module.` prefix before loading) |
85
 
86
  ---
87
 
88
  ## Usage
89
 
90
- ### Surface code: AlphaQubit V2
91
 
92
  ```python
93
  import torch
94
  from huggingface_hub import hf_hub_download
95
 
96
- # Download a surface code checkpoint.
97
  ckpt_path = hf_hub_download(
98
  repo_id="Dreamworldsmile/ntu-surface-code-decoder",
99
  filename="surface/d7.pth",
100
  )
101
 
102
- # Load into an AlphaQubit V2 model instance.
103
  ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
104
  model.load_state_dict(
105
  {k.replace("_orig_mod.", "").replace("module.", ""): v
@@ -108,22 +94,18 @@ model.load_state_dict(
108
  )
109
  ```
110
 
111
- ### BB code: AlphaQubitV2_BB (Transformer)
112
 
113
  ```python
114
- import torch
115
- from huggingface_hub import hf_hub_download
116
-
117
  ckpt_path = hf_hub_download(
118
  repo_id="Dreamworldsmile/ntu-surface-code-decoder",
119
  filename="bb/bb72_transformer.pt",
120
  )
121
 
122
  ckpt = torch.load(ckpt_path, map_location="cpu")
123
- state_dict = ckpt["model_state"]
124
  state_dict = {k.replace("_orig_mod.", "").replace("module.", ""): v
125
- for k, v in state_dict.items()}
126
- # Filter to keys present in the model (skip logical_readout_bias).
127
  model_sd = model.state_dict()
128
  filtered = {k: v for k, v in state_dict.items()
129
  if k in model_sd and model_sd[k].shape == v.shape
@@ -131,7 +113,7 @@ filtered = {k: v for k, v in state_dict.items()
131
  model.load_state_dict(filtered, strict=False)
132
  ```
133
 
134
- ### BB code: Neural Belief Propagation
135
 
136
  ```python
137
  ckpt_path = hf_hub_download(
@@ -146,63 +128,69 @@ model.load_state_dict(state_dict, strict=True)
146
 
147
  ### Inference with the official code
148
 
149
- The [official implementation](https://github.com/GrahamYan/ntu-decoder) provides a
150
- unified inference launcher that automatically downloads the required checkpoint:
151
-
152
  ```bash
153
- # Surface code inference.
 
 
 
154
  bash inference.sh --code surface --d 7 \
155
  --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000
156
 
157
- # BB Transformer inference.
158
  bash inference.sh --code bb --model transformer --block_size 72 \
159
  --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000 --p 0.005
160
 
161
- # BB Neural-BP inference.
162
  bash inference.sh --code bb --model neural_bp --block_size 72 \
163
  --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000 --p 0.005
164
  ```
165
 
166
- For training and baseline evaluations, please refer to the shell scripts under
167
- `codes/Surface/` and `codes/BB/` in the source repository.
168
-
169
  ---
170
 
171
  ## Model Architecture
172
 
173
- ### AlphaQubit V2 / AlphaQubitV2_BB
174
 
175
- A high-capacity neural decoder featuring:
 
176
 
177
- - **Interleaved RNN-Transformer backbone** (5 GRU + 6 self-attention layers)
178
- - **2D Rotary Position Embedding (RoPE)** based on physical detector coordinates
179
- - **Joint X+Z stabilizer processing** with spatial hint connections between
180
- same-type and cross-type stabilizers
181
- - **Cross-attention readout** with learnable logical query tokens
182
- - Trained with **progressive knowledge distillation** from MWPM pseudo-labels
183
- (surface code) or end-to-end on sampled syndromes (BB code)
 
 
 
184
 
185
- ### Neural Belief Propagation
186
 
187
- A graph-neural-network decoder operating on the Tanner graph of the code:
188
 
189
- - **Bipartite message passing** between variable and check nodes
190
- - **Gated recurrent units (GRU)** for message updates
191
- - **Focal loss** with syndrome consistency regularization
192
- - Compact model size (~300K parameters for BB72)
 
193
 
194
  ---
195
 
196
  ## Authors
197
 
198
- Ge Yan<sup>1</sup>, Shanchuan Li<sup>1,2</sup>, **Shiyi Xiao**<sup>1,3</sup>,
199
- Pengyue Ma<sup>1</sup>, Hanyan Cao<sup>4</sup>, Feng Pan<sup>4,\*</sup>,
200
- Yuxuan Du<sup>1,\*</sup>
 
 
 
 
201
 
202
- <sup>1</sup> Nanyang Technological University &nbsp;
203
- <sup>2</sup> Tokyo University of Agriculture and Technology &nbsp;
204
- <sup>3</sup> Shanghai Jiao Tong University &nbsp;
205
- <sup>4</sup> Singapore University of Technology and Design
206
 
207
  <small><sup>\*</sup> Corresponding authors</small>
208
 
@@ -210,15 +198,11 @@ Yuxuan Du<sup>1,\*</sup>
210
 
211
  ## Citation
212
 
213
- If you use these model weights or the NTU framework in your research, please
214
- cite the accompanying paper:
215
-
216
  ```bibtex
217
  @article{ntu2026,
218
- title={Transfer Learning is All You Need for Scalable Neural Decoder},
219
  author={Yan, Ge and Li, Shanchuan and Xiao, Shiyi and Ma, Pengyue and
220
  Cao, Hanyan and Pan, Feng and Du, Yuxuan},
221
- journal={arXiv preprint},
222
  year={2026},
223
  }
224
  ```
 
10
  - neural-decoder
11
  - bivariate-bicycle
12
  - ldpc
13
+ - transfer-learning
14
+ - foundation-decoder
15
  pipeline_tag: other
16
  ---
17
 
18
  # NTU Neural Decoder Checkpoints
19
 
20
+ Pre-trained model weights for the neural decoders introduced in **Neural
21
+ Transfer Unification (NTU)**, an architecture-agnostic transfer-learning
22
+ framework for scalable quantum error correction.
23
 
24
+ 📄 **Paper**: *Efficient Foundation Decoders for Fault-Tolerant Quantum Computing*
25
  🌐 **Project page**: [https://grahamyan.github.io/ntu-decoder/](https://grahamyan.github.io/ntu-decoder/)
26
+ 💻 **Source code**: [https://github.com/GrahamYan/ntu-decoder](https://github.com/GrahamYan/ntu-decoder)
27
 
28
  ---
29
 
30
  ## Overview
31
 
32
+ NTU exploits the algebraic scale invariance of structured QEC code families to
33
+ transfer error knowledge from small codes to large-scale fault-tolerant regimes,
34
+ eliminating the cold-start optimization barrier. The framework is instantiated
35
+ with two backbone architectures:
36
 
37
+ | Backbone | Description | Code families |
38
  |---|---|---|
39
+ | **NTU-Transformer** | Interleaved RNN-Transformer with 2D RoPE and cross-attention readout | Surface, BB |
40
+ | **NTU-Neural-BP** | Graph-neural belief propagation on the code Tanner graph | BB |
 
41
 
42
+ For planar surface codes under circuit-level depolarizing noise, NTU-Transformer
43
+ surpasses standard PyMatching at *d* = 25 within a ~10³ GPU-hour training budget.
44
+ For the [[72, 12, 6]] bivariate-bicycle (BB) code, it outperforms BP+OSD across
45
+ all tested physical error rates and is competitive with multi-stage Relay BP.
46
+ Transfer from [[72, 12, 6]] to [[144, 12, 12]] reaches 93.1% block accuracy
47
+ within 2,500 steps (NTU-Transformer) and 95.3% within 500 steps (NTU-Neural-BP).
48
 
49
  ---
50
 
 
53
  ```
54
  ntu-surface-code-decoder/
55
  ├── README.md
56
+ ├── surface/ ← Surface code (NTU-Transformer)
57
+ │   ├── d7.pth (121 MB, trained from scratch)
58
+ │   ├── d11.pth (121 MB, transferred from d=7)
59
+ │   ├── d15.pth (121 MB, transferred from d=11)
60
+ │   ├── d19.pth (121 MB, transferred from d=15)
61
+ │   ├── d23.pth (121 MB, transferred from d=19)
62
+ │   └── d25.pth (122 MB, transferred from d=23)
63
+ └── bb/ ← BB code checkpoints
64
+     ├── bb72_transformer.pt (138 MB, NTU-Transformer, [[72,12,6]])
65
+     └── neural_bp_bb72.pt (1.2 MB, NTU-Neural-BP, [[72,12,6]])
66
  ```
67
 
68
+ Each surface code checkpoint contains `model_state` (OrderedDict of weights),
69
+ `d` (code distance), `rounds` (syndrome extraction rounds), and `step`
70
+ (training step). BB Transformer checkpoints additionally include `block_acc`
71
+ and `output_convention` metadata. NTU-Neural-BP checkpoints store the raw
72
+ `state_dict` directly.
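
The checkpoint layouts described above can be handled by one small helper. What follows is a minimal sketch, not part of the official code: the names `strip_prefixes` and `extract_state_dict` are hypothetical, and the only assumptions are the ones stated in this README (a `model_state` entry in the Transformer checkpoints, a raw `state_dict` for NTU-Neural-BP, and `_orig_mod.` / `module.` key prefixes). It operates on plain mappings, so it can be applied to whatever `torch.load` returns.

```python
from collections import OrderedDict

# Prefixes added by torch.compile ("_orig_mod.") and DDP ("module.").
_WRAPPER_PREFIXES = ("_orig_mod.", "module.")


def strip_prefixes(key):
    """Remove wrapper prefixes from the front of a parameter name."""
    stripped = True
    while stripped:
        stripped = False
        for prefix in _WRAPPER_PREFIXES:
            if key.startswith(prefix):
                key = key[len(prefix):]
                stripped = True
    return key


def extract_state_dict(ckpt):
    """Return (weights, metadata) for either checkpoint layout.

    Hypothetical helper: a mapping with a "model_state" entry is treated as a
    Transformer checkpoint; anything else is treated as a raw state_dict.
    """
    if "model_state" in ckpt:
        raw = ckpt["model_state"]
        meta = {k: v for k, v in ckpt.items() if k != "model_state"}
    else:
        raw, meta = ckpt, {}
    weights = OrderedDict((strip_prefixes(k), v) for k, v in raw.items())
    return weights, meta
```

Unlike the `str.replace` calls in the usage snippets, this sketch strips the prefixes only at the start of a key, which avoids touching a parameter whose own name happens to contain `module.`.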
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
73
 
74
  ---
75
 
76
  ## Usage
77
 
78
+ ### NTU-Transformer: Surface code
79
 
80
  ```python
81
  import torch
82
  from huggingface_hub import hf_hub_download
83
 
 
84
  ckpt_path = hf_hub_download(
85
  repo_id="Dreamworldsmile/ntu-surface-code-decoder",
86
  filename="surface/d7.pth",
87
  )
88
 
 
89
  ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
90
  model.load_state_dict(
91
  {k.replace("_orig_mod.", "").replace("module.", ""): v
 
94
  )
95
  ```
96
 
97
+ ### NTU-Transformer: BB code
98
 
99
  ```python
 
 
 
100
  ckpt_path = hf_hub_download(
101
  repo_id="Dreamworldsmile/ntu-surface-code-decoder",
102
  filename="bb/bb72_transformer.pt",
103
  )
104
 
105
  ckpt = torch.load(ckpt_path, map_location="cpu")
 
106
  state_dict = {k.replace("_orig_mod.", "").replace("module.", ""): v
107
+ for k, v in ckpt["model_state"].items()}
108
+ # Filter to keys present in the target model (skip logical_readout_bias).
109
  model_sd = model.state_dict()
110
  filtered = {k: v for k, v in state_dict.items()
111
  if k in model_sd and model_sd[k].shape == v.shape
 
113
  model.load_state_dict(filtered, strict=False)
114
  ```
115
 
116
+ ### NTU-Neural-BP: BB code
117
 
118
  ```python
119
  ckpt_path = hf_hub_download(
 
128
 
129
  ### Inference with the official code
130
 
 
 
 
131
  ```bash
132
+ git clone https://github.com/GrahamYan/ntu-decoder.git
133
+ cd ntu-decoder
134
+
135
+ # Surface code.
136
  bash inference.sh --code surface --d 7 \
137
  --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000
138
 
139
+ # BB code: NTU-Transformer.
140
  bash inference.sh --code bb --model transformer --block_size 72 \
141
  --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000 --p 0.005
142
 
143
+ # BB code: NTU-Neural-BP.
144
  bash inference.sh --code bb --model neural_bp --block_size 72 \
145
  --hf_repo Dreamworldsmile/ntu-surface-code-decoder --shots 100000 --p 0.005
146
  ```
147
 
 
 
 
148
  ---
149
 
150
  ## Model Architecture
151
 
152
+ ### NTU-Transformer
153
 
154
+ The Transformer-based decoder combines standard multi-head self-attention
155
+ blocks with two QEC-specific components:
156
 
157
+ - **Scalable STEM embedding**: Encodes syndrome data from variable-size
158
+   lattices into a shared latent representation, absorbing the distance
159
+   dependence into the input encoding (Eq. 2 in the paper).
160
+ - **QEC-aware 2D RoPE**: Applies rotary position embeddings to relative
161
+   algebraic displacements defined by the code's shift set *M*(*x*, *y*, *t*; *d*),
162
+   preserving detector geometry across code distances.
163
+ - **Interleaved RNN-Transformer backbone**: 5 GRU-based recurrent blocks
164
+   alternating with 6 spatial self-attention blocks.
165
+ - **Cross-attention logical readout**: Learnable logical query tokens attend
166
+   over the encoded detector representations.
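
To make the rotary-embedding bullet concrete, here is a minimal sketch of a generic 2D RoPE in plain Python. It is an illustration under stated assumptions, not the paper's formulation: the function names, the `base` frequency of 10000, and the choice to rotate the first half of the features by `x` and the second half by `y` are all assumptions, and the code-specific shift set *M* is not modelled.

```python
import math


def rotate_pairs(vec, position, base=10000.0):
    """Apply a 1D rotary embedding to an even-length vector at a scalar position."""
    half = len(vec) // 2
    out = []
    for i in range(half):
        # Each consecutive feature pair is rotated at its own frequency.
        theta = position * base ** (-i / half)
        a, b = vec[2 * i], vec[2 * i + 1]
        out.append(a * math.cos(theta) - b * math.sin(theta))
        out.append(a * math.sin(theta) + b * math.cos(theta))
    return out


def rope_2d(vec, x, y):
    """Rotate the first half of the features by x and the second half by y."""
    assert len(vec) % 4 == 0, "feature dimension must be divisible by 4"
    mid = len(vec) // 2
    return rotate_pairs(vec[:mid], x) + rotate_pairs(vec[mid:], y)


def dot(u, v):
    """Plain dot product, standing in for an attention score."""
    return sum(a * b for a, b in zip(u, v))
```

With this construction, `dot(rope_2d(q, x1, y1), rope_2d(k, x2, y2))` depends only on the displacement `(x1 - x2, y1 - y2)`, so shifting both detectors by the same offset leaves the score unchanged.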
167
 
168
+ ### NTU-Neural-BP
169
 
170
+ A graph-neural-network decoder operating on the bipartite Tanner graph:
171
 
172
+ - **Message passing** between variable and check nodes with gated recurrent
173
+ units (GRU) for message updates.
174
+ - **Syndrome-aware encoding** of check node states and prior LLRs.
175
+ - **Focal loss** with syndrome consistency regularization.
176
+ - Compact model (~300K parameters for the [[72, 12, 6]] code).
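
For readers unfamiliar with focal loss, the sketch below shows the standard binary form for a single prediction. It is a hedged illustration only: the function name is hypothetical, `gamma=2.0` is a commonly used default rather than a value taken from this repository, and the syndrome consistency regularizer mentioned above is not included.

```python
import math


def binary_focal_loss(prob, target, gamma=2.0, alpha=None, eps=1e-12):
    """Focal loss for one predicted probability and a 0/1 target."""
    # p_t is the probability the model assigns to the correct class.
    p_t = prob if target == 1 else 1.0 - prob
    p_t = min(max(p_t, eps), 1.0)
    # The modulating factor down-weights examples that are already easy.
    weight = (1.0 - p_t) ** gamma
    if alpha is not None:
        weight *= alpha if target == 1 else 1.0 - alpha
    return -weight * math.log(p_t)
```

Setting `gamma=0.0` and `alpha=None` reduces this to ordinary binary cross-entropy, which is a convenient sanity check.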
177
 
178
  ---
179
 
180
  ## Authors
181
 
182
+ [Ge Yan](https://grahamyan.github.io)<sup>1</sup>,
183
+ Shanchuan Li<sup>1, 2</sup>,
184
+ Shiyi Xiao<sup>1, 3</sup>,
185
+ Pengyue Ma<sup>1</sup>,
186
+ Hanyan Cao<sup>4</sup>,
187
+ [Feng Pan](https://scholar.google.com/citations?user=Vp6hFhUAAAAJ)<sup>4,\*</sup>,
188
+ [Yuxuan Du](https://yuxuan-du.github.io)<sup>1,\*</sup>
189
 
190
+ <sup>1</sup> College of Computing and Data Science, Nanyang Technological University, Singapore<br>
191
+ <sup>2</sup> Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, Japan<br>
192
+ <sup>3</sup> School of Artificial Intelligence, Shanghai Jiao Tong University, China<br>
193
+ <sup>4</sup> Science, Mathematics and Technology Cluster, Singapore University of Technology and Design, Singapore
194
 
195
  <small><sup>\*</sup> Corresponding authors</small>
196
 
 
198
 
199
  ## Citation
200
 
 
 
 
201
  ```bibtex
202
  @article{ntu2026,
203
+ title={Efficient Foundation Decoders for Fault-Tolerant Quantum Computing},
204
  author={Yan, Ge and Li, Shanchuan and Xiao, Shiyi and Ma, Pengyue and
205
  Cao, Hanyan and Pan, Feng and Du, Yuxuan},
 
206
  year={2026},
207
  }
208
  ```