Upload README.md with huggingface_hub
README.md (updated):
tags:
- semistandard-tableaux
- reverse-plane-partitions
- hillman-grassl
- cylindric-plane-partitions
- growth-diagrams
- transformer
- pytorch
datasets:

[…]
# RSK Transformer

A transformer that learns **inverse combinatorial bijections**: the Robinson-Schensted-Knuth correspondence (permutations and matrices), the Hillman-Grassl correspondence (reverse plane partitions), and the cylindric growth diagram bijection (cylindric plane partitions). The same architecture handles all tasks without modification.
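For orientation, the forward map being inverted in the simplest case, RSK via Schensted row insertion on a permutation, is compact enough to state in code. This is a generic illustrative sketch, not code from the repository:

```python
from bisect import bisect_right

def rsk(perm):
    """Schensted row insertion: permutation -> (P, Q) tableau pair."""
    P, Q = [], []  # insertion tableau, recording tableau
    for pos, x in enumerate(perm, start=1):
        row = 0
        while True:
            if row == len(P):            # fell off the bottom: start a new row
                P.append([x])
                Q.append([pos])
                break
            i = bisect_right(P[row], x)  # leftmost entry strictly greater than x
            if i == len(P[row]):         # x is largest in this row: append
                P[row].append(x)
                Q[row].append(pos)
                break
            P[row][i], x = x, P[row][i]  # bump the displaced entry downward
            row += 1
    return P, Q
```

The model's task is the reverse direction: given (P, Q), recover the permutation.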
Achieves **100% exact-match accuracy** on held-out test data for permutations at n=10, **99.99%** at n=15 (1.3 trillion permutations), **100%** on 3×3 matrix RSK, **100%** on reverse plane partitions of shape (4,3,2,1), and **100%** on cylindric plane partitions, significantly improving on the [PNNL ML4AlgComb benchmark](https://github.com/pnnl/ML4AlgComb/tree/master/rsk). Scales to 5×5 matrices (96.8% exact match on a space of ~10¹⁴).
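The "1.3 trillion permutations" figure for n = 15 is just 15!, which is quick to confirm:

```python
import math

# 15! = 1,307,674,368,000, i.e. roughly 1.3 trillion distinct permutations
print(f"{math.factorial(15):,}")
```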
**Paper**: [paper.pdf](paper.pdf)
**Code**: [github.com/RaggedR/rsk-transformer](https://github.com/RaggedR/rsk-transformer)
**Thesis**: [Langer (2013), Cylindric plane partitions, Lambda determinants, Commutants in semicircular systems](https://arxiv.org/abs/2110.12629), the mathematical foundation for the cylindric growth diagram bijection (§4.2–4.3) and generalized RSK via Fomin growth diagrams (§2.1–2.2)

## Results

[…]
The Hillman-Grassl bijection is fundamentally different from RSK (it involves zigzag paths through the Young diagram rather than Schensted insertion), yet the same transformer architecture learns it to near-perfect accuracy. Tall shapes converge more slowly (36 epochs vs. 17–23) because longer zigzag paths create longer-range dependencies.
### Experiment 4: Cylindric Plane Partitions (Growth Diagrams)

Given a cylindric plane partition (CPP) with a binary profile, recover the base partition γ and the ALCD face labels via the inverse cylindric growth diagram bijection. This uses the **Burge local rule** applied recursively through a cylindric growth diagram, as described in [Langer (2013), §4.2–4.3](https://arxiv.org/abs/2110.12629). **Same model architecture.**

| Profile | T | ALCD labels | Training data | Test exact match | Per-position | Best epoch |
|---------|---|-------------|---------------|------------------|--------------|------------|
| (1,0,1,0) | 4 | 3 | 500,000 | **100.00%** | **100.00%** | 2 |
| (1,0,1,0,0) | 5 | 5 | 500,000 | **100.00%** | **100.00%** | 7 |
| (1,0,1,0,1,0) | 6 | 6 | 500,000 | **100.00%** | **100.00%** | 3 |
| (1,0,1,0,1,0,1,0) | 8 | 10 | 500,000 | **99.98%** | **100.00%** | 9 |

The cylindric bijection is qualitatively different from all previous experiments: there is no direct closed-form algorithm. The bijection is defined implicitly by the Burge local rule applied at each face of the cylindric growth diagram, so the model must learn to invert a recursive process (the composition of local maps from [Langer 2013, §4.2](https://arxiv.org/abs/2110.12629)) that peels off one ALCD label at each step by solving a local Burge equation. Despite this complexity, the transformer reaches 100% exact match on three of the four tested profiles and 99.98% on the largest.
## Key Idea: Structured 2D Token Embeddings

Previous work encoded tableaux as flat bracket strings, destroying the 2D geometry. We encode each tableau entry as a token with four learned embeddings:
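The list of the four embedding fields falls outside this excerpt, so the sketch below is an assumption for illustration only: it treats each cell as a (value, row, column, tableau) token, so that a pair (P, Q) for a permutation of n becomes 2n tokens. The field names and the helper itself are hypothetical:

```python
def tokenize_pair(P, Q):
    """Flatten a (P, Q) tableau pair into structured tokens.

    Each token keeps its 2D coordinates instead of being serialized into
    a flat bracket string. The four field names here (value, row, col,
    tableau) are our guess at the four learned embeddings.
    """
    tokens = []
    for which, tableau in enumerate((P, Q)):  # 0 = P, 1 = Q
        for r, row in enumerate(tableau):
            for c, value in enumerate(row):
                tokens.append({"value": value, "row": r, "col": c, "tableau": which})
    return tokens
```

In the model, each of the four fields would feed its own learned embedding table (e.g. an `nn.Embedding` per field in PyTorch), with the four vectors summed per token; that summation scheme is likewise our assumption.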
[…]

| `checkpoints/encoder_rpp_6x4x2_m4/best.pt` | RSKEncoder on RPP shape (6,4,2), max_entry=4 | ~1.2M |
| `checkpoints/encoder_rpp_2x2x2x2x2x1_m4/best.pt` | RSKEncoder on RPP shape (2,2,2,2,2,1), max_entry=4 | ~1.2M |
### Experiment 4: Cylindric Plane Partitions

| File | Description | Parameters |
|------|-------------|------------|
| `checkpoints/encoder_cyl_1010_m3/best.pt` | RSKEncoder on CPP profile (1,0,1,0), max_label=3 | ~1.2M |
| `checkpoints/encoder_cyl_10100_m3/best.pt` | RSKEncoder on CPP profile (1,0,1,0,0), max_label=3 | ~1.2M |
| `checkpoints/encoder_cyl_101010_m3/best.pt` | RSKEncoder on CPP profile (1,0,1,0,1,0), max_label=3 | ~1.2M |
### Loading a checkpoint

```python
# …
```
[…]

```
python train.py --model encoder --task rpp --shape 4,3,2,1 --max-entry 4 \
    --source sample --train-size 500000
python train.py --model encoder --task rpp --shape 6,4,2 --max-entry 4 \
    --source sample --train-size 500000

# --- Experiment 4: Cylindric Plane Partitions ---
python train.py --model encoder --task cylindric --profile 1010 --max-label 3 \
    --source sample --train-size 500000
python train.py --model encoder --task cylindric --profile 101010 --max-label 3 \
    --source sample --train-size 500000
```
## Citation
|