 ---
 license: cc-by-4.0
+pipeline_tag: text-generation
+tags:
+- math
+- combinatorics
+- permutations
+- algebraic-combinatorics
+- llama
+- causal-lm
 ---
+# PermuFormer
+PermuFormer is a small Llama-style causal language model trained on symbolic permutation tasks from algebraic combinatorics. It is intended as a specialist base model for permutation representation, reasoning, and finetuning experiments rather than as a general natural-language assistant.
+The model operates on a compact whitespace-tokenized vocabulary for permutations. Prompts are formulaic equations: the left side specifies a permutation task and generation begins after the `=` token.
+## Model Details
+- **Architecture:** `LlamaForCausalLM`
+- **Parameters:** about 75.7M
+- **Layers:** 12
+- **Hidden size:** 768
+- **Attention heads:** 12 query heads, 4 key/value heads
+- **MLP intermediate size:** 2048
+- **Activation:** SiLU/SwiGLU
+- **Position encoding:** RoPE, theta 10000
+- **Vocabulary size:** 186
+- **Context length used by tokenizer:** 1000 tokens
+- **Checkpoint:** `step_2600000`
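As a sanity check on the figures above, the parameter count can be reproduced from the listed hyperparameters. This is a minimal sketch, assuming the standard Llama layout (bias-free linear layers, two RMSNorm weights per layer plus a final norm, a three-matrix gated MLP). The helper name `llama_param_count` is hypothetical, and the card does not say whether the input and output embeddings are tied, so both cases are computed.

```python
def llama_param_count(vocab, hidden, layers, q_heads, kv_heads,
                      intermediate, tie_embeddings):
    """Count weights of a Llama-style decoder (assumes no bias terms)."""
    head_dim = hidden // q_heads
    kv_dim = kv_heads * head_dim                 # grouped-query K/V width
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o and k, v
    mlp = 3 * hidden * intermediate              # gate, up, down projections
    norms = 2 * hidden                           # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    embeddings = vocab * hidden
    total = layers * per_layer + embeddings + hidden  # + final norm
    if not tie_embeddings:
        total += embeddings                      # separate output head
    return total


# Hyperparameters as listed in this card.
CARD = dict(vocab=186, hidden=768, layers=12, q_heads=12,
            kv_heads=4, intermediate=2048)

tied = llama_param_count(**CARD, tie_embeddings=True)
untied = llama_param_count(**CARD, tie_embeddings=False)
print(f"tied: {tied:,}  untied: {untied:,}")
# prints: tied: 75,659,520  untied: 75,802,368
```

Under these assumptions the tied count rounds to the 75.7M quoted above, while the untied count rounds to 75.8M. With a 186-token vocabulary the embedding table is a negligible share of the total either way.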
+## Training Data
+PermuFormer was trained autoregressively on synthetic permutation examples generated with exact combinatorial algorithms. The paper describes a dataset of 39.8M instances, approximately 2.66B tokens, over the symmetric groups `S_2` through `S_11`.
+Training tasks cover three broad families:
+- **Translation between encodings:** one-line notation, cycle notation, reduced Coxeter expressions, RSK tableaux, inversion vectors, and Lehmer codes.
+- **Permutation statistics and properties:** length, descents, fixed points, sign/parity, cycle type, RSK shape, pattern avoidance, longest increasing/decreasing subsequences, and related statistics.
+- **Algebraic operations and comparisons:** product/composition, inverse, powers, conjugation, commutator, relative products, multiplication by simple transpositions, complement, reverse, descent tests, and Bruhat order.
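For readers less familiar with the encodings and operations listed above, here is a minimal pure-Python sketch of three of them. It is illustrative only and is not the repository's data generator. The conventions are assumptions: values are 1-indexed, each cycle starts at its smallest element and fixed points are kept, and the Lehmer code counts smaller entries to the right. The model's own canonical forms may differ.

```python
def one_line_to_cycles(perm):
    """Cycle decomposition of a permutation given in one-line notation."""
    seen = set()
    cycles = []
    for start in range(1, len(perm) + 1):
        if start in seen:
            continue
        cycle = []
        current = start
        while current not in seen:
            seen.add(current)
            cycle.append(current)
            current = perm[current - 1]   # follow i -> perm(i)
        cycles.append(tuple(cycle))
    return cycles


def lehmer_code(perm):
    """Entry i counts the later entries that are smaller than perm[i]."""
    return [sum(1 for later in perm[i + 1:] if later < value)
            for i, value in enumerate(perm)]


def inverse(perm):
    """One-line notation of the inverse permutation."""
    inv = [0] * len(perm)
    for position, value in enumerate(perm, start=1):
        inv[value - 1] = position
    return inv


print(one_line_to_cycles([3, 1, 2]))  # [(1, 3, 2)]
print(lehmer_code([3, 1, 2]))         # [2, 0, 0]
print(inverse([3, 1, 2]))             # [2, 3, 1]
```

Helpers like these can serve as an independent check on model outputs once the output has been parsed into Python lists.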
+Some targets include computational witnesses before the final answer, for example inversion lists before a length answer or pattern witnesses before an avoidance answer.
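To illustrate the idea of a witness, the sketch below lists the inversions of a permutation and then reads off its length as their count, which is the standard characterization of Coxeter length in the symmetric group. The function names and the pair format are assumptions for illustration. The token-level format the model actually emits is not reproduced here.

```python
def inversions(perm):
    """All 1-indexed position pairs (i, j) with i < j and perm[i] > perm[j]."""
    n = len(perm)
    return [(i + 1, j + 1)
            for i in range(n)
            for j in range(i + 1, n)
            if perm[i] > perm[j]]


def coxeter_length(perm):
    """Length of the permutation, equal to its number of inversions."""
    return len(inversions(perm))


witness = inversions([3, 2, 1])
print(witness)                    # [(1, 2), (1, 3), (2, 3)]
print(coxeter_length([3, 2, 1]))  # 3
```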
+## Usage
+Use deterministic (greedy) decoding for most evaluation-style tasks, and take `eos_token_id` and `pad_token_id` from the tokenizer rather than hard-coding them.
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# Placeholder repository ID; replace it with the actual Hub path.
+model_id = "YOUR_ORG/permuformer"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id)
+model.eval()  # inference mode
+# Tokens are whitespace-separated; generation starts after the final "=".
+prompt = (
+    "<|endoftext|> n3 "
+    "1linebegin [ 3 , 1 , 2 ] 1lineend "
+    "in cyclenotationmake ="
+)
+inputs = tokenizer(prompt, return_tensors="pt")
+with torch.no_grad():
+    output_ids = model.generate(
+        **inputs,
+        max_new_tokens=80,
+        do_sample=False,  # deterministic decoding
+        eos_token_id=tokenizer.eos_token_id,
+        pad_token_id=tokenizer.pad_token_id,
+    )
+# Keep special tokens so the end-of-sequence marker stays visible.
+print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
+```
+### Prompt Format
+All tokens are separated by spaces. Each integer (including multi-digit integers), delimiter, and task name is a single token. A typical prompt starts with `<|endoftext|>`, followed by a size token such as `n7`, then the task expression, and finally `=`.
+Translation example:
+```text
+<|endoftext|> n3 1linebegin [ 3 , 1 , 2 ] 1lineend in cyclenotationmake =
+```
+Property example:
+```text
+<|endoftext|> n3 1linebegin [ 3 , 2 , 1 ] 1lineend property lengthmake =
+```
+Algebraic operation example:
+```text
+<|endoftext|> n3 1linebegin [ 2 , 1 , 3 ] 1lineend inversemake =
+```
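The three examples above share one template, so prompts can be assembled programmatically. The helper below is a hypothetical sketch: `build_prompt` is not part of the repository, and only the task strings shown in this card (`in cyclenotationmake`, `property lengthmake`, `inversemake`) are known to be valid. Any other task name would need to be checked against the tokenizer vocabulary.

```python
def build_prompt(perm, task):
    """Format a one-line-notation prompt ending in the '=' token.

    perm: list of ints in one-line notation, e.g. [3, 1, 2]
    task: task suffix copied from the examples in this card
    """
    body = " , ".join(str(value) for value in perm)
    return (f"<|endoftext|> n{len(perm)} "
            f"1linebegin [ {body} ] 1lineend {task} =")


print(build_prompt([3, 1, 2], "in cyclenotationmake"))
# <|endoftext|> n3 1linebegin [ 3 , 1 , 2 ] 1lineend in cyclenotationmake =
```

Building prompts from a list avoids spacing mistakes, which matter because the tokenizer splits on whitespace.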
+## Evaluation Notes
+The training code evaluates by exact match on the generated right-hand side after `=`. The local training log for this repository reports, at step 2,522,000 on a 2,560-example stratified evaluation sample:
+- Overall exact match: **98.44%**
+- Translation: **97.78%**
+- Property/statistic tasks: **99.17%**
+- Algebraic tasks: **98.36%**
+These figures come from the local training log at step 2,522,000, which is close to but not the same as the released `step_2600000` checkpoint. Treat them as indicative repository metadata, not as a full benchmark report for every downstream setting.
+The paper also reports that PermuFormer is substantially more accurate than frontier general-purpose LLMs on a small held-out sample from the model's symbolic test distribution, while noting that the comparison is imperfect because PermuFormer was trained directly in this syntax.
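The exact-match criterion described in this section can be sketched as follows. This is an assumption-laden illustration and not the repository's evaluator: it takes the first `=` token as the separator, truncates at the next `<|endoftext|>` token, and compares the remaining tokens one by one. The names `right_hand_side` and `exact_match` are hypothetical.

```python
EOS = "<|endoftext|>"


def right_hand_side(text):
    """Tokens after the first '=' token, cut at the next end-of-text token."""
    tokens = text.split()
    if "=" not in tokens:
        return None                       # malformed: no separator found
    rhs = tokens[tokens.index("=") + 1:]
    if EOS in rhs:
        rhs = rhs[:rhs.index(EOS)]        # drop everything from EOS onward
    return rhs


def exact_match(generated, reference):
    """True when the generated right-hand side equals the reference tokens."""
    predicted = right_hand_side(generated)
    return predicted is not None and predicted == reference.split()
```

Because the comparison is token by token, a mathematically correct answer in a non-canonical form still counts as a miss under this criterion.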
+## Finetuning
+PermuFormer is designed to be finetuned on specialized permutation tasks. Experiments in the paper include:
+- 231-avoidance and 2143-avoidance
+- mHeight
+- Schubert polynomial structure constants
+- Kazhdan-Lusztig polynomial degree prediction
+The repository's finetuning scripts compare starting from this pretrained checkpoint with training the same architecture from scratch.
+## Limitations
+- This is a specialist symbolic model. It expects the exact whitespace-tokenized syntax used during training and is brittle to natural-language paraphrases or malformed prompts.
+- The training data covers primarily `S_2` through `S_11`; behavior on permutations outside that range is not guaranteed.
+- Exact-match accuracy depends on canonical output formatting. Some mathematical tasks may have multiple valid answers, but evaluation expects the chosen canonical form.
+- The model focuses on permutations. It does not natively handle broader combinatorial structures such as arbitrary graphs or partitions unless encoded through the supported task syntax.
+- Outputs should be verified by exact combinatorial software for research-critical use.
+## Citation
+If you use this model, please cite the accompanying PermuFormer paper once citation details are available.