# TS-Proof-Ranker-v1
A pairwise quality ranker for first-order Horn proof chains. Given `(axioms, goal, candidate_chain)` it returns a scalar score: chains that the symbolic verifier accepts, and that are closer to the shortest derivation, score higher.
This sits one layer above `BoggersTheFish/ts-proof-ranker-v0`:

- v0 answers: *can this goal be derived at all?*
- v1 answers: *of these candidate proofs, which is best?*
The model is a thin BiGRU encoder with concatenated mean+max pooling, trained pairwise (Bradley–Terry) on synthetic Horn problems. It is not a standalone prover; final derivations must still compile through the symbolic kernel.
## Architecture
- Embedding(vocab=39, d=96)
- BiGRU(input=96, hidden=96, 1 layer, dropout=0.1)
- Concatenated mean+max pooling over time (→ 4·hidden = 384 features)
- Linear(384 → 96) → ReLU → Dropout → Linear(96 → 1)
- ~152k parameters
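
The stack is small enough to spell out. Below is a minimal PyTorch sketch of the architecture described above; the class name `V1RankerSketch` and its defaults are illustrative, and the canonical `V1Ranker` lives in `ts_proof.models.v1_ranker` in the source repo.

```python
import torch
import torch.nn as nn

class V1RankerSketch(nn.Module):
    """Illustrative sketch of the card's architecture; the canonical
    V1Ranker definition is in ts_proof.models.v1_ranker."""

    def __init__(self, vocab_size=39, d_model=96, hidden=96, dropout=0.1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One BiGRU layer. PyTorch applies GRU dropout only between
        # stacked layers, so here dropout acts in the head below.
        self.gru = nn.GRU(d_model, hidden, num_layers=1,
                          batch_first=True, bidirectional=True)
        self.head = nn.Sequential(
            nn.Linear(4 * hidden, hidden),  # mean+max of 2*hidden features
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 1),
        )

    def forward(self, token_ids):  # (batch, seq_len) int64 ids
        h, _ = self.gru(self.embed(token_ids))  # (batch, seq, 2*hidden)
        pooled = torch.cat([h.mean(dim=1), h.max(dim=1).values], dim=-1)
        return self.head(pooled).squeeze(-1)  # one scalar score per chain
```

With these defaults the sketch reproduces the reported parameter count: 3,744 (embedding) + 111,744 (BiGRU, both directions) + 36,960 + 97 (head) = 152,545.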
## Test results

```json
{
  "n_pairs": 501,
  "pairwise_acc": 0.9840319361277445,
  "by_kind": {
    "shuffled": 0.9761904761904762,
    "truncated": 1.0,
    "bloated": 1.0,
    "wrong_const": 1.0,
    "corrupted": 0.95
  },
  "by_depth": {
    "1": 1.0,
    "2": 0.9619047619047619,
    "3": 0.98989898989899,
    "4": 0.9829059829059829,
    "5": 0.9861111111111112
  },
  "spearman_score_vs_quality": 0.7041458016153326
}
```
## Training

```json
{
  "task": "synthetic_first_order_horn_proof_path_ranking",
  "n_parameters": 152545,
  "train_examples": 6,
  "history_summary": [
    {"epoch": 1, "train_acc": 0.6067, "val_acc": 0.9242, "val_loss": 0.5715},
    {"epoch": 2, "train_acc": 0.8617, "val_acc": 0.8962, "val_loss": 0.1905},
    {"epoch": 3, "train_acc": 0.9033, "val_acc": 0.9202, "val_loss": 0.1523},
    {"epoch": 4, "train_acc": 0.931,  "val_acc": 0.9401, "val_loss": 0.1216},
    {"epoch": 5, "train_acc": 0.9617, "val_acc": 0.9701, "val_loss": 0.0671},
    {"epoch": 6, "train_acc": 0.9627, "val_acc": 0.984,  "val_loss": 0.0538}
  ]
}
```
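
Pairwise (Bradley–Terry) training amounts to logistic regression on score differences: for each pair, the probability that the better chain outranks the worse one is `sigmoid(s_good - s_bad)`. A minimal sketch, assuming hypothetical names (`model` is the ranker, `good_ids`/`bad_ids` are token-id tensors for the two chains of a pair):

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(model, good_ids, bad_ids):
    # Score both chains of the pair with shared weights.
    s_good = model(good_ids)  # (batch,) scalar scores
    s_bad = model(bad_ids)
    # P(good beats bad) = sigmoid(s_good - s_bad); minimise its -log,
    # i.e. binary cross-entropy against an all-ones target.
    return F.binary_cross_entropy_with_logits(
        s_good - s_bad, torch.ones_like(s_good))
```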
## Input format

The tokenizer extends v0's vocabulary with four chain-aware special tokens: `<axiom>`, `<goal>`, `<chain>`, and `<step>`. The full input is:

```
<axiom> ax1 <axiom> ax2 ... <goal> goal_atom <chain> step1 <step> step2 ...
```

See `ts_proof/env/serialize.py` in the source repo for the canonical serializer.
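
For illustration only, here is a hypothetical helper reproducing that layout; it is not the repo's implementation, which remains `ts_proof/env/serialize.py`:

```python
def serialize(axioms, goal, chain):
    # Hypothetical re-creation of the layout above; see
    # ts_proof/env/serialize.py for the canonical serializer.
    parts = []
    for ax in axioms:
        parts += ["<axiom>", ax]
    parts += ["<goal>", goal, "<chain>"]
    for i, step in enumerate(chain):
        if i:
            parts.append("<step>")
        parts.append(step)
    return " ".join(parts)
```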
## Usage

```python
import json
import torch
from huggingface_hub import hf_hub_download

REPO = "BoggersTheFish/ts-proof-ranker-v1"
ckpt_path = hf_hub_download(REPO, "pytorch_model.bin")
with open(hf_hub_download(REPO, "config.json")) as f:
    config = json.load(f)
with open(hf_hub_download(REPO, "tokenizer.json")) as f:
    tokens = json.load(f)["token_to_id"]

# Build the model -- see the source repo for the `V1Ranker` definition.
# from ts_proof.models.v1_ranker import V1Config, V1Ranker
# m = V1Ranker(V1Config(**config))
# m.load_state_dict(torch.load(ckpt_path, map_location="cpu", weights_only=True)["state_dict"])
# m.eval()
```
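
Once `m` is built, re-ranking reduces to scoring each serialized candidate and keeping the highest. A sketch, assuming the hypothetical `serialize` helper shown under "Input format":

```python
def score_chain(m, tokens, text):
    # Whitespace split mirrors the flat token layout under "Input format";
    # unknown-token handling is omitted for brevity.
    ids = torch.tensor([[tokens[t] for t in text.split()]])
    with torch.no_grad():
        return m(ids).item()

# best = max(candidates,
#            key=lambda c: score_chain(m, tokens, serialize(axioms, goal, c)))
```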
## Limitations

- Trained on synthetic 1-arity Horn problems with depths 1–5; behavior on multi-arity, multi-variable, or longer chains is untested.
- Pairwise margin training optimises the ordering of candidates that share the same (axioms, goal); absolute scores are not calibrated probabilities.
- Distinguishing single-token corruptions (the `corrupted` bad-kind) is the weakest mode: about 95% pairwise accuracy vs. ~100% on the structural bad-kinds. v2 (a local-tension detector) is intended to address this.