TS-Proof-Ranker-v1

A pairwise quality ranker for first-order Horn proof chains. Given a triple (axioms, goal, candidate_chain), it returns a scalar score; chains that the symbolic verifier accepts, and that are closer to the shortest derivation, score higher.

This sits one layer above BoggersTheFish/ts-proof-ranker-v0:

  • v0 answers "can this goal be derived at all?"
  • v1 answers "of these candidate proofs, which is best?"

The model is a thin BiGRU encoder with concatenated mean+max pooling, trained pairwise (Bradley–Terry) on synthetic Horn problems. It is not a standalone prover — final derivations must still compile through the symbolic kernel.
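
The pairwise objective mentioned above can be sketched as follows. This is a minimal illustration of Bradley–Terry training on score pairs, not the repo's actual training loop; the function name and example scores are hypothetical.

```python
import torch
import torch.nn.functional as F

def bt_pairwise_loss(score_good, score_bad):
    # Bradley–Terry: P(good beats bad) = sigmoid(score_good - score_bad).
    # Minimizing -log P(good beats bad) is softplus of the negated margin.
    return F.softplus(score_bad - score_good).mean()

good = torch.tensor([2.0, 1.5])  # scores for verifier-accepted, shorter chains
bad = torch.tensor([0.5, 1.0])   # scores for corrupted/bloated chains
loss = bt_pairwise_loss(good, bad)
```

The loss only constrains score *differences* within a pair, which is why absolute scores are not calibrated (see Limitations).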

Architecture

  • Embedding(vocab=39, d=96)
  • BiGRU(input=96, hidden=96, 1 layer, dropout=0.1)
  • Concatenated mean+max pooling over time (→ 4·hidden = 384 features)
  • Linear(384 → 96) → ReLU → Dropout → Linear(96 → 1)
  • ~152k parameters
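
As a sanity check on the sizes listed above, a minimal PyTorch sketch of this architecture reproduces the stated parameter count. The class name `RankerSketch` is hypothetical (the real definition is `V1Ranker` in the source repo), and since the GRU has a single layer, dropout is applied around the head here rather than inside the GRU, where PyTorch would ignore it.

```python
import torch
import torch.nn as nn

class RankerSketch(nn.Module):
    def __init__(self, vocab=39, d=96, hidden=96):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.gru = nn.GRU(d, hidden, num_layers=1, batch_first=True,
                          bidirectional=True)
        self.head = nn.Sequential(
            nn.Dropout(0.1),
            nn.Linear(4 * hidden, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, 1),
        )

    def forward(self, ids):
        h, _ = self.gru(self.emb(ids))                     # (B, T, 2*hidden)
        pooled = torch.cat([h.mean(dim=1),
                            h.max(dim=1).values], dim=-1)  # (B, 4*hidden)
        return self.head(pooled).squeeze(-1)               # (B,) scalar scores

m = RankerSketch()
n_params = sum(p.numel() for p in m.parameters())  # 152545, matching the card
```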

Test results

{
  "n_pairs": 501,
  "pairwise_acc": 0.9840319361277445,
  "by_kind": {
    "shuffled": 0.9761904761904762,
    "truncated": 1.0,
    "bloated": 1.0,
    "wrong_const": 1.0,
    "corrupted": 0.95
  },
  "by_depth": {
    "1": 1.0,
    "2": 0.9619047619047619,
    "3": 0.98989898989899,
    "4": 0.9829059829059829,
    "5": 0.9861111111111112
  },
  "spearman_score_vs_quality": 0.7041458016153326
}

Training

{
  "task": "synthetic_first_order_horn_proof_path_ranking",
  "n_parameters": 152545,
  "train_examples": 6,
  "history_summary": [
    {
      "epoch": 1,
      "train_acc": 0.6067,
      "val_acc": 0.9242,
      "val_loss": 0.5715
    },
    {
      "epoch": 2,
      "train_acc": 0.8617,
      "val_acc": 0.8962,
      "val_loss": 0.1905
    },
    {
      "epoch": 3,
      "train_acc": 0.9033,
      "val_acc": 0.9202,
      "val_loss": 0.1523
    },
    {
      "epoch": 4,
      "train_acc": 0.931,
      "val_acc": 0.9401,
      "val_loss": 0.1216
    },
    {
      "epoch": 5,
      "train_acc": 0.9617,
      "val_acc": 0.9701,
      "val_loss": 0.0671
    },
    {
      "epoch": 6,
      "train_acc": 0.9627,
      "val_acc": 0.984,
      "val_loss": 0.0538
    }
  ]
}

Input format

The tokenizer extends v0's vocabulary with four chain-aware special tokens: <axiom>, <goal>, <chain>, <step>. The full input is

<axiom> ax1 <axiom> ax2 ... <goal> goal_atom <chain> step1 <step> step2 ...

See ts_proof/env/serialize.py in the source repo for the canonical serializer.
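
A hypothetical serializer mirroring the layout above might look like the sketch below; the canonical implementation is ts_proof/env/serialize.py, and the exact per-step tokenization is an assumption here.

```python
def serialize(axioms, goal, chain):
    # Hypothetical serializer: each axiom is preceded by <axiom>, the goal
    # by <goal>, and chain steps are separated by <step> after <chain>.
    toks = []
    for ax in axioms:
        toks.append("<axiom>")
        toks.extend(ax.split())
    toks.append("<goal>")
    toks.extend(goal.split())
    toks.append("<chain>")
    for i, step in enumerate(chain):
        if i > 0:
            toks.append("<step>")
        toks.extend(step.split())
    return toks
```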

Usage

import json
import torch
from huggingface_hub import hf_hub_download

repo = "BoggersTheFish/ts-proof-ranker-v1"
ckpt_path = hf_hub_download(repo, "pytorch_model.bin")
with open(hf_hub_download(repo, "config.json")) as f:
    config = json.load(f)
with open(hf_hub_download(repo, "tokenizer.json")) as f:
    tokens = json.load(f)["token_to_id"]

# Build the model — see source repo for `V1Ranker` definition.
# from ts_proof.models.v1_ranker import V1Config, V1Ranker
# m = V1Ranker(V1Config(**config))
# m.load_state_dict(torch.load(ckpt_path, map_location="cpu", weights_only=True)["state_dict"])
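
Scoring a serialized chain requires mapping its tokens to ids first. A hypothetical helper is sketched below; mapping out-of-vocabulary tokens to id 0 is an assumption, so check the repo's tokenizer for the real convention.

```python
import torch

def encode(token_to_id, toks, unk_id=0):
    # Hypothetical encoder: returns a (1, T) id tensor for a single chain.
    # unk_id=0 for unknown tokens is an assumption, not the repo's behavior.
    return torch.tensor([[token_to_id.get(t, unk_id) for t in toks]])
```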

Limitations

  • Trained on synthetic Horn problems with unary (arity-1) predicates and derivation depths 1–5; behavior on multi-arity, multi-variable, or longer chains is untested.
  • Pairwise margin training optimises ordering of candidates sharing the same (axioms, goal). Absolute scores are not calibrated probabilities.
  • Single-token corruptions (the corrupted bad-kind) are the weakest mode: roughly 95% pairwise accuracy vs. ~100% on the structural bad-kinds. v2 (a local-tension detector) is intended to address this.
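
Although absolute scores are not calibrated, the Bradley–Terry objective does make the score *difference* between two candidates for the same (axioms, goal) interpretable as a preference probability. A minimal sketch, with hypothetical scores:

```python
import math

def preference(score_a, score_b):
    # Bradley–Terry: sigmoid of the score difference gives P(A beats B).
    # Absolute scores remain uncalibrated; only differences are meaningful.
    return 1.0 / (1.0 + math.exp(score_b - score_a))

p = preference(3.0, 1.0)  # probability that chain A is the better proof
```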