TS-Proof-Ranker-v2

A local tension detector for first-order Horn proof chains. Given (axioms, candidate_chain), it emits one tension score per chain step and a sequence-level "any failure" score.

This is the v2 layer above:

The model is a compact BiGRU token tagger trained on synthetic chains where one canonical proof step is replaced with a non-derivable atom. The symbolic verifier supplies the ground-truth first failure position.

Architecture

  • Embedding(vocab=39, d=32)
  • BiGRU(input=32, hidden=32, 1 layer, dropout=0.1)
  • Token MLP head trained at chain-step start positions
  • Mean+max pooled MLP head for "any failure"
  • 20194 parameters

Test results

{
  "n_examples": 1000,
  "step_acc": 0.6490479317137229,
  "any_acc": 0.673,
  "fail_pos_acc": 0.46,
  "by_kind_any_acc": {
    "clean": 0.786,
    "corrupted_step": 0.56
  },
  "by_depth_any_acc": {
    "1": 0.532608695652174,
    "2": 0.7164948453608248,
    "3": 0.7376237623762376,
    "4": 0.6637931034482759,
    "5": 0.7074468085106383
  }
}

Training

{
  "task": "synthetic_first_order_horn_failed_step_detection",
  "n_parameters": 20194,
  "history_summary": [
    {
      "epoch": 1,
      "val_step_acc": 0.605,
      "val_any_acc": 0.502,
      "val_fail_pos_acc": 0.478,
      "val_loss": 1.2268
    },
    {
      "epoch": 2,
      "val_step_acc": 0.6155,
      "val_any_acc": 0.5,
      "val_fail_pos_acc": 0.478,
      "val_loss": 1.1931
    },
    {
      "epoch": 3,
      "val_step_acc": 0.6201,
      "val_any_acc": 0.503,
      "val_fail_pos_acc": 0.478,
      "val_loss": 1.1916
    },
    {
      "epoch": 4,
      "val_step_acc": 0.6102,
      "val_any_acc": 0.567,
      "val_fail_pos_acc": 0.478,
      "val_loss": 1.1908
    },
    {
      "epoch": 5,
      "val_step_acc": 0.6342,
      "val_any_acc": 0.544,
      "val_fail_pos_acc": 0.478,
      "val_loss": 1.1864
    },
    {
      "epoch": 6,
      "val_step_acc": 0.628,
      "val_any_acc": 0.581,
      "val_fail_pos_acc": 0.478,
      "val_loss": 1.1827
    },
    {
      "epoch": 7,
      "val_step_acc": 0.6122,
      "val_any_acc": 0.581,
      "val_fail_pos_acc": 0.478,
      "val_loss": 1.1781
    },
    {
      "epoch": 8,
      "val_step_acc": 0.6444,
      "val_any_acc": 0.645,
      "val_fail_pos_acc": 0.478,
      "val_loss": 1.1665
    }
  ]
}

Input format

v2 reuses v1's tokenizer and omits the goal:

<axiom> ax1 <axiom> ax2 ... <chain> step1 <step> step2 ...

See ts_proof/env/serialize.py for the canonical serializer and step-position mapping.

Downloads last month
19
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support