license: apache-2.0
base_model: Qwen/Qwen3-0.6B
library_name: pytorch
pipeline_tag: text-generation
tags:
  - reasoning
  - function-calling
  - side-state
  - research

PAT-ER — Primitive-Augmented Transformer with Event-Role Stream

PAT-ER is a decoder-only causal language model whose hidden computation is shaped by two learned side-state streams — an event-role stream and a primitive stream — in addition to the normal token stream. This checkpoint is a research prototype warm-started from Qwen/Qwen3-0.6B with the PAT-ER side-state, then given a decoupled product-generation interface.

Research prototype. PAT-ER proposes and predicts; it does not prove, certify, or guarantee. Not safety-certified. Not a theorem prover.

Intended use

Research / reviewer evaluation of (a) side-state reasoning signals (primitive class, role-to-primitive bridge, support, IDK) and (b) schema-grounded interface generation (evidence-grounded answers, abduction-as-hypothesis, contradiction, valid JSON, and Hermes-style tool calls).

Two modes (same checkpoint)

A forward(..., interface_mode=bool) flag selects the mode:

base mode (interface_mode=False, default): side-state / matrix behavior. Runs the original frozen Condition-D blocks, so the auxiliary heads (primitive, support, role-to-primitive, IDK, …) and the LM all follow the Condition-D path. Use this to read side-state predictions.
interface_mode=True: product-format generation. Runs mode-gated copies of the top-2 decoder blocks, whose adapted attention produces tool calls and structured answers. This mode is required for product-format generation; base mode does not emit tool calls.

The interface is decoupled: base mode is byte-identical to the underlying Condition-D model, so adding product generation does not alter side-state behavior.
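
The gating described above can be pictured with a toy stack in which plain Python callables stand in for decoder blocks. This is only a sketch of the idea: `ModeGatedStack`, `run_plain`, and the arithmetic "blocks" are hypothetical names invented for illustration, not part of the `pat_er` package, and the real implementation may be organized differently.

```python
from typing import Callable, List, Sequence

Block = Callable[[float], float]  # hypothetical stand-in for a decoder block


class ModeGatedStack:
    """Toy stack: the top blocks have interface copies selected by a flag."""

    def __init__(self, base_blocks: Sequence[Block], interface_blocks: Sequence[Block]):
        if len(interface_blocks) > len(base_blocks):
            raise ValueError("cannot adapt more blocks than the stack has")
        self.base_blocks: List[Block] = list(base_blocks)
        self.interface_blocks: List[Block] = list(interface_blocks)

    def forward(self, x: float, interface_mode: bool = False) -> float:
        split = len(self.base_blocks) - len(self.interface_blocks)
        # Lower blocks are shared by both modes.
        for block in self.base_blocks[:split]:
            x = block(x)
        # Top blocks: the originals in base mode, the adapted copies otherwise.
        top = self.interface_blocks if interface_mode else self.base_blocks[split:]
        for block in top:
            x = block(x)
        return x


def run_plain(blocks: Sequence[Block], x: float) -> float:
    """Reference: the original stack with no interface machinery at all."""
    for block in blocks:
        x = block(x)
    return x


# Four made-up "blocks"; the last two get adapted copies (top-2).
base = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * 10]
adapted_top2 = [lambda x: x - 2.5, lambda x: x * 10 + 1]
stack = ModeGatedStack(base, adapted_top2)

# Base mode never touches the adapted copies, so it matches the plain stack.
assert stack.forward(1.0) == run_plain(base, 1.0)
print(stack.forward(1.0), stack.forward(1.0, interface_mode=True))
```

Because the base path only ever calls the original blocks, whatever is done to the adapted copies cannot change base-mode outputs, which is the property the decoupling claim relies on.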

Key measured results

Architecture contribution (8 seeds) — adding PAT-ER side-state registers to the same Qwen3-0.6B backbone (Condition B → C):

Metric	Δ (B→C)	95% CI
primitive macro-F1	+0.209	[+0.182, +0.237]
role-to-primitive macro-F1	+0.090	[+0.074, +0.110]
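
For readers unfamiliar with the metric, macro-F1 is the unweighted mean of per-class F1 scores, and a B→C delta is the difference between two such scores. The sketch below uses only the standard library. The label names and predictions are invented toy data, not the evaluation set or the reported results, and the project's own scorer may differ in details such as which classes are included.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every class seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred), key=str)
    if not classes:
        return 0.0
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical class names, made-up predictions).
gold = ["deduction", "deduction", "abduction", "induction"]
cond_b = ["deduction", "abduction", "abduction", "abduction"]
cond_c = ["deduction", "abduction", "abduction", "induction"]

delta = macro_f1(gold, cond_c) - macro_f1(gold, cond_b)
print(f"toy delta (B -> C): {delta:+.3f}")
```

Macro averaging gives rare classes the same weight as frequent ones, so a gain here reflects improvement across classes rather than on the majority class alone.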

Usable interface (3 seeds, 242 held-out prompts, interface_mode=True):

Metric	Result
Hermes tool-call parse	0.952
tool arguments exact (no hallucinated args)	0.981
JSON valid / keys	1.000
IDK precision/recall/F1	1.000
base-mode primitive / r2p / LM	unchanged vs Condition D

Known limit: exact-match accuracy on unseen tool names is 0.886. The errors are semantic substitutions and multi-token truncations on a minority of unseen names. Robust schema-grounded function calling is supported; perfect tool-name copying is not.
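
A parse-rate metric of the kind reported above can be approximated as follows. This is a minimal sketch assuming the usual Hermes-style layout, a JSON object with `name` and `arguments` wrapped in `<tool_call>` tags. The helper names `parse_tool_calls` and `parse_rate` and the sample strings are hypothetical, and the criterion used in the actual evaluation may be stricter or looser.

```python
import json
import re
from typing import Any, Dict, List

# Non-greedy match so several tool calls in one output are captured separately.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Return every well-formed tool call found in a model output."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        try:
            obj = json.loads(body)
        except json.JSONDecodeError:
            continue  # malformed JSON counts as a parse failure
        if (
            isinstance(obj, dict)
            and isinstance(obj.get("name"), str)
            and isinstance(obj.get("arguments"), dict)
        ):
            calls.append(obj)
    return calls


def parse_rate(outputs: List[str]) -> float:
    """Fraction of outputs containing at least one well-formed tool call."""
    if not outputs:
        return 0.0
    return sum(bool(parse_tool_calls(o)) for o in outputs) / len(outputs)


# Made-up outputs: one well-formed call, one truncated call.
good = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Oslo"}}\n</tool_call>'
truncated = '<tool_call>{"name": "get_weather", "arguments": </tool_call>'
print(parse_rate([good, truncated]))
```

A successful parse says nothing about whether the tool name is correct, which is why the tool-name accuracy above is tracked as a separate number.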

Files

File	Description
`model.safetensors`	weights (warm-started Qwen3-0.6B backbone + PAT-ER side-state + interface layers)
`config.json`	`PATERConfig` (incl. `interface_adapt_layers: 2`)
`pat_er/`	minimal source package needed to load the custom architecture
`tokenizer.json`, `tokenizer_config.json`, `chat_template.jinja`, `pater_manifest.json`	Qwen3 tokenizer extended with the PAT-ER token spec
`example_usage.py`	load + run example (base mode and `interface_mode`)
`eval_prompts.jsonl`	synthetic held-out product-eval fixture (generated; no external dataset text)

Usage

pip install torch safetensors transformers
python3 example_usage.py

import json

import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer
from pat_er import PATERConfig, PATERForCausalLM

# Build the config from config.json, keeping only the fields PATERConfig declares.
with open("config.json") as f:
    cfg = json.load(f)
config = PATERConfig(**{k: v for k, v in cfg.items() if k in PATERConfig.__dataclass_fields__})
model = PATERForCausalLM(config)
# strict=False: lm_head is re-tied
model.load_state_dict(load_file("model.safetensors"), strict=False)
model.eval()
tok = AutoTokenizer.from_pretrained(".")

# interface_mode=True -> tool call;  interface_mode=False -> side-state (Condition D)
input_ids = tok("...", return_tensors="pt").input_ids
with torch.no_grad():  # inference only, so skip gradient tracking
    out = model(input_ids=input_ids, interface_mode=True)

Datasets

Side-state training uses upstream reasoning datasets (ProofWriter, FOLIO, and synthetic primitive data) through local converters. Converted external datasets are not redistributed here. The included eval_prompts.jsonl is a synthetic, generated fixture (hand-written reasoning templates + synthetic tool schemas) and contains no external dataset text.

Limitations & safety

Research prototype; not safety-certified; not a theorem prover.
Contradiction/deep reasoning is partial; primitive/support labels are predictions.
Tool calls are proposals. A well-formed, name-grounded <tool_call> is not approval to execute — validate the tool name and arguments against the real schema before any execution, and keep a human/verifier in the loop for irreversible actions.
760M scale is not validated; this checkpoint is ~762M (Qwen3-0.6B backbone + PAT-ER).
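
The advice to validate tool calls before execution could look roughly like the check below. It is a sketch under stated assumptions: each tool's parameters are described by a JSON-Schema-like object with `properties` and `required`, and `validate_tool_call`, the `tools` registry, and the sample calls are hypothetical names invented here. It checks names and argument keys only, not value types, and passing it is still not approval to execute.

```python
from typing import Any, Dict, List


def validate_tool_call(call: Dict[str, Any], tools: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means these checks passed."""
    name = call.get("name")
    if not isinstance(name, str) or name not in tools:
        return [f"unknown tool name: {name!r}"]
    args = call.get("arguments")
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    schema = tools[name]
    allowed = set(schema.get("properties", {}))
    errors: List[str] = []
    # Arguments the schema does not declare (possible hallucinated args).
    for key in sorted(set(args) - allowed):
        errors.append(f"unexpected argument: {key}")
    # Arguments the schema requires but the call omits.
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required argument: {key}")
    return errors


# Hypothetical registry with a single tool.
tools = {
    "get_weather": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    }
}

ok_call = {"name": "get_weather", "arguments": {"city": "Oslo"}}
substituted = {"name": "fetch_weather", "arguments": {"city": "Oslo"}}  # wrong name
bad_args = {"name": "get_weather", "arguments": {"units": "metric"}}

for call in (ok_call, substituted, bad_args):
    print(call["name"], validate_tool_call(call, tools))
```

Rejecting unknown names outright, rather than mapping them to the nearest registered tool, keeps the semantic-substitution failure mode visible to the caller instead of silently executing a different tool.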

Project

Source, training, and evaluation code: https://github.com/Pronto-Sage/primitive-augmented-transformer

Base model: Qwen/Qwen3-0.6B (Apache-2.0). This derivative is released under Apache-2.0.