metadata
license: apache-2.0
base_model: Qwen/Qwen3-0.6B
library_name: pytorch
pipeline_tag: text-generation
tags:
  - reasoning
  - function-calling
  - side-state
  - research

PAT-ER — Primitive-Augmented Transformer with Event-Role Stream

PAT-ER is a decoder-only causal language model whose hidden computation is shaped by two learned side-state streams, an event-role stream and a primitive stream, in addition to the normal token stream. This checkpoint is a research prototype: it is warm-started from Qwen/Qwen3-0.6B, extended with the PAT-ER side-state, and then given a decoupled product-generation interface.

Research prototype. PAT-ER proposes and predicts; it does not prove, certify, or guarantee. Not safety-certified. Not a theorem prover.

Intended use

Research / reviewer evaluation of (a) side-state reasoning signals (primitive class, role-to-primitive bridge, support, IDK) and (b) schema-grounded interface generation (evidence-grounded answers, abduction-as-hypothesis, contradiction, valid JSON, and Hermes-style tool calls).

Two modes (same checkpoint)

A forward(..., interface_mode=bool) flag selects the mode:

  • base mode (interface_mode=False, default): side-state / matrix behavior. Runs the original frozen Condition-D blocks, so the auxiliary heads (primitive, support, role-to-primitive, IDK, …) and the LM follow the Condition-D path. Use this to read side-state predictions.
  • interface_mode=True: product-format generation. Runs mode-gated copies of the top-2 decoder blocks, whose adapted attention produces tool calls and structured answers. This mode is required for product-format generation; base mode does not emit tool calls.

The interface is decoupled: base mode is byte-identical to the underlying Condition-D model, so adding product generation does not change side-state behavior.

Key measured results

Architecture contribution (8 seeds) — adding PAT-ER side-state registers to the same Qwen3-0.6B backbone (Condition B → C):

| Metric | Δ (B→C) | 95% CI |
| --- | --- | --- |
| primitive macro-F1 | +0.209 | [+0.182, +0.237] |
| role-to-primitive macro-F1 | +0.090 | [+0.074, +0.110] |

Usable interface (3 seeds, 242 held-out prompts, interface_mode=True):

| Metric | Result |
| --- | --- |
| Hermes tool-call parse | 0.952 |
| tool arguments exact (no hallucinated args) | 0.981 |
| JSON valid / keys | 1.000 |
| IDK precision/recall/F1 | 1.000 |
| base-mode primitive / r2p / LM | unchanged vs Condition D |

Known limit: exact-match accuracy on unseen tool names is 0.886. The failures are semantic substitution and multi-token truncation on a minority of unseen names. Robust schema-grounded function calling is supported; perfect tool-name copying is not.
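The two failure types named above can be told apart with a simple string check. The following is a minimal sketch, not the project's evaluation code: it assumes that a truncated name is a strict prefix of the expected name and treats every other mismatch as a substitution. The function names and the example tool names are hypothetical.

```python
def classify_tool_name(emitted: str, expected: str) -> str:
    """Label an emitted tool name relative to the expected schema name."""
    if emitted == expected:
        return "exact"
    # Assumption: a non-empty strict prefix of the expected name means the
    # name was cut short during generation.
    if emitted and expected.startswith(emitted):
        return "truncation"
    # Any other mismatch is treated as a different name standing in for the
    # expected one.
    return "substitution"


def summarize(pairs):
    """Count outcomes over (emitted, expected) pairs and report exact-match accuracy."""
    counts = {"exact": 0, "truncation": 0, "substitution": 0}
    for emitted, expected in pairs:
        counts[classify_tool_name(emitted, expected)] += 1
    total = sum(counts.values())
    accuracy = counts["exact"] / total if total else 0.0
    return counts, accuracy


# Hypothetical tool names, invented for illustration only.
pairs = [
    ("lookup_invoice_status", "lookup_invoice_status"),
    ("lookup_invoice", "lookup_invoice_status"),
    ("get_invoice_state", "lookup_invoice_status"),
]
counts, accuracy = summarize(pairs)
print(counts, round(accuracy, 3))
```

A prefix test is a coarse proxy: it cannot tell a truncated name from a shorter, legitimately different tool that happens to share a prefix, so flagged cases still need inspection.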

Files

| File | Description |
| --- | --- |
| model.safetensors | weights (warm-started Qwen3-0.6B backbone + PAT-ER side-state + interface layers) |
| config.json | PATERConfig (incl. interface_adapt_layers: 2) |
| pat_er/ | minimal source package needed to load the custom architecture |
| tokenizer.json, tokenizer_config.json, chat_template.jinja, pater_manifest.json | Qwen3 tokenizer extended with the PAT-ER token spec |
| example_usage.py | load + run example (base mode and interface_mode) |
| eval_prompts.jsonl | synthetic held-out product-eval fixture (generated; no external dataset text) |

Usage

pip install torch safetensors transformers
python3 example_usage.py

import json

import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer
from pat_er import PATERConfig, PATERForCausalLM

# Keep only the keys that PATERConfig declares as dataclass fields.
with open("config.json") as f:
    cfg = json.load(f)
config = PATERConfig(**{k: v for k, v in cfg.items() if k in PATERConfig.__dataclass_fields__})

model = PATERForCausalLM(config)
model.load_state_dict(load_file("model.safetensors"), strict=False)  # lm_head is re-tied
model.eval()
tok = AutoTokenizer.from_pretrained(".")

# interface_mode=True -> tool call;  interface_mode=False -> side-state (Condition D)
input_ids = tok("...", return_tensors="pt").input_ids
with torch.no_grad():  # inference only, no gradients needed
    out = model(input_ids=input_ids, interface_mode=True)
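The config-loading line filters config.json down to the fields the config dataclass declares before constructing it. The pattern can be tried in isolation with a stand-in dataclass; this is a sketch only, and DemoConfig, its fields, config_from_dict, and the extra key are hypothetical rather than part of the pat_er package.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoConfig:
    # Hypothetical fields; the real PATERConfig fields are defined in pat_er.
    hidden_size: int = 8
    interface_adapt_layers: int = 0


def config_from_dict(cls, raw: dict):
    """Build a dataclass instance, ignoring keys the dataclass does not declare."""
    allowed = {f.name for f in fields(cls)}
    ignored = sorted(k for k in raw if k not in allowed)
    return cls(**{k: v for k, v in raw.items() if k in allowed}), ignored


# A dict with one declared key and one undeclared key.
raw = {"interface_adapt_layers": 2, "some_extra_key": "x"}
config, ignored = config_from_dict(DemoConfig, raw)
print(config.interface_adapt_layers, ignored)
```

Returning the ignored keys makes it easy to spot a misspelled field name, which a silent filter would otherwise drop.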

Datasets

Side-state training uses upstream reasoning datasets (ProofWriter, FOLIO, and synthetic primitive data) through local converters. Converted external datasets are not redistributed here. The included eval_prompts.jsonl is a synthetic, generated fixture (hand-written reasoning templates + synthetic tool schemas) and contains no external dataset text.

Limitations & safety

  • Research prototype; not safety-certified; not a theorem prover.
  • Contradiction/deep reasoning is partial; primitive/support labels are predictions.
  • Tool calls are proposals. A well-formed, name-grounded <tool_call> is not approval to execute — validate the tool name and arguments against the real schema before any execution, and keep a human/verifier in the loop for irreversible actions.
  • 760M scale is not validated; this checkpoint is ~762M (Qwen3-0.6B backbone + PAT-ER).
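The validation step recommended in the list above could look like the sketch below. It assumes the Hermes-style wrapper is a JSON object with "name" and "arguments" keys between <tool_call> tags, and it uses a simplified, hypothetical schema shape (required and allowed argument names) rather than full JSON Schema. The function name and the example tool are invented for illustration.

```python
import json
import re

# Assumed wrapper: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def validate_tool_call(text: str, schemas: dict):
    """Return (call, errors); call is None unless every check passes."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None, ["no <tool_call> block found"]
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    name = call.get("name")
    args = call.get("arguments")
    # Reject names that are not in the real schema, including near misses.
    if name not in schemas:
        return None, [f"unknown tool name: {name!r}"]
    if not isinstance(args, dict):
        return None, ["arguments must be a JSON object"]
    schema = schemas[name]
    errors = [f"missing required argument: {k}" for k in schema["required"] if k not in args]
    errors += [f"unexpected argument: {k}" for k in args if k not in schema["allowed"]]
    return (call if not errors else None), errors


# Hypothetical tool schema and model outputs, for illustration only.
schemas = {"get_weather": {"required": ["city"], "allowed": ["city", "unit"]}}
ok_text = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Oslo"}}\n</tool_call>'
bad_text = '<tool_call>{"name": "get_weather", "arguments": {"town": "Oslo"}}</tool_call>'
print(validate_tool_call(ok_text, schemas))
print(validate_tool_call(bad_text, schemas)[1])
```

Passing these checks only shows that the call is well formed against the schema. Whether it should run is a separate decision, which for irreversible actions stays with a human or verifier.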

Project

Source, training, and evaluation code: https://github.com/Pronto-Sage/primitive-augmented-transformer

Base model: Qwen/Qwen3-0.6B (Apache-2.0). This derivative is released under Apache-2.0.