	---
	license: apache-2.0
	base_model: Qwen/Qwen3-0.6B
	library_name: pytorch
	pipeline_tag: text-generation
	tags:
	- reasoning
	- function-calling
	- side-state
	- research
	---

	# PAT-ER — Primitive-Augmented Transformer with Event-Role Stream

	PAT-ER is a decoder-only causal language model whose hidden computation is shaped by two
	learned side-state streams, an event-role stream and a primitive stream, in addition to
	the normal token stream. This checkpoint is a research prototype: it was warm-started from
	[Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B), extended with the PAT-ER
	side-state, and then given a decoupled product-generation interface.

	> Research prototype. PAT-ER proposes and predicts; it does not prove, certify, or
	> guarantee. Not safety-certified. Not a theorem prover.

	## Intended use

	Research and reviewer evaluation of (a) side-state reasoning signals (primitive class,
	role-to-primitive bridge, support, and IDK, i.e. "I don't know" abstention) and
	(b) schema-grounded interface generation (evidence-grounded answers, abduction-as-hypothesis,
	contradiction, valid JSON, and Hermes-style tool calls).

	## Two modes (same checkpoint)

	The boolean `interface_mode` argument of `forward(...)` selects the mode:

	- Base mode (`interface_mode=False`, the default): side-state / matrix behavior.
	Runs the original frozen Condition-D blocks; the auxiliary heads (primitive, support,
	role-to-primitive, IDK, …) and the LM follow the Condition-D path. Use this mode to read
	side-state predictions.
	- Interface mode (`interface_mode=True`): product-format generation. Runs mode-gated copies
	of the top two decoder blocks, whose adapted attention produces tool calls and structured
	answers. This mode is required for product-format generation; base mode does not emit
	tool calls.

	The interface is decoupled: base mode is byte-identical to the underlying Condition-D
	model, so enabling product generation does not degrade side-state predictions.
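The decoupling can be illustrated with a toy, pure-Python sketch. It assumes blocks are simple scale-and-shift functions; `ToyBlock`, `ModeGatedStack`, and `adapt_layers` are hypothetical names, not the `pat_er` API. Because interface mode runs separate copies of the top blocks, updating those copies leaves base-mode outputs unchanged.

```python
import copy
from dataclasses import dataclass
from typing import List


@dataclass
class ToyBlock:
    """Hypothetical stand-in for a decoder block: y = x * scale + shift."""
    scale: float
    shift: float

    def __call__(self, x: List[float]) -> List[float]:
        return [v * self.scale + self.shift for v in x]


class ModeGatedStack:
    """Shared lower blocks; the top `adapt_layers` blocks get mode-gated copies."""

    def __init__(self, blocks: List[ToyBlock], adapt_layers: int = 2):
        self.blocks = blocks  # original (frozen) blocks, used by base mode
        self.split = len(blocks) - adapt_layers
        # Interface mode gets independent copies of the top blocks only.
        self.interface_top = [copy.deepcopy(b) for b in blocks[self.split:]]

    def forward(self, x: List[float], interface_mode: bool = False) -> List[float]:
        for block in self.blocks[: self.split]:  # lower blocks are always shared
            x = block(x)
        top = self.interface_top if interface_mode else self.blocks[self.split:]
        for block in top:
            x = block(x)
        return x


stack = ModeGatedStack([ToyBlock(1.0, 0.5) for _ in range(4)], adapt_layers=2)
before = stack.forward([1.0, 2.0])
# "Train" only the interface copies; base mode must not change.
for block in stack.interface_top:
    block.scale = 2.0
assert stack.forward([1.0, 2.0]) == before
print(stack.forward([1.0, 2.0], interface_mode=True))  # [9.5, 13.5]
```

Copying only the top blocks keeps the extra parameters small while guaranteeing, by construction, that the base path never sees the adapted weights.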

	## Key measured results

	Architecture contribution (8 seeds): the effect of adding PAT-ER side-state registers to the
	same Qwen3-0.6B backbone (Condition B → C):

	| Metric | Δ (B→C) | 95% CI |
	|---|---|---|
	| primitive macro-F1 | +0.209 | [+0.182, +0.237] |
	| role-to-primitive macro-F1 | +0.090 | [+0.074, +0.110] |
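The card does not state how the 95% CIs are computed. One common choice for a paired, per-seed comparison is a percentile bootstrap over seeds; the sketch below assumes that method, and the per-seed scores in the usage lines are hypothetical placeholders, not the reported values.

```python
import random
from statistics import mean
from typing import List, Tuple


def paired_bootstrap_ci(
    base: List[float], treat: List[float], n_boot: int = 10_000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float, float]:
    """Mean per-seed delta (treat - base) with a percentile bootstrap CI."""
    if len(base) != len(treat) or not base:
        raise ValueError("need one base and one treatment score per seed")
    deltas = [t - b for b, t in zip(base, treat)]
    rng = random.Random(seed)
    # Resample seeds with replacement and record the mean delta of each resample.
    boot = sorted(mean(rng.choices(deltas, k=len(deltas))) for _ in range(n_boot))
    lo = boot[int((alpha / 2) * n_boot)]
    hi = boot[int((1 - alpha / 2) * n_boot) - 1]
    return mean(deltas), lo, hi


# Hypothetical per-seed macro-F1 scores for 8 seeds, for illustration only.
cond_b = [0.41, 0.43, 0.40, 0.44, 0.42, 0.39, 0.45, 0.41]
cond_c = [0.62, 0.63, 0.60, 0.66, 0.61, 0.60, 0.65, 0.64]
delta, lo, hi = paired_bootstrap_ci(cond_b, cond_c)
print(f"delta={delta:+.3f}  95% CI [{lo:+.3f}, {hi:+.3f}]")
```

Pairing by seed (B and C share a seed) removes seed-to-seed variance that is common to both conditions, which is why the deltas rather than the raw scores are resampled.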

	Usable interface (3 seeds, 242 held-out prompts, `interface_mode=True`):

	| Metric | Result |
	|---|---|
	| Hermes tool-call parse rate | 0.952 |
	| tool arguments exact match (no hallucinated args) | 0.981 |
	| JSON valid / keys | 1.000 |
	| IDK precision / recall / F1 | 1.000 / 1.000 / 1.000 |
	| base-mode primitive / role-to-primitive / LM | unchanged vs. Condition D |
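The IDK row can be read as a binary detection task. A minimal sketch of how precision, recall, and F1 could be computed, assuming each prompt has a gold flag for whether the model should abstain and a predicted flag for whether it did (the `idk_prf` helper is hypothetical, not taken from the evaluation code):

```python
from typing import Sequence, Tuple


def idk_prf(gold: Sequence[bool], pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall, F1 with 'should say IDK' as the positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [True, False, True, False]
pred = [True, False, True, True]  # one false alarm: IDK on an answerable prompt
print(idk_prf(gold, pred))
```

Since F1 is the harmonic mean of precision and recall, an F1 of 1.000 implies both are 1.000: no missed abstentions and no spurious ones.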

	Known limitation: exact-match accuracy on unseen tool names is 0.886, because of semantic
	substitution and multi-token truncation on a minority of unseen names. Robust schema-grounded
	function calling is supported; perfect tool-name copying is not.
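The tool-call rows and the unseen-name limitation can be made concrete with a scoring sketch. It assumes the Hermes-style format used by Qwen-family chat templates (a `<tool_call>` tag wrapping a JSON object with `name` and `arguments`) and hypothetical gold calls; the exact scoring rules in the PAT-ER evaluation may differ.

```python
import json
import re
from typing import List, Optional, Tuple

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_call(text: str) -> Optional[dict]:
    """Return the first well-formed Hermes-style tool call in `text`, or None."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or not isinstance(call.get("name"), str):
        return None
    if not isinstance(call.get("arguments", {}), dict):
        return None
    return call


def score(outputs: List[str], gold: List[dict]) -> Tuple[float, float, float]:
    """Parse rate, exact tool-name accuracy, exact-arguments accuracy."""
    if not gold or len(outputs) != len(gold):
        raise ValueError("need one output per gold call")
    parsed = [parse_tool_call(o) for o in outputs]
    n = len(gold)
    parse_rate = sum(p is not None for p in parsed) / n
    name_acc = sum(p is not None and p["name"] == g["name"] for p, g in zip(parsed, gold)) / n
    # Exact dict equality also rejects hallucinated extra arguments.
    args_acc = sum(
        p is not None and p.get("arguments", {}) == g["arguments"] for p, g in zip(parsed, gold)
    ) / n
    return parse_rate, name_acc, args_acc


gold = [{"name": "get_weather", "arguments": {"city": "Paris"}}] * 2
outputs = [
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>',
    # Semantic substitution: a plausible but wrong name with correct arguments.
    '<tool_call>\n{"name": "fetch_weather", "arguments": {"city": "Paris"}}\n</tool_call>',
]
print(score(outputs, gold))  # (1.0, 0.5, 1.0)
```

The second output shows why name accuracy and argument accuracy are reported separately: a substituted name still parses and can carry exactly the right arguments.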

	## Files

	| File | Description |
	|---|---|
	| `model.safetensors` | weights (warm-started Qwen3-0.6B backbone + PAT-ER side-state + interface layers) |
	| `config.json` | `PATERConfig` (including `interface_adapt_layers: 2`) |
	| `pat_er/` | minimal source package needed to load the custom architecture |
	| `tokenizer.json`, `tokenizer_config.json`, `chat_template.jinja`, `pater_manifest.json` | Qwen3 tokenizer extended with the PAT-ER token spec |
	| `example_usage.py` | load-and-run example (base mode and `interface_mode`) |
	| `eval_prompts.jsonl` | synthetic held-out product-eval fixture (generated; no external dataset text) |

	## Usage

	```bash
	pip install torch safetensors transformers
	python3 example_usage.py
	```

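	```python
	import json

	import torch
	from safetensors.torch import load_file
	from transformers import AutoTokenizer
	from pat_er import PATERConfig, PATERForCausalLM

	# Build the config, keeping only the fields PATERConfig declares.
	with open("config.json") as f:
	    cfg = json.load(f)
	config = PATERConfig(**{k: v for k, v in cfg.items() if k in PATERConfig.__dataclass_fields__})

	model = PATERForCausalLM(config)
	# strict=False because lm_head is re-tied rather than loaded; print the key lists
	# so that any other mismatch is visible instead of silently ignored.
	result = model.load_state_dict(load_file("model.safetensors"), strict=False)
	print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)
	model.eval()
	tok = AutoTokenizer.from_pretrained(".")

	# interface_mode=True -> tool call; interface_mode=False -> side-state (Condition D)
	input_ids = tok("...", return_tensors="pt").input_ids
	with torch.no_grad():
	    out = model(input_ids=input_ids, interface_mode=True)
	```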
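The config-loading line in the usage example keeps only keys that the `PATERConfig` dataclass declares, so extra entries in `config.json` (such as Hugging Face metadata) do not raise `TypeError`. A self-contained illustration, using a hypothetical `ToyConfig` in place of `PATERConfig`; its fields are assumptions, apart from `interface_adapt_layers`, which the Files table lists:

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ToyConfig:
    """Hypothetical stand-in for PATERConfig; the real field names may differ."""
    hidden_size: int = 1024
    interface_adapt_layers: int = 2


def config_from_json(text: str) -> ToyConfig:
    raw = json.loads(text)
    known = {f.name for f in fields(ToyConfig)}  # public equivalent of __dataclass_fields__
    ignored = sorted(set(raw) - known)
    if ignored:
        print("ignoring unknown keys:", ignored)
    return ToyConfig(**{k: v for k, v in raw.items() if k in known})


cfg = config_from_json(
    '{"hidden_size": 1024, "interface_adapt_layers": 2, "architectures": ["PATERForCausalLM"]}'
)
print(cfg)
```

Printing the ignored keys is a small safeguard: silent filtering would otherwise hide a misspelled field that the model then falls back to a default for.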

	## Datasets

	Side-state training uses upstream reasoning datasets (ProofWriter, FOLIO, and synthetic
	primitive data) through local converters. **Converted external datasets are not
	redistributed here.** The included `eval_prompts.jsonl` is a synthetic, generated fixture
	(hand-written reasoning templates + synthetic tool schemas) and contains no external
	dataset text.

	## Limitations & safety

	- Research prototype; not safety-certified; not a theorem prover.
	- Contradiction detection and deep reasoning are only partially supported; primitive and support labels are predictions, not verified facts.
	- Tool calls are proposals. A well-formed, name-grounded `<tool_call>` is not approval
	to execute — validate the tool name and arguments against the real schema before any
	execution, and keep a human/verifier in the loop for irreversible actions.
	- The 760M scale is not validated; this checkpoint has ~762M parameters (Qwen3-0.6B backbone + PAT-ER).
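The validation step recommended in the limitations list could look like the following minimal sketch. It assumes each tool is described by a JSON-Schema-like dict with `properties` and `required`; the `TOOLS` registry and `validate_call` helper are hypothetical. It checks name grounding, required and unexpected arguments, and basic types, and deliberately leaves the execute/approve decision to the caller.

```python
from typing import Any, Dict, List

# Hypothetical tool registry in a JSON-Schema-like shape.
TOOLS: Dict[str, Dict[str, Any]] = {
    "get_weather": {
        "properties": {"city": {"type": "string"}, "days": {"type": "integer"}},
        "required": ["city"],
    }
}

# Map a subset of JSON Schema type names to Python types.
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_call(call: Dict[str, Any]) -> List[str]:
    """Return a list of problems; empty means the call passed these checks (not approval)."""
    name = call.get("name")
    schema = TOOLS.get(name) if isinstance(name, str) else None
    if schema is None:
        return [f"unknown tool: {name!r}"]
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return ["arguments must be an object"]
    props = schema["properties"]
    problems = [f"missing required argument: {k}" for k in schema["required"] if k not in args]
    problems += [f"unexpected argument: {k}" for k in args if k not in props]
    for key, value in args.items():
        expected = JSON_TYPES.get(props.get(key, {}).get("type"))
        if expected is None:
            continue
        # bool is a subclass of int in Python, so reject it unless a boolean is expected.
        is_bad_bool = isinstance(value, bool) and expected is not bool
        if is_bad_bool or not isinstance(value, expected):
            problems.append(f"wrong type for {key}: {type(value).__name__}")
    return problems


print(validate_call({"name": "get_weather", "arguments": {"city": "Paris"}}))  # []
print(validate_call({"name": "get_forecast", "arguments": {}}))
print(validate_call({"name": "get_weather", "arguments": {"town": "Paris"}}))
```

Returning a list of problems rather than a boolean lets a human reviewer or verifier see why a proposed call was held back, which matters most for irreversible actions.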

	## Project

	Source, training, and evaluation code:
	https://github.com/Pronto-Sage/primitive-augmented-transformer

	Base model: Qwen/Qwen3-0.6B (Apache-2.0). This derivative is released under Apache-2.0.