---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B
library_name: pytorch
pipeline_tag: text-generation
tags:
- reasoning
- function-calling
- side-state
- research
---
# PAT-ER: Primitive-Augmented Transformer with Event-Role Stream
PAT-ER is a decoder-only causal language model whose hidden computation is shaped by two
learned **side-state** streams, an **event-role stream** and a **primitive stream**, in
addition to the normal token stream. This checkpoint is a research prototype warm-started
from **[Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)**, extended with the PAT-ER
side-state, and then given a decoupled product-generation interface.
> Research prototype. PAT-ER proposes and predicts; it does **not** prove, certify, or
> guarantee. Not safety-certified. Not a theorem prover.
## Intended use
Research / reviewer evaluation of (a) side-state reasoning signals (primitive class,
role-to-primitive bridge, support, IDK) and (b) schema-grounded interface generation
(evidence-grounded answers, abduction-as-hypothesis, contradiction, valid JSON, and
Hermes-style tool calls).
## Two modes (same checkpoint)
A `forward(..., interface_mode=bool)` flag selects the mode:
- **base mode** (`interface_mode=False`, default): **side-state / matrix behavior**.
Runs the original frozen Condition-D blocks; the auxiliary heads (primitive, support,
role-to-primitive, IDK, …) and the LM follow the Condition-D path. Use this mode to read
side-state predictions.
- **`interface_mode=True`**: **product-format generation**. Runs mode-gated copies of the
top-2 decoder blocks, whose adapted attention produces tool calls and structured answers.
**`interface_mode=True` is required for product-format generation**; base mode does not
emit tool calls.
The interface is **decoupled**: base mode is byte-identical to the underlying Condition-D
model, so adding product generation costs **zero** side-state performance.
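The mode switch can be pictured with a toy, pure-Python stack. Everything in this sketch (`ModeGatedStack`, blocks as plain functions over a list of floats) is hypothetical; the real `pat_er` blocks are PyTorch modules whose internals this card does not describe. What it is meant to show: the lower blocks are always shared, and only the top `interface_adapt_layers` blocks (2 in `config.json`) are swapped for adapted copies when `interface_mode=True`, so the base-mode path is left untouched.

```python
from typing import Callable, List

# Hypothetical stand-in for a decoder block: maps a hidden vector to a new one.
Block = Callable[[List[float]], List[float]]


class ModeGatedStack:
    """Toy decoder stack whose top blocks have interface-only copies.

    Illustration only; not the pat_er implementation.
    """

    def __init__(self, base_blocks: List[Block], interface_blocks: List[Block]):
        if len(interface_blocks) > len(base_blocks):
            raise ValueError("more interface copies than base blocks")
        self.base_blocks = base_blocks
        self.interface_blocks = interface_blocks

    def forward(self, hidden: List[float], interface_mode: bool = False) -> List[float]:
        split = len(self.base_blocks) - len(self.interface_blocks)
        # Lower blocks: always the frozen base path, shared by both modes.
        for block in self.base_blocks[:split]:
            hidden = block(hidden)
        # Top blocks: frozen originals in base mode, adapted copies in interface mode.
        top = self.interface_blocks if interface_mode else self.base_blocks[split:]
        for block in top:
            hidden = block(hidden)
        return hidden


def add(c: float) -> Block:
    return lambda h: [x + c for x in h]


def scale(c: float) -> Block:
    return lambda h: [x * c for x in h]


stack = ModeGatedStack([add(1), add(1), add(1), add(1)], [scale(10), scale(10)])
print(stack.forward([0.0]))                       # [4.0]
print(stack.forward([0.0], interface_mode=True))  # [200.0]
```

Because the interface copies are separate objects, base-mode output is identical to a stack built without them, which is the property the card describes as "decoupled".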
## Key measured results
**Architecture contribution** (8 seeds): adding PAT-ER side-state registers to the same
Qwen3-0.6B backbone (Condition B → C):
| Metric | Δ (B→C) | 95% CI |
|---|---|---|
| primitive macro-F1 | **+0.209** | [+0.182, +0.237] |
| role-to-primitive macro-F1 | **+0.090** | [+0.074, +0.110] |
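The card does not state how the 95% intervals were computed. One common choice for a per-seed comparison is a paired percentile bootstrap over seeds; the sketch below assumes that method, and `paired_delta_ci` plus the example scores are hypothetical (the numbers are placeholders, not the card's measurements).

```python
import random
from statistics import mean
from typing import List, Tuple


def paired_delta_ci(
    before: List[float],
    after: List[float],
    n_boot: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Mean paired difference (after - before) with a percentile-bootstrap CI.

    Assumes before[i] and after[i] were produced with the same seed i.
    """
    if len(before) != len(after) or not before:
        raise ValueError("need equal-length, non-empty per-seed lists")
    diffs = [a - b for a, b in zip(after, before)]
    rng = random.Random(seed)
    # Resample seeds with replacement; record the mean difference of each resample.
    boot = sorted(mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot))
    k_lo = int(round(alpha / 2 * n_boot))
    k_hi = int(round((1 - alpha / 2) * n_boot)) - 1
    return mean(diffs), boot[k_lo], boot[k_hi]


# Placeholder per-seed scores for illustration only (NOT the card's data).
before = [0.50, 0.52, 0.48, 0.51]
after = [0.70, 0.73, 0.69, 0.71]
point, lo, hi = paired_delta_ci(before, after)
print(f"delta={point:+.3f}  95% CI [{lo:+.3f}, {hi:+.3f}]")
```

Pairing by seed removes seed-to-seed variance that both conditions share, which usually gives a tighter interval than comparing two unpaired means.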
**Usable interface** (3 seeds, 242 held-out prompts, `interface_mode=True`):
| Metric | Result |
|---|---|
| Hermes tool-call parse | **0.952** |
| tool arguments exact (no hallucinated args) | **0.981** |
| JSON valid / keys | **1.000** |
| IDK precision/recall/F1 | **1.000** |
| base-mode primitive / r2p / LM | **unchanged vs Condition D** |
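The exact scoring rules behind "tool-call parse" and "tool arguments exact" are not given here. A minimal sketch, assuming the Hermes convention of a `<tool_call>` tag wrapping a JSON object with `name` and `arguments` keys, and assuming "exact" means the argument dict equals the reference (so any extra, missing, or altered key fails). The function names and the `get_weather` tool are hypothetical.

```python
import json
import re
from typing import Any, Dict, List, Optional

# Non-greedy body, but it must end right before </tool_call>, so nested braces work.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> Optional[List[Dict[str, Any]]]:
    """Return all parsed tool calls, or None if any <tool_call> block is malformed."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            return None
        if not isinstance(call, dict) or "name" not in call:
            return None
        if not isinstance(call.get("arguments"), dict):
            return None
        calls.append(call)
    return calls


def arguments_exact(call: Dict[str, Any], expected: Dict[str, Any]) -> bool:
    """True when arguments match the reference exactly (no hallucinated keys)."""
    return call["arguments"] == expected


def parse_rate(outputs: List[str]) -> float:
    """Fraction of outputs containing at least one well-formed tool call."""
    if not outputs:
        return 0.0
    return sum(1 for text in outputs if parse_tool_calls(text)) / len(outputs)


sample = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
print(parse_tool_calls(sample))
```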
**Known limit:** exact accuracy on unseen tool **names** is **0.886**; the misses are
semantic substitutions and multi-token truncations on a minority of unseen names. Robust
schema-grounded function calling is supported; **perfect tool-name copying is not**.
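To make the two failure modes concrete, here is a hypothetical way to bucket tool-name errors. The rule that a prediction which is a strict prefix of the reference counts as a truncation, and anything else as a substitution, is an assumption for illustration; the card does not define its categories this precisely, and the tool names are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple


def classify_name(predicted: str, expected: str) -> str:
    """Bucket one predicted tool name against the reference name (heuristic)."""
    if predicted == expected:
        return "exact"
    if predicted and expected.startswith(predicted):
        return "truncation"    # e.g. a multi-token name cut short
    return "substitution"      # a different, possibly semantically similar, name


def name_report(pairs: List[Tuple[str, str]]) -> Tuple[float, Dict[str, int]]:
    """Exact-name accuracy plus per-category counts over (predicted, expected) pairs."""
    if not pairs:
        raise ValueError("no pairs to score")
    counts = Counter(classify_name(p, e) for p, e in pairs)
    return counts["exact"] / len(pairs), dict(counts)


pairs = [
    ("fetch_invoice_pdf", "fetch_invoice_pdf"),
    ("fetch_invoice", "fetch_invoice_pdf"),
    ("get_weather", "lookup_forecast"),
]
print(name_report(pairs))
```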
## Files
| File | Description |
|---|---|
| `model.safetensors` | weights (warm-started Qwen3-0.6B backbone + PAT-ER side-state + interface layers) |
| `config.json` | `PATERConfig` (incl. `interface_adapt_layers: 2`) |
| `pat_er/` | minimal source package needed to load the custom architecture |
| `tokenizer.json`, `tokenizer_config.json`, `chat_template.jinja`, `pater_manifest.json` | Qwen3 tokenizer extended with the PAT-ER token spec |
| `example_usage.py` | load + run example (base mode and `interface_mode`) |
| `eval_prompts.jsonl` | synthetic held-out product-eval fixture (generated; no external dataset text) |
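The field names inside `eval_prompts.jsonl` are not documented on this card, so the sketch below only assumes the file is standard JSON Lines (one JSON value per line) and loads each record without interpreting it. `load_jsonl` is a hypothetical helper, not part of `pat_er`.

```python
import json
from pathlib import Path
from typing import Any, List, Union


def load_jsonl(path: Union[str, Path]) -> List[Any]:
    """Read a JSON Lines file, skipping blank lines and reporting bad lines by number."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{lineno}: invalid JSON") from exc
    return records


# Usage (from the checkpoint directory):
#   prompts = load_jsonl("eval_prompts.jsonl")
#   print(len(prompts), sorted({k for r in prompts if isinstance(r, dict) for k in r}))
```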
## Usage
```bash
pip install torch safetensors transformers
python3 example_usage.py
```
```python
import json
import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer
from pat_er import PATERConfig, PATERForCausalLM

# Build the config from config.json, keeping only the fields PATERConfig declares.
with open("config.json") as f:
    cfg = json.load(f)
config = PATERConfig(**{k: v for k, v in cfg.items() if k in PATERConfig.__dataclass_fields__})

model = PATERForCausalLM(config)
model.load_state_dict(load_file("model.safetensors"), strict=False)  # lm_head is re-tied
model.eval()

tok = AutoTokenizer.from_pretrained(".")
input_ids = tok("...", return_tensors="pt").input_ids
# interface_mode=True -> tool call; interface_mode=False -> side-state (Condition D)
with torch.no_grad():
    out = model(input_ids=input_ids, interface_mode=True)
```
## Datasets
Side-state training uses upstream reasoning datasets (ProofWriter, FOLIO, and synthetic
primitive data) through local converters. **Converted external datasets are not
redistributed here.** The included `eval_prompts.jsonl` is a synthetic, generated fixture
(hand-written reasoning templates + synthetic tool schemas) and contains no external
dataset text.
## Limitations & safety
- Research prototype; **not safety-certified**; **not a theorem prover**.
- Contradiction/deep reasoning is partial; primitive/support labels are *predictions*.
- **Tool calls are proposals.** A well-formed, name-grounded `<tool_call>` is not approval
to execute: validate the tool name and arguments against the real schema before any
execution, and keep a human/verifier in the loop for irreversible actions.
- 760M scale is not validated; this checkpoint is ~762M (Qwen3-0.6B backbone + PAT-ER).
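A minimal pre-execution gate along the lines of the tool-call guidance above might look like this. `ToolSpec`, `check_tool_call`, the registry, and the `delete_file` tool are all hypothetical; a real deployment would more likely validate against each tool's full JSON Schema (types, enums, ranges), not just argument names. Note that the name lookup is exact: near-miss names are rejected rather than repaired.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple


@dataclass
class ToolSpec:
    name: str
    required: Set[str]
    optional: Set[str] = field(default_factory=set)
    irreversible: bool = False


def check_tool_call(
    call: Dict[str, Any], registry: Dict[str, ToolSpec]
) -> Tuple[List[str], bool]:
    """Return (problems, needs_human_approval) for a parsed tool call."""
    name = call.get("name")
    spec = registry.get(name) if isinstance(name, str) else None
    if spec is None:
        # Exact match only; never guess which real tool a near-miss name meant.
        return [f"unknown tool name: {name!r}"], False
    args = call.get("arguments")
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"], False
    arg_names = set(args)
    problems = [f"missing argument: {a}" for a in sorted(spec.required - arg_names)]
    problems += [
        f"unexpected argument: {a}"
        for a in sorted(arg_names - spec.required - spec.optional)
    ]
    return problems, spec.irreversible


registry = {"delete_file": ToolSpec("delete_file", required={"path"}, irreversible=True)}
print(check_tool_call({"name": "delete_file", "arguments": {"path": "/tmp/x"}}, registry))
# ([], True): well-formed, but still needs human sign-off because it is irreversible
```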
## Project
Source, training, and evaluation code:
**https://github.com/Pronto-Sage/primitive-augmented-transformer**
Base model: **Qwen/Qwen3-0.6B** (Apache-2.0). This derivative is released under Apache-2.0.