---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B
library_name: pytorch
pipeline_tag: text-generation
tags:
- reasoning
- function-calling
- side-state
- research
---

# PAT-ER: Primitive-Augmented Transformer with Event-Role Stream

PAT-ER is a decoder-only causal language model whose hidden computation is shaped by two
learned **side-state** streams, an **event-role stream** and a **primitive stream**, in
addition to the normal token stream. This checkpoint is a research prototype: it was
warm-started from **[Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)**, extended
with the PAT-ER side-state, and then given a decoupled product-generation interface.

> Research prototype. PAT-ER proposes and predicts; it does **not** prove, certify, or
> guarantee. Not safety-certified. Not a theorem prover.

## Intended use

Research / reviewer evaluation of (a) side-state reasoning signals (primitive class,
role-to-primitive bridge, support, IDK) and (b) schema-grounded interface generation
(evidence-grounded answers, abduction-as-hypothesis, contradiction, valid JSON, and
Hermes-style tool calls).

## Two modes (same checkpoint)

A `forward(..., interface_mode=bool)` flag selects the mode:

- **base mode** (`interface_mode=False`, default): **side-state / matrix behavior**.
  Runs the original frozen Condition-D blocks, so the auxiliary heads (primitive, support,
  role-to-primitive, IDK, …) and the LM all follow the Condition-D path. Use this to read
  side-state predictions.
- **`interface_mode=True`**: **product-format generation**. Runs mode-gated copies of the
  top-2 decoder blocks whose adapted attention produces tool calls and structured answers.
  **`interface_mode=True` is required for product-format generation**; base mode does not
  emit tool calls.

The interface is **decoupled**: base mode is byte-identical to the underlying Condition-D
model, so adding product generation costs **zero** side-state performance.

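The decoupling described above can be pictured with a toy, framework-free sketch. Everything in it is hypothetical (the `ToyStack` class, the scalar "blocks", the `run_original` helper); the real implementation lives in `pat_er/` and operates on tensors. The only point is the control flow: the lower blocks are shared, the top blocks exist in two versions, and the flag picks one, so the base path never touches the adapted copies.

```python
from typing import Callable, List

Block = Callable[[float], float]


class ToyStack:
    """Toy stand-in for a decoder stack with mode-gated copies of its top blocks."""

    def __init__(self, base_blocks: List[Block], adapted_top: List[Block]):
        # adapted_top replaces only the last len(adapted_top) blocks in interface mode.
        assert 0 < len(adapted_top) <= len(base_blocks)
        self.base_blocks = list(base_blocks)
        self.adapted_top = list(adapted_top)

    def forward(self, x: float, interface_mode: bool = False) -> float:
        n_shared = len(self.base_blocks) - len(self.adapted_top)
        # Lower blocks are shared by both modes.
        for block in self.base_blocks[:n_shared]:
            x = block(x)
        # Top blocks: original versions in base mode, adapted copies in interface mode.
        top = self.adapted_top if interface_mode else self.base_blocks[n_shared:]
        for block in top:
            x = block(x)
        return x


def run_original(blocks: List[Block], x: float) -> float:
    """Reference: the stack as it was before any interface blocks were added."""
    for block in blocks:
        x = block(x)
    return x


base = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: x - 3.0, lambda x: x * 0.5]
adapted = [lambda x: x + 7.0, lambda x: x * 0.5 + 1.0]  # hypothetical adapted copies
stack = ToyStack(base, adapted)

base_out = stack.forward(4.0)                            # base mode (default)
interface_out = stack.forward(4.0, interface_mode=True)  # interface mode
```

In this toy, `base_out` equals `run_original(base, 4.0)` exactly, which is the property the card describes as base mode being identical to Condition D.
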
## Key measured results

**Architecture contribution** (8 seeds): adding PAT-ER side-state registers to the same
Qwen3-0.6B backbone (Condition B → C):

| Metric | Δ (B→C) | 95% CI |
|---|---|---|
| primitive macro-F1 | **+0.209** | [+0.182, +0.237] |
| role-to-primitive macro-F1 | **+0.090** | [+0.074, +0.110] |

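For readers unfamiliar with the metric: macro-F1 is the unweighted mean of per-class F1, so rare classes count as much as frequent ones. A minimal sketch follows. The label names are made up for illustration, and the project's own evaluation code may differ in details such as which classes are included in the average.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred), key=str)
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # denom is positive for every class in the union; the guard is purely defensive.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


# Hypothetical primitive labels, for illustration only.
gold = ["deduce", "deduce", "abduce", "contradict"]
pred = ["deduce", "abduce", "abduce", "contradict"]
score = macro_f1(gold, pred)
```

Here the per-class F1 values are 2/3 (`deduce`), 2/3 (`abduce`) and 1 (`contradict`), so `score` is 7/9.
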
**Usable interface** (3 seeds, 242 held-out prompts, `interface_mode=True`):

| Metric | Result |
|---|---|
| Hermes tool-call parse | **0.952** |
| tool arguments exact (no hallucinated args) | **0.981** |
| JSON valid / keys | **1.000** |
| IDK precision/recall/F1 | **1.000** |
| base-mode primitive / r2p / LM | **unchanged vs Condition D** |

**Known limit:** unseen tool-**name** exact accuracy is **0.886**, with errors coming from
semantic substitution and multi-token truncation on a minority of unseen names. Robust
schema-grounded function calling is supported; **perfect tool-name copying is not**.

## Files

| File | Description |
|---|---|
| `model.safetensors` | weights (warm-started Qwen3-0.6B backbone + PAT-ER side-state + interface layers) |
| `config.json` | `PATERConfig` (incl. `interface_adapt_layers: 2`) |
| `pat_er/` | minimal source package needed to load the custom architecture |
| `tokenizer.json`, `tokenizer_config.json`, `chat_template.jinja`, `pater_manifest.json` | Qwen3 tokenizer extended with the PAT-ER token spec |
| `example_usage.py` | load + run example (base mode and `interface_mode`) |
| `eval_prompts.jsonl` | synthetic held-out product-eval fixture (generated; no external dataset text) |

## Usage

```bash
pip install torch safetensors transformers
python3 example_usage.py
```

```python
import json

import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer
from pat_er import PATERConfig, PATERForCausalLM

# Build the config from config.json, keeping only the fields PATERConfig declares.
with open("config.json") as f:
    cfg = json.load(f)
config = PATERConfig(**{k: v for k, v in cfg.items() if k in PATERConfig.__dataclass_fields__})

model = PATERForCausalLM(config)
model.load_state_dict(load_file("model.safetensors"), strict=False)  # lm_head is re-tied
model.eval()
tok = AutoTokenizer.from_pretrained(".")

# interface_mode=True -> tool call; interface_mode=False -> side-state (Condition D)
input_ids = tok("...", return_tensors="pt").input_ids
with torch.no_grad():  # inference only, so skip gradient tracking
    out = model(input_ids=input_ids, interface_mode=True)
```

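The dictionary comprehension in the snippet above drops any `config.json` keys that the config dataclass does not declare, presumably so that extra metadata keys do not break construction. The same pattern in isolation, using a hypothetical `ToyConfig` dataclass (its `hidden_size` field and the `architectures` key are invented for the example) and `dataclasses.fields`, the public accessor for a dataclass's fields:

```python
from dataclasses import dataclass, fields


@dataclass
class ToyConfig:
    # Hypothetical stand-in for PATERConfig; only interface_adapt_layers is named in this card.
    hidden_size: int = 1024
    interface_adapt_layers: int = 2


def config_from_dict(raw: dict) -> ToyConfig:
    """Build ToyConfig from a dict, ignoring keys the dataclass does not declare."""
    known = {f.name for f in fields(ToyConfig)}
    return ToyConfig(**{k: v for k, v in raw.items() if k in known})


raw = {"hidden_size": 512, "interface_adapt_layers": 2, "architectures": ["Example"]}
cfg = config_from_dict(raw)  # "architectures" is dropped instead of raising TypeError
```

Without the filter, `ToyConfig(**raw)` would raise `TypeError` on the unexpected `architectures` keyword.
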
## Datasets

Side-state training uses upstream reasoning datasets (ProofWriter, FOLIO, and synthetic
primitive data) through local converters. **Converted external datasets are not
redistributed here.** The included `eval_prompts.jsonl` is a synthetic, generated fixture
(hand-written reasoning templates + synthetic tool schemas) and contains no external
dataset text.

## Limitations & safety

- Research prototype; **not safety-certified**; **not a theorem prover**.
- Contradiction/deep reasoning is partial; primitive/support labels are *predictions*.
- **Tool calls are proposals.** A well-formed, name-grounded `<tool_call>` is not approval
  to execute: validate the tool name and arguments against the real schema before any
  execution, and keep a human/verifier in the loop for irreversible actions.
- 760M scale is not validated; this checkpoint is ~762M (Qwen3-0.6B backbone + PAT-ER).

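A minimal sketch of the validation step recommended above: parse the generated text, then check the tool name and arguments against a schema you control before anything runs. The `<tool_call>{...}</tool_call>` wrapper with `name` and `arguments` keys follows the common Hermes convention and is an assumption about this checkpoint's exact output. The `TOOLS` registry and the `validate_tool_call` helper are hypothetical.

```python
import json
import re
from typing import Optional

# Hypothetical registry: tool name -> allowed argument names and required ones.
TOOLS = {
    "get_weather": {"allowed": {"city", "unit"}, "required": {"city"}},
}

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def validate_tool_call(text: str) -> Optional[dict]:
    """Return the parsed call if it is well-formed and matches TOOLS, else None."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    try:
        call = json.loads(match.group(1).strip())
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict):
        return None
    name, args = call.get("name"), call.get("arguments")
    spec = TOOLS.get(name) if isinstance(name, str) else None
    if spec is None or not isinstance(args, dict):
        return None  # unknown (possibly substituted or truncated) name, or malformed arguments
    keys = set(args)
    if not keys <= spec["allowed"] or not spec["required"] <= keys:
        return None  # hallucinated or missing arguments
    return call


ok = validate_tool_call(
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Oslo"}}\n</tool_call>'
)
bad_name = validate_tool_call(
    '<tool_call>{"name": "get_forecast", "arguments": {"city": "Oslo"}}</tool_call>'
)
bad_args = validate_tool_call(
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Oslo", "days": 3}}</tool_call>'
)
```

Rejecting on an exact name match is deliberate: given the tool-name accuracy reported above, a near-miss name should be treated as a failed call, not silently mapped to the closest registered tool.
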
## Project

Source, training, and evaluation code:
**https://github.com/Pronto-Sage/primitive-augmented-transformer**

Base model: **Qwen/Qwen3-0.6B** (Apache-2.0). This derivative is released under Apache-2.0.