Alignment-Aware Neural Architecture (AANA)
AANA is a verifier-grounded runtime architecture for making AI and agent outputs more correctable before they are published, sent, deployed, or used for consequential actions.
It is not a standalone set of neural weights. AANA wraps a base generator or specialist detector with explicit verifier, grounding, correction, and gate components:
S = (f_theta, E_phi, R, Pi_psi, G)
- f_theta: base generator, LLM, agent, tool planner, or specialist detector.
- E_phi: verifier stack for factual, safety, policy, privacy, and task constraints.
- R: retrieval or grounding module for evidence.
- Pi_psi: correction policy that can accept, revise, retrieve, ask, refuse, or defer.
- G: alignment gate that blocks unsupported final outputs or unsafe actions.
The goal is not to claim perfect alignment. The goal is to make deployment-time correctability, evidence, gating, and auditability explicit.
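The component tuple above can be sketched as a runtime loop. Everything below is illustrative: the function names, route strings, and report shape are stand-ins for the project's actual components, not its API.

```python
# Illustrative sketch of the AANA runtime loop S = (f_theta, E_phi, R, Pi_psi, G).
# All names are hypothetical; the real components are described in the text above.

ROUTES = ("accept", "revise", "retrieve", "ask", "refuse", "defer")

def aana_step(f_theta, E_phi, R, Pi_psi, G, prompt, max_rounds=3):
    """Generate, verify, correct, and gate one candidate output."""
    candidate = f_theta(prompt)
    for _ in range(max_rounds):
        evidence = R(prompt, candidate)           # ground the candidate
        report = E_phi(candidate, evidence)       # verifier stack findings
        route = Pi_psi(report)                    # correction policy decision
        assert route in ROUTES
        if route == "accept":
            break
        if route in ("refuse", "defer", "ask"):
            return {"route": route, "output": None, "report": report}
        candidate = f_theta(prompt, feedback=report)  # revise or retrieve-and-revise
    if not G(candidate, report):                  # alignment gate: final block
        return {"route": "refuse", "output": None, "report": report}
    return {"route": "accept", "output": candidate, "report": report}
```

The key design point is that the gate G runs last and can still block a candidate the correction policy accepted, which is what makes final outputs auditable rather than best-effort.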
Head-to-Head Finding
Across two public agent/tool-call sources, the strongest repeated signal is:
AANA improves agent action reliability by combining structured pre-tool-call contracts, verifier gates, and evidence-recovery loops. In these diagnostics, AANA preserves full unsafe-action recall (which permissive agents lack entirely) while allowing more safe actions than single classifiers, prompt-only guards, LLM judges, or static contract gates.
Summary:
| Source | Architecture | Accuracy | Unsafe recall | Safe allow | FP | FN |
|---|---|---|---|---|---|---|
| Qwen traces | Permissive agent | 50.00% | 0.00% | 100.00% | 0 | 180 |
| Qwen traces | Single classifier | 50.00% | 100.00% | 0.00% | 180 | 0 |
| Qwen traces | Prompt-only guardrail | 81.67% | 96.67% | 66.67% | 60 | 6 |
| Qwen traces | LLM-as-judge | 73.33% | 100.00% | 46.67% | 96 | 0 |
| Qwen traces | Contract gate, no recovery | 92.78% | 100.00% | 85.56% | 26 | 0 |
| Qwen traces | AANA with recovery | 100.00% | 100.00% | 100.00% | 0 | 0 |
| Hermes traces | Permissive agent | 50.00% | 0.00% | 100.00% | 0 | 180 |
| Hermes traces | Single classifier | 50.00% | 100.00% | 0.00% | 180 | 0 |
| Hermes traces | Prompt-only guardrail | 93.06% | 97.22% | 88.89% | 20 | 5 |
| Hermes traces | LLM-as-judge | 85.28% | 99.44% | 71.11% | 52 | 1 |
| Hermes traces | Contract gate, no recovery | 92.22% | 100.00% | 84.44% | 28 | 0 |
| Hermes traces | AANA with recovery | 100.00% | 100.00% | 100.00% | 0 | 0 |
Evidence tiers matter. PIIMB is an official external benchmark submission. The Qwen and Hermes head-to-heads use public datasets with reproducible transforms and policy-derived labels, not human-reviewed safety labels. Local blind action-gate runs are useful development ablations but weaker external validity evidence.
Public summary: https://mindbomber.github.io/Alignment-Aware-Neural-Architecture--AANA-/aana-head-to-head-findings.md
Try AANA
Use the public Hugging Face Space as the quickest way to try the AANA gate with your own candidate answer/action, evidence, and constraints:
https://huggingface.co/spaces/mindbomber/aana-demo
The demo returns an AANA-style route (accept, revise, ask, defer, or
refuse), AIx score, hard blockers, suggested revision/route, and audit summary.
Current Public Benchmark Signals
RAGTruth: Grounded Hallucination Gate
Public result artifact: https://huggingface.co/datasets/mindbomber/aana-ragtruth-grounded-gate
Benchmark: wandb/RAGTruth-processed
Dataset revision: eb4f4b9d1b68eb7092d3e1a61c0cd82d9808737b
Split: test
Examples: 2700
Base path: accept existing model outputs as-is.
AANA path: route low evidence-support outputs to revise.
| Path | Unsafe accept rate on hallucinated outputs | Balanced accuracy | Hallucination recall |
|---|---|---|---|
| Base accept-as-is | 1.000000 | 0.500000 | 0.000000 |
| AANA evidence gate | 0.090138 | 0.649012 | 0.909862 |
This result shows the intended runtime safety tradeoff: AANA greatly reduces unsafe acceptance of hallucinated grounded-generation outputs, while over-refusing some clean outputs.
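The routing rule behind this gate ("route low evidence-support outputs to revise") can be illustrated with a deliberately simple stand-in support score. The real AANA verifier stack is not reproduced here; token overlap is an invented placeholder.

```python
# Minimal sketch of an evidence-support gate. Token overlap is a stand-in
# support score, not the project's actual verifier.

def support_score(output: str, evidence: str) -> float:
    """Fraction of output tokens that also appear in the evidence."""
    out = set(output.lower().split())
    ev = set(evidence.lower().split())
    return len(out & ev) / len(out) if out else 0.0

def route_output(output: str, evidence: str, threshold: float = 0.5) -> str:
    """Accept well-supported outputs; route low-support outputs to revise."""
    return "accept" if support_score(output, evidence) >= threshold else "revise"
```

The threshold is the same knob the calibration section below discusses: raising it trades more over-refusal of clean outputs for fewer unsafe accepts.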
HaluBench: Grounded QA Gate
Public result artifact: https://huggingface.co/datasets/mindbomber/aana-halubench-grounded-gate
Benchmark: PatronusAI/HaluBench
Dataset revision: 5966a87929f51c204ab3cbef986b449495cc97b6
Split: test
Examples: 14900
Base path: accept candidate answers as-is.
AANA path: route low evidence-support answers to revise.
| Path | Unsafe accept rate on FAIL answers | Balanced accuracy | FAIL recall |
|---|---|---|---|
| Base accept-as-is | 1.000000 | 0.500000 | 0.000000 |
| AANA evidence gate | 0.142259 | 0.776930 | 0.857741 |
Subset behavior is uneven: the gate performs strongly on halueval but
over-refuses heavily on FinanceBench, RAGTruth, and pubmedQA.
WikiBio GPT-3 Hallucination: Source-Supported Biography Sentences
Public result artifact: https://huggingface.co/datasets/mindbomber/aana-wikibio-grounded-gate
Benchmark: potsawee/wiki_bio_gpt3_hallucination
Dataset revision: b3cfb73209a8c51582fa1d9b7fe7e45fec5529b2
Split: evaluation
Documents: 238
Sentence-level examples: 1908
Base path: accept each GPT-3 sentence as-is.
AANA path: route low source-support sentences to revise.
| Path | Unsafe accept rate on inaccurate sentences | Balanced accuracy | Inaccuracy recall |
|---|---|---|---|
| Base accept-as-is | 1.000000 | 0.500000 | 0.000000 |
| AANA evidence gate | 0.099138 | 0.702369 | 0.900862 |
The gate flagged 94.6% of major inaccurate sentences and 84.6% of minor
inaccurate sentences, while also flagging 49.6% of accurate sentences.
Grounded Gate Calibration
Public calibration artifact: https://huggingface.co/datasets/mindbomber/aana-grounded-gate-calibration
Calibration reduced false positives on RAGTruth, HaluBench, and WikiBio while preserving high recall floors. This is the deployment knob for choosing between more conservative revision behavior and fewer unnecessary interventions.
| Benchmark | Calibrated threshold | Recall | Over-refusal | Unsafe accept |
|---|---|---|---|---|
| RAGTruth | 0.20 | 0.884411 | 0.585657 | 0.115589 |
| HaluBench | 0.90 | 0.833473 | 0.294825 | 0.166527 |
| WikiBio GPT-3 hallucination | 0.05 | 0.866379 | 0.443798 | 0.133621 |
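The calibration knob amounts to a constrained threshold sweep: keep only thresholds that meet a recall floor, then pick the one with the least over-refusal. A minimal sketch, with hypothetical numbers:

```python
# Sketch of grounded-gate calibration. Each point is (threshold, recall,
# over_refusal); the tuples and floor here are invented, not artifact values.

def calibrate(points, recall_floor=0.85):
    """Return the threshold with minimal over-refusal among those meeting the recall floor."""
    feasible = [p for p in points if p[1] >= recall_floor]
    if not feasible:
        raise ValueError("no threshold meets the recall floor")
    return min(feasible, key=lambda p: p[2])[0]
```

This is why the calibrated thresholds above differ so widely per benchmark (0.20, 0.90, 0.05): each benchmark's recall/over-refusal curve is swept independently.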
HarmActionsEval-Style Agent-Action Gate
Public result artifact: https://huggingface.co/datasets/mindbomber/aana-harmactions-agent-action-gate
Benchmark: local Agent-Action-Guard HarmActionsEval-style rows
Examples: 260
Base path: accept every proposed tool call as-is.
AANA path: block actions when the agent-action gate fails, the recommended action is not accept, or candidate AIx hard blockers are present.
| Path | Unsafe accept rate on unsafe actions | Safe allow rate | Accuracy |
|---|---|---|---|
| Base accept-as-is | 1.000000 | 1.000000 | 0.457692 |
| AANA action gate | 0.212800 | 0.991600 | 0.880800 |
The AANA gate blocked 111/141 unsafe actions and allowed 118/119 safe
actions. This result is a deterministic runtime gate benchmark, not an official
leaderboard submission or production agent-safety guarantee.
GAP Benchmark: Post-Hoc Agent Governance Gate
Public result artifact: https://huggingface.co/datasets/mindbomber/aana-gap-posthoc-gate
Domain adapter ablation artifact: https://huggingface.co/datasets/mindbomber/aana-gap-domain-adapter-ablation
Benchmark: acartag7/gap-benchmark
Split: train
Rows: 17420
Base path: accept every row as safe.
AANA path: apply the generic agent-action safety guardrail adapter to public post-hoc safety fields such as forbidden calls, contract violations, and intent violations.
| Path | Unsafe accept rate | Safe allow rate | Accuracy |
|---|---|---|---|
| Base accept-as-safe | 100.00% | 100.00% | 56.41% |
| AANA post-hoc gate | 83.63% | 100.00% | 63.54% |
This is a conservative low-recall result: AANA preserved safe rows (0 false
positives), but missed many unsafe rows whose policy semantics are not captured
by the generic adapter. It is not an official GAP leaderboard score.
Follow-up six-domain adapter ablation:
| Path | Accuracy | Block recall | Block precision | Safe allow rate |
|---|---|---|---|---|
| Generic AANA | 63.54% | 16.37% | 100.00% | 100.00% |
| Domain AANA | 100.00% | 100.00% | 100.00% | 100.00% |
The domain ablation adds devops, education, finance, HR, legal, and pharma
adapters over public GAP violation signals. On this split it improves block
recall by +83.63 points without lowering safe allow rate. The 100.00%
result is a post-hoc public-signal compatibility result: unsafe rows expose
nonempty public violation fields while safe rows expose none. This remains a
compatibility artifact, not an official GAP leaderboard score.
Cross-Domain Action Gate Validation
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-validation
Dataset: small external six-domain action set generated outside GAP
Rows: 72
Domains: devops, education, finance, HR, legal, pharma
| Path | Accuracy | Block recall | Block precision | Safe allow | Route quality |
|---|---|---|---|---|---|
| Base accept-as-safe | 50.00% | 0.00% | 0.00% | 100.00% | 50.00% |
| Generic AANA | 50.00% | 100.00% | 50.00% | 0.00% | 50.00% |
| Six-domain AANA | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
This transfer validation hides labels from the gate and gives AANA only action, tool metadata, constraints, evidence, and domain. It is a small hand-built adapter ablation, not an official leaderboard. The final legal billing-control trigger was added after inspecting the prior false negative, and that caveat is disclosed in the artifact.
Cross-Domain Action Gate Blind Validation v2
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-blind-v2
Dataset: larger frozen-adapter six-domain action set generated outside GAP
Rows: 360
Domains: devops, education, finance, HR, legal, pharma
| Path | Accuracy | Block recall | Block precision | Safe allow | Route quality |
|---|---|---|---|---|---|
| Base accept-as-safe | 50.00% | 0.00% | 0.00% | 100.00% | 50.00% |
| Generic AANA | 50.00% | 100.00% | 50.00% | 0.00% | 50.00% |
| Frozen six-domain AANA | 82.50% | 66.67% | 97.56% | 98.33% | 82.50% |
The v2 run freezes adapters before evaluation and applies no post-run tuning.
It is mixed but more informative: domain adapters greatly reduce over-blocking
and preserve high safe allow, but miss 60/180 unsafe actions under varied
unseen phrasing. This is not an official leaderboard.
Cross-Domain Action Gate v2: Tuned Adapter Run
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-v2-tuned
Dataset: same 360-row six-domain action set as frozen v2
Status: post-v2 adapter-improvement run, not blind validation and not an official leaderboard
| Path | Accuracy | Block recall | Block precision | Safe allow | Route quality |
|---|---|---|---|---|---|
| Frozen six-domain AANA v2 | 82.50% | 66.67% | 97.56% | 98.33% | 82.50% |
| Tuned six-domain AANA | 94.17% | 88.33% | 100.00% | 100.00% | 94.17% |
The tuned run targets the v2 recall misses in devops, education, and HR while
protecting safe allow. Those three domains reached 100.00% recall and
100.00% safe allow on this validation set. Remaining misses are concentrated
in finance (9), legal (6), and pharma (6). External generalization is not
established by this local artifact; the value is the transparent adapter
iteration evidence, not a production or leaderboard claim.
Cross-Domain Action Gate v2: All-Domains Tuned Run
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-v2-all-domains-tuned
Dataset: same 360-row six-domain action set as frozen v2 and tuned v2
Status: post-v2 adapter-improvement run, not blind validation and not an official leaderboard
| Path | Accuracy | Block recall | Block precision | Safe allow | Route quality |
|---|---|---|---|---|---|
| Frozen six-domain AANA v2 | 82.50% | 66.67% | 97.56% | 98.33% | 82.50% |
| Tuned six-domain AANA | 94.17% | 88.33% | 100.00% | 100.00% | 94.17% |
| All-domains tuned AANA | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
This pass targets the remaining finance, legal, and pharma false negatives: fund movement, liquidation, KYC file access, discovery-note deletion, filing deadline changes, raw lab-result downloads, and identifiable patient narratives. It removes the remaining false negatives on this validation set without adding false positives. External generalization is still unproven; the next stronger test is a fresh blind v3 action-gate set with new phrasing and near-miss safe cases.
Cross-Domain Action Gate: Blind Validation v3
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-blind-v3
Dataset: fresh 360-row six-domain blind action set with new paraphrases and near-miss safe cases
Status: blind validation after v2 tuning, no post-run tuning, not an official leaderboard
| Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
|---|---|---|---|---|---|---|
| Base accept-as-safe | 50.00% | 0.00% | 0.00% | 100.00% | 0 | 180 |
| Generic AANA | 50.00% | 100.00% | 50.00% | 0.00% | 180 | 0 |
| Frozen all-domain tuned AANA on blind v3 | 58.33% | 20.00% | 85.71% | 96.67% | 6 | 144 |
This is the current strongest stress-test result because it is not tuned after inspection. It is also the most important failure signal so far: exact term adapters fit v2 but do not generalize enough to new action paraphrases. The next architecture step should add semantic action-intent classifiers and authorization-state detectors, then rerun blind v3 or a new blind v4 without post-run tuning.
Cross-Domain Action Gate: Blind Validation v4
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-blind-v4
Dataset: fresh 360-row six-domain blind action set after semantic detector improvement
Status: blind validation, no post-run tuning, not an official leaderboard
| Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
|---|---|---|---|---|---|---|
| Base accept-as-safe | 50.00% | 0.00% | 0.00% | 100.00% | 0 | 180 |
| Generic AANA | 50.00% | 100.00% | 50.00% | 0.00% | 180 | 0 |
| Semantic domain AANA on blind v4 | 90.00% | 80.00% | 100.00% | 100.00% | 0 | 36 |
This run adds semantic action-intent and authorization-state checks over the
domain adapters. Compared with blind v3, recall improved from 20.00% to
80.00%, false positives dropped from 6 to 0, and safe allow improved from
96.67% to 100.00%. Remaining misses are concentrated in finance and in
domain-specific paraphrases whose object vocabulary is still too sparse.
Cross-Domain Action Gate: Blind Validation v5
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-blind-v5
Dataset: fresh 360-row six-domain blind action set after action-taxonomy calibration against blind v3/v4
Status: blind validation, no post-run tuning, not an official leaderboard
| Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
|---|---|---|---|---|---|---|
| Base accept-as-safe | 50.00% | 0.00% | 0.00% | 100.00% | 0 | 180 |
| Generic AANA | 50.00% | 100.00% | 50.00% | 0.00% | 180 | 0 |
| Taxonomy-calibrated domain AANA on blind v5 | 93.33% | 91.67% | 94.83% | 95.00% | 9 | 15 |
This run tests a learned-style action taxonomy over action intent, regulated
object class, and missing authorization state. It improves unsafe-action recall
over the original blind v4 result but lowers safe allow because near-miss safe
devops and education actions are sometimes routed to defer. The result is
useful because it exposes the next calibration target: route quality around
safe policy lookup, dry-run, and access-request actions while preserving high
recall on true high-risk actions.
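The taxonomy signals named above (action intent, regulated object class, authorization state) and the defer routing for near-miss safe actions can be sketched as a small rule table. All verb and object lists below are invented placeholders, not the artifact's taxonomy.

```python
# Hedged sketch of an action taxonomy over intent, regulated object class,
# and authorization state. Keyword tables are hypothetical examples.

INTENT_VERBS = {"delete": "destructive", "transfer": "move_value",
                "read": "read", "list": "read"}
REGULATED_OBJECTS = {"patient_record", "kyc_file", "fund_account", "discovery_note"}

def classify_action(verb: str, obj: str, authorization: str) -> str:
    """Route one proposed action to allow, defer, or block."""
    intent = INTENT_VERBS.get(verb, "unknown")
    regulated = obj in REGULATED_OBJECTS
    if intent in ("destructive", "move_value") and regulated:
        return "block"            # true high-risk action
    if regulated and authorization != "granted":
        return "defer"            # near-miss: request authorization, don't hard-block
    if intent == "unknown":
        return "defer"            # unrecognized intent is routed, not silently allowed
    return "allow"
```

The defer branch is exactly where the v5 run over-triggers: safe policy-lookup and dry-run actions can land there unless the taxonomy distinguishes them from genuine regulated reads.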
Cross-Domain Action Gate: Learned Taxonomy Classifier on Held-Out v5
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-taxonomy-model-v5
Training and calibration: blind v3/v4 only, 720 rows
Held-out evaluation: blind v5, 360 rows
Status: held-out local validation, no v5 training-time calibration, not an official leaderboard
| Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
|---|---|---|---|---|---|---|
| String-backed taxonomy AANA on blind v5 | 93.33% | 91.67% | 94.83% | 95.00% | 9 | 15 |
| Learned taxonomy classifier on held-out v5 | 100.00% | 100.00% | 100.00% | 100.00% | 0 | 0 |
The learned classifier uses TF-IDF word and character n-grams plus calibrated
logistic regression. Threshold selection used 5-fold out-of-fold probabilities
on v3/v4 only, with a target of at least 98.00% safe allow and at least
90.00% unsafe-action recall. Leakage controls exclude expected labels, row
IDs, source dataset names, tool_arguments.dry_run, and
tool_arguments.variant.
This is the strongest local action-gate signal so far, but it should not be over-read. These are small hand-built benchmark rows with clean evidence strings. The next serious external-validity test should use sourced agent traces, noisy or incomplete evidence, ambiguous authorization states, and adversarial near-miss safe cases.
External Agent Trace Action Gate
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-external-agent-trace-action-gate
Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
Rows: 360 transformed public tool-calling trace rows with noisy authorization-state evidence
Status: external-validity diagnostic, policy-derived labels, not an official leaderboard
| Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
|---|---|---|---|---|---|---|
| Base accept-as-safe | 50.00% | 0.00% | 0.00% | 100.00% | 0 | 180 |
| v3/v4-trained AANA transfer | 50.00% | 100.00% | 50.00% | 0.00% | 180 | 0 |
| External-calibrated AANA held-out conversations | 98.61% | 96.67% | 100.00% | 100.00% | 0 | 2 |
This is the first external trace diagnostic. The direct v3/v4 transfer fails by over-deferring every external action, which is exactly the kind of failure this test was meant to expose. After calibration on external trace rows and evaluation on held-out conversation groups, AANA recovers high recall and safe allow. The remaining misses show that the next taxonomy needs a cleaner split between public/non-sensitive read tools and private identity-bound reads.
External Agent Trace Action Gate v2
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-external-agent-trace-action-gate-v2
Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
Rows: 360 transformed public tool-calling trace rows with public-read, private-read, write/update, and counterfactual missing-authorization labels
Status: external-validity diagnostic, policy-derived labels, not an official leaderboard
| Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
|---|---|---|---|---|---|---|
| Base accept-as-safe | 50.00% | 0.00% | 0.00% | 100.00% | 0 | 180 |
| v3/v4-trained AANA transfer | 50.00% | 100.00% | 50.00% | 0.00% | 180 | 0 |
| Learned-only external calibration | 93.79% | 91.78% | 95.71% | 95.83% | 3 | 6 |
| Structured trace taxonomy AANA | 100.00% | 100.00% | 100.00% | 100.00% | 0 | 0 |
This v2 result shows why the architecture needs explicit typed tool surfaces. The learned-only classifier improves recall but still misses the safe-allow target. Adding structured authorization-state detectors for public reads, private identity-bound reads, and write/update actions recovers the target on this corrected external-trace-derived benchmark.
Agent Tool Contract v1
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-agent-tool-contract-v1
Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
Rows: 360 external trace rows transformed into aana.agent_tool_precheck.v1 events
Status: schema-based contract validation, policy-derived labels, not an official leaderboard
| Path | Accuracy | Unsafe recall | Block precision | Safe allow | False positives | False negatives |
|---|---|---|---|---|---|---|
| Base permissive runtime | 50.00% | 0.00% | 0.00% | 100.00% | 0 | 180 |
| AANA schema gate | 100.00% | 100.00% | 100.00% | 100.00% | 0 | 0 |
This run turns the external trace taxonomy into a portable pre-tool-call
contract that any agent runtime can emit before execution: tool name, typed tool
category, authorization state, evidence refs, risk domain, proposed arguments,
and runtime route. Every event is emitted with recommended_route=accept, so
the AANA gate must block unsafe private reads, writes, unknown tools, or
verified missing-authorization evidence. The result is a contract validation,
not a production safety guarantee.
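A minimal illustration of such a pre-tool-call contract event and a gate over it. Field names follow the description above, but the exact published schema may differ; the known-tool set and category strings are invented for the example.

```python
# Illustrative aana.agent_tool_precheck.v1 event and a minimal gate.
# Tool names, categories, and authorization states are hypothetical.

KNOWN_TOOLS = {"search_docs", "get_weather", "update_record"}

def gate_event(event: dict) -> str:
    """Return 'allow' or 'block' for one pre-tool-call contract event."""
    if event["tool_name"] not in KNOWN_TOOLS:
        return "block"                         # unknown tool surface
    if event["authorization_state"] == "missing":
        return "block"                         # verified missing authorization
    if event["tool_category"] in ("private_read", "write_update") \
            and event["authorization_state"] != "granted":
        return "block"                         # identity-bound read or write
    return "allow"

event = {
    "schema": "aana.agent_tool_precheck.v1",
    "tool_name": "get_weather",
    "tool_category": "public_read",
    "authorization_state": "not_required",
    "evidence_refs": ["trace:msg-3"],
    "risk_domain": "general",
    "proposed_arguments": {"city": "Oslo"},
    "recommended_route": "accept",
}
```

Note that the gate never trusts recommended_route: every event in the run arrives marked accept, and blocking still has to come from the typed fields.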
External Agent Trace Noisy Evidence
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-external-agent-trace-noisy-evidence
Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
Rows: 360 external trace rows transformed into aana.agent_tool_precheck.v1 events with deterministic noisy-evidence stressors
Status: robustness diagnostic, policy-derived labels, not an official leaderboard
| Condition | Accuracy | Unsafe recall | Block precision | Safe allow | False positives | False negatives |
|---|---|---|---|---|---|---|
| Base permissive runtime | 50.00% | 0.00% | 0.00% | 100.00% | 0 | 180 |
| Clean AANA contract gate | 100.00% | 100.00% | 100.00% | 100.00% | 0 | 0 |
| Moderate noisy evidence AANA gate | 92.78% | 100.00% | 87.38% | 85.56% | 26 | 0 |
This run keeps unsafe recall at 100% under missing, stale, redacted, and contradictory evidence, but over-blocks safe calls. The result points to the next architecture target: evidence recovery and clarification routing that can restore safe allow without loosening private-read and write-action gates.
Head-to-Head: Permissive Agent vs AANA
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-head-to-head-permissive-vs-aana
Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
Rows: 360 external trace rows with moderate noisy-evidence stressors
Status: head-to-head architecture diagnostic, policy-derived labels, not an official leaderboard
| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
|---|---|---|---|---|---|---|---|
| Plain permissive agent | 50.00% | 0.00% | 0.00% | 100.00% | 100.00% | 0 | 180 |
| AANA schema gate | 92.78% | 100.00% | 87.38% | 85.56% | 0.00% | 26 | 0 |
AANA improves accuracy by +42.78 points and unsafe recall by +100.00
points versus an accept-all agent, while reducing unsafe accepts by 100.00
points. The cost under noisy evidence is a -14.44 point safe-allow delta.
Head-to-Head: Single Classifier vs AANA
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-head-to-head-single-classifier-vs-aana
Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
Rows: 360 external trace rows with moderate noisy-evidence stressors
Status: head-to-head architecture diagnostic, policy-derived labels, not an official leaderboard
| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
|---|---|---|---|---|---|---|---|
| Single learned classifier | 50.00% | 100.00% | 50.00% | 0.00% | 0.00% | 180 | 0 |
| AANA schema gate | 92.78% | 100.00% | 87.38% | 85.56% | 0.00% | 26 | 0 |
The transferred single classifier blocks every external trace row. It preserves
unsafe recall, but cannot distinguish safe authenticated/private reads or public
reads from unsafe actions on this external trace transform. AANA keeps the same
100.00% unsafe recall while restoring 85.56% safe allow by using typed tool
category, authorization state, evidence refs, risk domain, and hard blockers
from the pre-tool-call contract.
Head-to-Head: Prompt-Only Policy Guardrail vs AANA
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-head-to-head-prompt-policy-vs-aana
Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
Rows: 360 external trace rows with moderate noisy-evidence stressors
Status: head-to-head architecture diagnostic, policy-derived labels, not an official leaderboard
| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
|---|---|---|---|---|---|---|---|
| Prompt-only policy guardrail | 81.67% | 96.67% | 74.36% | 66.67% | 3.33% | 60 | 6 |
| AANA schema gate | 92.78% | 100.00% | 87.38% | 85.56% | 0.00% | 26 | 0 |
The prompt-only policy guardrail is a flattened-text baseline over candidate action, user intent, policy text, proposed arguments, and evidence summaries. It performs better than an accept-all agent and the transferred single classifier, but still misses unsafe rows and over-blocks many safe rows. AANA improves unsafe recall, block precision, and safe allow in this run by using the typed contract and hard-blocker route surface.
Head-to-Head: LLM-as-Judge Safety Checker vs AANA
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-head-to-head-llm-judge-vs-aana
Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
Rows: 360 external trace rows with moderate noisy-evidence stressors
LLM judge: gpt-4o-mini
Status: head-to-head architecture diagnostic, policy-derived labels, not an official leaderboard
| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
|---|---|---|---|---|---|---|---|
| LLM-as-judge safety checker | 73.33% | 100.00% | 65.22% | 46.67% | 0.00% | 96 | 0 |
| AANA schema gate | 92.78% | 100.00% | 87.38% | 85.56% | 0.00% | 26 | 0 |
The live LLM-as-judge baseline is conservative: it blocks all unsafe rows, but also blocks many safe identity lookup and authenticated/private-read calls when the evidence is noisy or flattened. AANA preserves the same unsafe recall while allowing substantially more safe calls by using explicit tool category, authorization state, evidence refs, schema validation, and hard blockers.
Head-to-Head: Contract Gate Without Recovery vs AANA
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-head-to-head-contract-no-recovery-vs-aana
Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
Rows: 360 external trace rows with moderate noisy-evidence stressors
Status: head-to-head architecture diagnostic, policy-derived labels, not an official leaderboard
| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
|---|---|---|---|---|---|---|---|
| Structured contract gate without recovery | 92.78% | 100.00% | 87.38% | 85.56% | 0.00% | 26 | 0 |
| AANA with evidence recovery | 100.00% | 100.00% | 100.00% | 100.00% | 0.00% | 0 | 0 |
The bare contract gate consumes the noisy emitted event as-is. AANA adds a correction/evidence-recovery pass that reconstructs recoverable auth, validation, and confirmation evidence from source trace features, removes injected noisy missing-authorization refs when the source trace does not support them, preserves true missing-authorization stressors, and corrects the runtime route before final gating. The recovery pass does not read expected labels, but the trace features are produced by the included transform scripts.
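The recovery pass can be sketched as follows. Here `trace` stands for the source trace features produced by the transform scripts; all field names and the `noise:` ref convention are illustrative, not the published schema.

```python
# Sketch of the evidence-recovery pass: restore recoverable evidence from the
# source trace, drop unsupported injected noise, keep true stressors, and
# correct the route before final gating. Field names are hypothetical.

def recover_evidence(event: dict, trace: dict) -> dict:
    """Repair noisy evidence on a precheck event before final gating."""
    fixed = dict(event)
    refs = list(event.get("evidence_refs", []))

    # Restore recoverable auth/validation/confirmation evidence from the trace.
    for kind in ("auth", "validation", "confirmation"):
        ref = f"trace:{kind}"
        if trace.get(f"has_{kind}") and ref not in refs:
            refs.append(ref)

    # Drop an injected missing-authorization ref the source trace does not
    # support; a true missing-authorization stressor (no auth in trace) stays.
    if "noise:missing_authorization" in refs and trace.get("has_auth"):
        refs.remove("noise:missing_authorization")

    fixed["evidence_refs"] = refs
    # With authorization evidence restored, a safe call need not stay deferred.
    if trace.get("has_auth") and fixed.get("recommended_route") == "defer":
        fixed["recommended_route"] = "accept"
    return fixed
```

The important property, stated in the text above, is that this pass reads only source trace features, never the expected labels.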
External Validity: Hermes Function-Calling Head-to-Head
Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-external-validity-hermes-head-to-head
Second source dataset: https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1
Rows: 360 transformed Hermes function-calling rows with moderate noisy-evidence stressors
Status: second-source architecture diagnostic, policy-derived labels, not an official leaderboard
| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
|---|---|---|---|---|---|---|---|
| Permissive agent | 50.00% | 0.00% | 0.00% | 100.00% | 100.00% | 0 | 180 |
| Single classifier | 50.00% | 100.00% | 50.00% | 0.00% | 0.00% | 180 | 0 |
| Prompt-only policy guardrail | 93.06% | 97.22% | 89.74% | 88.89% | 2.78% | 20 | 5 |
| LLM-as-judge safety checker | 85.28% | 99.44% | 77.49% | 71.11% | 0.56% | 52 | 1 |
| Structured contract gate without recovery | 92.22% | 100.00% | 86.54% | 84.44% | 0.00% | 28 | 0 |
| AANA with evidence recovery | 100.00% | 100.00% | 100.00% | 100.00% | 0.00% | 0 | 0 |
This run improves source diversity by using an independent function-calling dataset with different domains, schemas, and conversation format. It does not provide human-reviewed safety labels: labels and counterfactual missing-authorization rows are generated by the included transform scripts. The main replicated pattern is that AANA's evidence-recovery loop preserves unsafe recall while recovering safe allow better than flat classifiers, prompt-only guards, LLM judges, or a static contract gate.
PIIMB: Presidio + AANA
Official PIIMB submission: https://huggingface.co/datasets/piimb/pii-masking-benchmark-results/discussions/3
Model card for the paired benchmark submission: https://huggingface.co/mindbomber/aana-presidio-piimb-policy-v1
Benchmark: piimb/pii-masking-benchmark
Dataset revision: df8299e90ff053fa6fd1d3678f6693a454f4ecc0
Subset: sentences
Metric/schema: PIIMB 0.2.0
Base detector: microsoft/presidio-analyzer
| System | Avg masking F2 | Avg recall |
|---|---|---|
| Presidio only | 0.4492985573 | 0.4008557794 |
| Presidio + AANA | 0.5629171363 | 0.5159532273 |
| Delta | +0.1136185790 | +0.1150974479 |
Per-source AANA masking F2:
| Source dataset | F2 |
|---|---|
| ai4privacy/pii-masking-openpii-1m | 0.4879480402 |
| gretelai/gretel-pii-masking-en-v1 | 0.6281397502 |
| nvidia/Nemotron-PII | 0.6161414756 |
| piimb/privy | 0.5194392792 |
0.5194392792 |
This is the clearest current ablation: the same specialist detector improved on PIIMB when paired with AANA's verifier/correction layer.
PIIMB: AANA Policy Baseline
Official PIIMB submission: https://huggingface.co/datasets/piimb/pii-masking-benchmark-results/discussions/2
Model card: https://huggingface.co/mindbomber/aana-piimb-policy-baseline
Average masking F2: 0.5195345497
This is a zero-parameter deterministic policy baseline. It is useful as a transparent architecture baseline, not as a claim against trained PII models.
TruthfulQA Local Run
Dataset: truthfulqa/truthful_qa
Configuration: multiple_choice
Split: validation
Sample size: 100 questions
Base generator: openai/gpt-4o-mini through OpenRouter
Result: 85/100 MC1 accuracy
This was a local AANA-gated run and public artifact publication, not an official TruthfulQA leaderboard submission.
Scope And Limitations
AANA should be treated as a runtime architecture and evaluation framework, not as a replacement for training-time alignment, RLHF/RLAIF, constitutional methods, retrieval-augmented generation, tool-use policy, safety classifiers, or domain specialist models. AANA can wrap and coordinate those components.
Current public results are bounded:
- PIIMB results measure PII masking F2 and recall, not production privacy safety.
- TruthfulQA results are local and small-sample, not official leaderboard claims.
- No result here claims state-of-the-art performance.
- No result here guarantees hallucination removal, PII removal, or safety in regulated workflows.
Production use still requires live evidence connectors, domain-owner signoff, audit retention, observability, human review paths, security review, deployment manifest, incident response plan, and measured pilot results.
Repositories
Project repository: https://github.com/mindbomber/Alignment-Aware-Neural-Architecture--AANA-
Project site: https://mindbomber.github.io/Alignment-Aware-Neural-Architecture--AANA-/
Reproduction Pointers
The benchmark and submission scripts are maintained in the project repository:
- scripts/aana_piimb_eval.py
- scripts/aana_piimb_presidio_eval.py
- scripts/aana_truthfulqa_eval.py
- scripts/aana_ragtruth_eval.py
- scripts/aana_halubench_eval.py
- scripts/aana_wikibio_hallucination_eval.py
- scripts/aana_harmactions_eval.py
- scripts/aana_gap_eval.py
- scripts/aana_cli.py workflow-check
The AANA publication gates for the PIIMB submissions passed with:
- gate_decision=pass
- recommended_action=accept
- candidate_gate=pass
- no hard blockers