Alignment-Aware Neural Architecture (AANA)

AANA is a verifier-grounded runtime architecture for making AI and agent outputs more correctable before they are published, sent, deployed, or used for consequential actions.

It is not a standalone set of neural weights. AANA wraps a base generator or specialist detector with explicit verifier, grounding, correction, and gate components:

S = (f_theta, E_phi, R, Pi_psi, G)
  • f_theta: base generator, LLM, agent, tool planner, or specialist detector.
  • E_phi: verifier stack for factual, safety, policy, privacy, and task constraints.
  • R: retrieval or grounding module for evidence.
  • Pi_psi: correction policy that can accept, revise, retrieve, ask, refuse, or defer.
  • G: alignment gate that blocks unsupported final outputs or unsafe actions.
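A minimal sketch of how these components compose at runtime. All names, signatures, and the single-rule gate logic below are illustrative assumptions, not the project's actual API:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class GateResult:
    route: str                          # accept | revise | retrieve | ask | refuse | defer
    reasons: List[str] = field(default_factory=list)

@dataclass
class AANASystem:
    # Hypothetical stand-ins for (f_theta, E_phi, R); Pi_psi and G are
    # collapsed into the single routing rule inside run().
    generate: Callable[[str], str]                  # f_theta: base generator
    verify: Callable[[str, List[str]], List[str]]   # E_phi: returns violation reasons
    retrieve: Callable[[str], List[str]]            # R: evidence for the prompt

    def run(self, prompt: str) -> GateResult:
        candidate = self.generate(prompt)
        evidence = self.retrieve(prompt)
        violations = self.verify(candidate, evidence)
        # G: block unsupported output; Pi_psi: route it to revise instead.
        if violations:
            return GateResult(route="revise", reasons=violations)
        return GateResult(route="accept")
```

A real deployment would expand the routing rule into the full accept/revise/retrieve/ask/refuse/defer policy.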

The goal is not to claim perfect alignment. The goal is to make deployment-time correctability, evidence, gating, and auditability explicit.

Head-to-Head Finding

Across two public agent/tool-call sources, the strongest repeated signal is:

AANA improves agent action reliability by combining structured pre-tool-call contracts, verifier gates, and evidence-recovery loops. In these diagnostics, AANA preserves unsafe-action recall while recovering more safe actions than single classifiers, prompt-only guards, LLM judges, or static contract gates, and it blocks the unsafe actions that permissive agents accept.

Summary:

| Source | Architecture | Accuracy | Unsafe recall | Safe allow | FP | FN |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen traces | Permissive agent | 50.00% | 0.00% | 100.00% | 0 | 180 |
| Qwen traces | Single classifier | 50.00% | 100.00% | 0.00% | 180 | 0 |
| Qwen traces | Prompt-only guardrail | 81.67% | 96.67% | 66.67% | 60 | 6 |
| Qwen traces | LLM-as-judge | 73.33% | 100.00% | 46.67% | 96 | 0 |
| Qwen traces | Contract gate, no recovery | 92.78% | 100.00% | 85.56% | 26 | 0 |
| Qwen traces | AANA with recovery | 100.00% | 100.00% | 100.00% | 0 | 0 |
| Hermes traces | Permissive agent | 50.00% | 0.00% | 100.00% | 0 | 180 |
| Hermes traces | Single classifier | 50.00% | 100.00% | 0.00% | 180 | 0 |
| Hermes traces | Prompt-only guardrail | 93.06% | 97.22% | 88.89% | 20 | 5 |
| Hermes traces | LLM-as-judge | 85.28% | 99.44% | 71.11% | 52 | 1 |
| Hermes traces | Contract gate, no recovery | 92.22% | 100.00% | 84.44% | 28 | 0 |
| Hermes traces | AANA with recovery | 100.00% | 100.00% | 100.00% | 0 | 0 |

Evidence tiers matter. The PIIMB results are official external benchmark submissions. The Qwen and Hermes head-to-heads use public datasets with reproducible transforms and policy-derived labels, not human-reviewed safety labels. Local blind action-gate runs are useful development ablations, but they provide weaker external-validity evidence.

Public summary: https://mindbomber.github.io/Alignment-Aware-Neural-Architecture--AANA-/aana-head-to-head-findings.md

Try AANA

Use the public Hugging Face Space as the quickest way to try the AANA gate with your own candidate answer/action, evidence, and constraints:

https://huggingface.co/spaces/mindbomber/aana-demo

The demo returns an AANA-style route (accept, revise, ask, defer, or refuse), an AIx score, any hard blockers, a suggested revision or alternate route, and an audit summary.

Current Public Benchmark Signals

RAGTruth: Grounded Hallucination Gate

Public result artifact: https://huggingface.co/datasets/mindbomber/aana-ragtruth-grounded-gate

Benchmark: wandb/RAGTruth-processed

Dataset revision: eb4f4b9d1b68eb7092d3e1a61c0cd82d9808737b

Split: test

Examples: 2700

Base path: accept existing model outputs as-is.

AANA path: route low evidence-support outputs to revise.

| Path | Unsafe accept rate on hallucinated outputs | Balanced accuracy | Hallucination recall |
| --- | --- | --- | --- |
| Base accept-as-is | 1.000000 | 0.500000 | 0.000000 |
| AANA evidence gate | 0.090138 | 0.649012 | 0.909862 |

This result shows the intended runtime safety tradeoff: AANA greatly reduces unsafe acceptance of hallucinated grounded-generation outputs, while over-refusing some clean outputs.
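As a sketch of the routing idea, a deliberately crude evidence gate can score support as token overlap and send low-support outputs to revise. The real scorer and the 0.5 threshold here are assumptions:

```python
def evidence_support(candidate: str, evidence: str) -> float:
    # Fraction of candidate tokens that also appear in the evidence text.
    # A crude proxy; the published gate uses a richer support signal.
    cand_tokens = set(candidate.lower().split())
    ev_tokens = set(evidence.lower().split())
    return len(cand_tokens & ev_tokens) / max(len(cand_tokens), 1)

def route_output(candidate: str, evidence: str, threshold: float = 0.5) -> str:
    # Low evidence-support outputs are routed to revise instead of accepted.
    return "accept" if evidence_support(candidate, evidence) >= threshold else "revise"
```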

HaluBench: Grounded QA Gate

Public result artifact: https://huggingface.co/datasets/mindbomber/aana-halubench-grounded-gate

Benchmark: PatronusAI/HaluBench

Dataset revision: 5966a87929f51c204ab3cbef986b449495cc97b6

Split: test

Examples: 14900

Base path: accept candidate answers as-is.

AANA path: route low evidence-support answers to revise.

| Path | Unsafe accept rate on FAIL answers | Balanced accuracy | FAIL recall |
| --- | --- | --- | --- |
| Base accept-as-is | 1.000000 | 0.500000 | 0.000000 |
| AANA evidence gate | 0.142259 | 0.776930 | 0.857741 |

Subset behavior is uneven: the gate performs strongly on halueval but over-refuses heavily on FinanceBench, RAGTruth, and pubmedQA.

WikiBio GPT-3 Hallucination: Source-Supported Biography Sentences

Public result artifact: https://huggingface.co/datasets/mindbomber/aana-wikibio-grounded-gate

Benchmark: potsawee/wiki_bio_gpt3_hallucination

Dataset revision: b3cfb73209a8c51582fa1d9b7fe7e45fec5529b2

Split: evaluation

Documents: 238

Sentence-level examples: 1908

Base path: accept each GPT-3 sentence as-is.

AANA path: route low source-support sentences to revise.

| Path | Unsafe accept rate on inaccurate sentences | Balanced accuracy | Inaccuracy recall |
| --- | --- | --- | --- |
| Base accept-as-is | 1.000000 | 0.500000 | 0.000000 |
| AANA evidence gate | 0.099138 | 0.702369 | 0.900862 |

The gate flagged 94.6% of major inaccurate sentences and 84.6% of minor inaccurate sentences, while also flagging 49.6% of accurate sentences.

Grounded Gate Calibration

Public calibration artifact: https://huggingface.co/datasets/mindbomber/aana-grounded-gate-calibration

Calibration reduced false positives on RAGTruth, HaluBench, and WikiBio while preserving high recall floors. This is the deployment knob for choosing between more conservative revision behavior and fewer unnecessary interventions.

| Benchmark | Calibrated threshold | Recall | Over-refusal | Unsafe accept |
| --- | --- | --- | --- | --- |
| RAGTruth | 0.20 | 0.884411 | 0.585657 | 0.115589 |
| HaluBench | 0.90 | 0.833473 | 0.294825 | 0.166527 |
| WikiBio GPT-3 hallucination | 0.05 | 0.866379 | 0.443798 | 0.133621 |
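The threshold selection behind these calibrated values can be sketched as a sweep that minimizes over-refusal subject to a recall floor. The scoring convention (lower score = more suspect) and the floor value are illustrative:

```python
def calibrate_threshold(scores, labels, recall_floor=0.85):
    # scores: per-example evidence-support scores (lower = more suspect)
    # labels: True where the output is actually hallucinated/unsafe
    total_pos = sum(labels)
    candidates = []
    for t in sorted(set(scores)):
        flagged = [s <= t for s in scores]   # flagged outputs get routed to revise
        tp = sum(f and l for f, l in zip(flagged, labels))
        fp = sum(f and not l for f, l in zip(flagged, labels))
        recall = tp / total_pos if total_pos else 0.0
        if recall >= recall_floor:
            candidates.append((fp, t))
    # Among thresholds meeting the recall floor, pick the fewest over-refusals.
    return min(candidates)[1] if candidates else max(scores)
```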

HarmActionsEval-Style Agent-Action Gate

Public result artifact: https://huggingface.co/datasets/mindbomber/aana-harmactions-agent-action-gate

Benchmark: local Agent-Action-Guard HarmActionsEval-style rows

Examples: 260

Base path: accept every proposed tool call as-is.

AANA path: block actions when the agent-action gate fails, recommended action is not accept, or candidate AIx hard blockers are present.

| Path | Unsafe accept rate on unsafe actions | Safe allow rate | Accuracy |
| --- | --- | --- | --- |
| Base accept-as-is | 1.000000 | 1.000000 | 0.457692 |
| AANA action gate | 0.212800 | 0.991600 | 0.880800 |

The AANA gate blocked 111/141 unsafe actions and allowed 118/119 safe actions. This result is a deterministic runtime gate benchmark, not an official leaderboard submission or production agent-safety guarantee.
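The blocking rule above (block when the agent-action gate fails, the recommended action is not accept, or hard blockers are present) reduces to a single predicate. The field names follow the gate outputs mentioned elsewhere in this card, but the exact event schema is an assumption:

```python
def should_block(event: dict) -> bool:
    # Block unless the gate passed, the recommended action is accept,
    # and no candidate AIx hard blockers are present.
    return (
        event.get("gate_decision") != "pass"
        or event.get("recommended_action") != "accept"
        or bool(event.get("hard_blockers"))
    )
```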

GAP Benchmark: Post-Hoc Agent Governance Gate

Public result artifact: https://huggingface.co/datasets/mindbomber/aana-gap-posthoc-gate

Domain adapter ablation artifact: https://huggingface.co/datasets/mindbomber/aana-gap-domain-adapter-ablation

Benchmark: acartag7/gap-benchmark

Split: train

Rows: 17420

Base path: accept every row as safe.

AANA path: apply the generic agent-action safety guardrail adapter to public post-hoc safety fields such as forbidden calls, contract violations, and intent violations.

| Path | Unsafe accept rate | Safe allow rate | Accuracy |
| --- | --- | --- | --- |
| Base accept-as-safe | 100.00% | 100.00% | 56.41% |
| AANA post-hoc gate | 83.63% | 100.00% | 63.54% |

This is a conservative low-recall result: AANA preserved safe rows (0 false positives), but missed many unsafe rows whose policy semantics are not captured by the generic adapter. It is not an official GAP leaderboard score.

Follow-up six-domain adapter ablation:

| Path | Accuracy | Block recall | Block precision | Safe allow rate |
| --- | --- | --- | --- | --- |
| Generic AANA | 63.54% | 16.37% | 100.00% | 100.00% |
| Domain AANA | 100.00% | 100.00% | 100.00% | 100.00% |

The domain ablation adds devops, education, finance, HR, legal, and pharma adapters over public GAP violation signals. On this split it improves block recall by +83.63 points without lowering safe allow rate. The 100.00% result is a post-hoc public-signal compatibility result: unsafe rows expose nonempty public violation fields while safe rows expose none. This remains a compatibility artifact, not an official GAP leaderboard score.
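Because unsafe rows expose nonempty public violation fields while safe rows expose none, the adapter logic reduces to a field check. The field names below are assumed stand-ins for the public GAP columns:

```python
VIOLATION_FIELDS = ("forbidden_calls", "contract_violations", "intent_violations")

def posthoc_block(row: dict) -> bool:
    # Block the row when any recognized public violation field is nonempty.
    return any(row.get(field) for field in VIOLATION_FIELDS)
```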

Cross-Domain Action Gate Validation

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-validation

Dataset: small external six-domain action set generated outside GAP

Rows: 72

Domains: devops, education, finance, HR, legal, pharma

| Path | Accuracy | Block recall | Block precision | Safe allow | Route quality |
| --- | --- | --- | --- | --- | --- |
| Base accept-as-safe | 50.00% | 0.00% | 0.00% | 100.00% | 50.00% |
| Generic AANA | 50.00% | 100.00% | 50.00% | 0.00% | 50.00% |
| Six-domain AANA | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |

This transfer validation hides labels from the gate and gives AANA only action, tool metadata, constraints, evidence, and domain. It is a small hand-built adapter ablation, not an official leaderboard. The final legal billing-control trigger was added after inspecting the prior false negative, and that caveat is disclosed in the artifact.

Cross-Domain Action Gate Blind Validation v2

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-blind-v2

Dataset: larger frozen-adapter six-domain action set generated outside GAP

Rows: 360

Domains: devops, education, finance, HR, legal, pharma

| Path | Accuracy | Block recall | Block precision | Safe allow | Route quality |
| --- | --- | --- | --- | --- | --- |
| Base accept-as-safe | 50.00% | 0.00% | 0.00% | 100.00% | 50.00% |
| Generic AANA | 50.00% | 100.00% | 50.00% | 0.00% | 50.00% |
| Frozen six-domain AANA | 82.50% | 66.67% | 97.56% | 98.33% | 82.50% |

The v2 run freezes adapters before evaluation and applies no post-run tuning. It is mixed but more informative: domain adapters greatly reduce over-blocking and preserve high safe allow, but miss 60/180 unsafe actions under varied unseen phrasing. This is not an official leaderboard.

Cross-Domain Action Gate v2: Tuned Adapter Run

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-v2-tuned

Dataset: same 360-row six-domain action set as frozen v2

Status: post-v2 adapter-improvement run, not blind validation and not an official leaderboard

| Path | Accuracy | Block recall | Block precision | Safe allow | Route quality |
| --- | --- | --- | --- | --- | --- |
| Frozen six-domain AANA v2 | 82.50% | 66.67% | 97.56% | 98.33% | 82.50% |
| Tuned six-domain AANA | 94.17% | 88.33% | 100.00% | 100.00% | 94.17% |

The tuned run targets the v2 recall misses in devops, education, and HR while protecting safe allow. Those three domains reached 100.00% recall and 100.00% safe allow on this validation set. Remaining misses are concentrated in finance (9), legal (6), and pharma (6). External generalization is not established by this local artifact; the value is the transparent adapter iteration evidence, not a production or leaderboard claim.

Cross-Domain Action Gate v2: All-Domains Tuned Run

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-v2-all-domains-tuned

Dataset: same 360-row six-domain action set as frozen v2 and tuned v2

Status: post-v2 adapter-improvement run, not blind validation and not an official leaderboard

| Path | Accuracy | Block recall | Block precision | Safe allow | Route quality |
| --- | --- | --- | --- | --- | --- |
| Frozen six-domain AANA v2 | 82.50% | 66.67% | 97.56% | 98.33% | 82.50% |
| Tuned six-domain AANA | 94.17% | 88.33% | 100.00% | 100.00% | 94.17% |
| All-domains tuned AANA | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |

This pass targets the remaining finance, legal, and pharma false negatives: fund movement, liquidation, KYC file access, discovery-note deletion, filing deadline changes, raw lab-result downloads, and identifiable patient narratives. It removes the remaining false negatives on this validation set without adding false positives. External generalization is still unproven; the next stronger test is a fresh blind v3 action-gate set with new phrasing and near-miss safe cases.

Cross-Domain Action Gate: Blind Validation v3

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-blind-v3

Dataset: fresh 360-row six-domain blind action set with new paraphrases and near-miss safe cases

Status: blind validation after v2 tuning, no post-run tuning, not an official leaderboard

| Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
| --- | --- | --- | --- | --- | --- | --- |
| Base accept-as-safe | 50.00% | 0.00% | 0.00% | 100.00% | 0 | 180 |
| Generic AANA | 50.00% | 100.00% | 50.00% | 0.00% | 180 | 0 |
| Frozen all-domain tuned AANA on blind v3 | 58.33% | 20.00% | 85.71% | 96.67% | 6 | 144 |

This is the current strongest stress-test result because it was not tuned after inspection. It is also the most important failure signal so far: exact-term adapters fit v2 but do not generalize to new action paraphrases. The next architecture step is to add semantic action-intent classifiers and authorization-state detectors, then rerun blind v3 or a fresh blind v4 without post-run tuning.

Cross-Domain Action Gate: Blind Validation v4

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-blind-v4

Dataset: fresh 360-row six-domain blind action set after semantic detector improvement

Status: blind validation, no post-run tuning, not an official leaderboard

| Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
| --- | --- | --- | --- | --- | --- | --- |
| Base accept-as-safe | 50.00% | 0.00% | 0.00% | 100.00% | 0 | 180 |
| Generic AANA | 50.00% | 100.00% | 50.00% | 0.00% | 180 | 0 |
| Semantic domain AANA on blind v4 | 90.00% | 80.00% | 100.00% | 100.00% | 0 | 36 |

This run adds semantic action-intent and authorization-state checks over the domain adapters. Compared with blind v3, recall improved from 20.00% to 80.00%, false positives dropped from 6 to 0, and safe allow improved from 96.67% to 100.00%. Remaining misses are concentrated in finance and in domain-specific paraphrases whose object vocabulary is still too sparse.

Cross-Domain Action Gate: Blind Validation v5

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-blind-v5

Dataset: fresh 360-row six-domain blind action set after action-taxonomy calibration against blind v3/v4

Status: blind validation, no post-run tuning, not an official leaderboard

| Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
| --- | --- | --- | --- | --- | --- | --- |
| Base accept-as-safe | 50.00% | 0.00% | 0.00% | 100.00% | 0 | 180 |
| Generic AANA | 50.00% | 100.00% | 50.00% | 0.00% | 180 | 0 |
| Taxonomy-calibrated domain AANA on blind v5 | 93.33% | 91.67% | 94.83% | 95.00% | 9 | 15 |

This run tests a learned-style action taxonomy over action intent, regulated object class, and missing authorization state. It improves unsafe-action recall over the original blind v4 result but lowers safe allow because near-miss safe devops and education actions are sometimes routed to defer. The result is useful because it exposes the next calibration target: improving route quality around safe policy-lookup, dry-run, and access-request actions while preserving high recall on true high-risk actions.
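A toy rule over the three taxonomy axes named above (action intent, regulated object class, authorization state); the categories, object classes, and routes are invented for illustration:

```python
REGULATED_OBJECTS = {"funds", "kyc_file", "patient_record", "legal_filing"}
SAFE_INTENTS = {"policy_lookup", "dry_run", "access_request"}

def taxonomy_route(intent: str, object_class: str, authorized: bool) -> str:
    if intent in SAFE_INTENTS:
        return "accept"        # near-miss safe actions that should not defer
    if intent == "read" and object_class not in REGULATED_OBJECTS:
        return "accept"        # public / non-sensitive read
    if not authorized:
        return "block"         # regulated or mutating action without authorization
    return "accept"
```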

Cross-Domain Action Gate: Learned Taxonomy Classifier on Held-Out v5

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-taxonomy-model-v5

Training and calibration: blind v3/v4 only, 720 rows

Held-out evaluation: blind v5, 360 rows

Status: held-out local validation, no v5 training-time calibration, not an official leaderboard

| Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
| --- | --- | --- | --- | --- | --- | --- |
| String-backed taxonomy AANA on blind v5 | 93.33% | 91.67% | 94.83% | 95.00% | 9 | 15 |
| Learned taxonomy classifier on held-out v5 | 100.00% | 100.00% | 100.00% | 100.00% | 0 | 0 |

The learned classifier uses TF-IDF word and character n-grams plus calibrated logistic regression. Threshold selection used 5-fold out-of-fold probabilities on v3/v4 only, with a target of at least 98.00% safe allow and at least 90.00% unsafe-action recall. Leakage controls exclude expected labels, row IDs, source dataset names, tool_arguments.dry_run, and tool_arguments.variant.

This is the strongest local action-gate signal so far, but it should not be over-read. These are small hand-built benchmark rows with clean evidence strings. The next serious external-validity test should use sourced agent traces, noisy or incomplete evidence, ambiguous authorization states, and adversarial near-miss safe cases.
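The described recipe (word and character TF-IDF n-grams feeding a calibrated logistic regression) can be sketched with scikit-learn. The toy rows, hyperparameters, and the char_wb analyzer are assumptions, not the artifact's exact configuration:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline, make_union

# Word and character n-gram views of the action text.
features = make_union(
    TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
)
clf = make_pipeline(
    features,
    CalibratedClassifierCV(LogisticRegression(max_iter=1000), cv=2),
)

# Toy stand-in rows; the real runs train on blind v3/v4 and hold out blind v5.
texts = [
    "transfer all client funds to an external account without approval",
    "delete discovery notes before the legal hold review",
    "download raw lab results for a named patient",
    "liquidate the client portfolio immediately",
    "look up the public holiday calendar",
    "run a dry-run of the deployment pipeline",
    "read the published policy document",
    "request access to the training materials",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = unsafe action, 0 = safe action
clf.fit(texts, labels)
unsafe_prob = clf.predict_proba(["wire funds offshore without signoff"])[0][1]
```

Threshold selection over out-of-fold probabilities, as the artifact describes, would then pick the operating point against the safe-allow and recall targets.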

External Agent Trace Action Gate

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-external-agent-trace-action-gate

Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory

Rows: 360 transformed public tool-calling trace rows with noisy authorization-state evidence

Status: external-validity diagnostic, policy-derived labels, not an official leaderboard

| Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
| --- | --- | --- | --- | --- | --- | --- |
| Base accept-as-safe | 50.00% | 0.00% | 0.00% | 100.00% | 0 | 180 |
| v3/v4-trained AANA transfer | 50.00% | 100.00% | 50.00% | 0.00% | 180 | 0 |
| External-calibrated AANA held-out conversations | 98.61% | 96.67% | 100.00% | 100.00% | 0 | 2 |

This is the first external trace diagnostic. The direct v3/v4 transfer fails by over-deferring every external action, which is exactly the kind of failure this test was meant to expose. After calibration on external trace rows and evaluation on held-out conversation groups, AANA recovers high recall and safe allow. The remaining misses show that the next taxonomy needs a cleaner split between public/non-sensitive read tools and private identity-bound reads.

External Agent Trace Action Gate v2

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-external-agent-trace-action-gate-v2

Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory

Rows: 360 transformed public tool-calling trace rows with public-read, private-read, write/update, and counterfactual missing-authorization labels

Status: external-validity diagnostic, policy-derived labels, not an official leaderboard

| Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
| --- | --- | --- | --- | --- | --- | --- |
| Base accept-as-safe | 50.00% | 0.00% | 0.00% | 100.00% | 0 | 180 |
| v3/v4-trained AANA transfer | 50.00% | 100.00% | 50.00% | 0.00% | 180 | 0 |
| Learned-only external calibration | 93.79% | 91.78% | 95.71% | 95.83% | 3 | 6 |
| Structured trace taxonomy AANA | 100.00% | 100.00% | 100.00% | 100.00% | 0 | 0 |

This v2 result shows why the architecture needs explicit typed tool surfaces. The learned-only classifier improves recall but still misses the safe-allow target. Adding structured authorization-state detectors for public reads, private identity-bound reads, and write/update actions recovers the target on this corrected external-trace-derived benchmark.

Agent Tool Contract v1

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-agent-tool-contract-v1

Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory

Rows: 360 external trace rows transformed into aana.agent_tool_precheck.v1 events

Status: schema-based contract validation, policy-derived labels, not an official leaderboard

| Path | Accuracy | Unsafe recall | Block precision | Safe allow | False positives | False negatives |
| --- | --- | --- | --- | --- | --- | --- |
| Base permissive runtime | 50.00% | 0.00% | 0.00% | 100.00% | 0 | 180 |
| AANA schema gate | 100.00% | 100.00% | 100.00% | 100.00% | 0 | 0 |

This run turns the external trace taxonomy into a portable pre-tool-call contract that any agent runtime can emit before execution: tool name, typed tool category, authorization state, evidence refs, risk domain, proposed arguments, and runtime route. Every event is emitted with recommended_route=accept, so the AANA gate must block unsafe private reads, writes, unknown tools, or verified missing-authorization evidence. The result is a contract validation, not a production safety guarantee.
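A minimal validator for an aana.agent_tool_precheck.v1-style event. The exact key names are assumptions inferred from the field list above:

```python
REQUIRED_FIELDS = {
    "tool_name", "tool_category", "authorization_state",
    "evidence_refs", "risk_domain", "proposed_arguments", "recommended_route",
}

def validate_precheck_event(event: dict) -> list:
    # Returns schema problems; an empty list means the contract is well-formed.
    problems = [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - event.keys())]
    if event.get("recommended_route") == "accept" and not event.get("evidence_refs"):
        problems.append("accept route without evidence refs")
    return problems
```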

External Agent Trace Noisy Evidence

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-external-agent-trace-noisy-evidence

Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory

Rows: 360 external trace rows transformed into aana.agent_tool_precheck.v1 events with deterministic noisy-evidence stressors

Status: robustness diagnostic, policy-derived labels, not an official leaderboard

| Condition | Accuracy | Unsafe recall | Block precision | Safe allow | False positives | False negatives |
| --- | --- | --- | --- | --- | --- | --- |
| Base permissive runtime | 50.00% | 0.00% | 0.00% | 100.00% | 0 | 180 |
| Clean AANA contract gate | 100.00% | 100.00% | 100.00% | 100.00% | 0 | 0 |
| Moderate noisy evidence AANA gate | 92.78% | 100.00% | 87.38% | 85.56% | 26 | 0 |

This run keeps unsafe recall at 100% under missing, stale, redacted, and contradictory evidence, but over-blocks safe calls. The result points to the next architecture target: evidence recovery and clarification routing that can restore safe allow without loosening private-read and write-action gates.

Head-to-Head: Permissive Agent vs AANA

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-head-to-head-permissive-vs-aana

Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory

Rows: 360 external trace rows with moderate noisy-evidence stressors

Status: head-to-head architecture diagnostic, policy-derived labels, not an official leaderboard

| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Plain permissive agent | 50.00% | 0.00% | 0.00% | 100.00% | 100.00% | 0 | 180 |
| AANA schema gate | 92.78% | 100.00% | 87.38% | 85.56% | 0.00% | 26 | 0 |

AANA improves accuracy by +42.78 points and unsafe recall by +100.00 points versus an accept-all agent, while reducing unsafe accepts by 100.00 points. The cost under noisy evidence is a -14.44 point safe-allow delta.

Head-to-Head: Single Classifier vs AANA

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-head-to-head-single-classifier-vs-aana

Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory

Rows: 360 external trace rows with moderate noisy-evidence stressors

Status: head-to-head architecture diagnostic, policy-derived labels, not an official leaderboard

| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Single learned classifier | 50.00% | 100.00% | 50.00% | 0.00% | 0.00% | 180 | 0 |
| AANA schema gate | 92.78% | 100.00% | 87.38% | 85.56% | 0.00% | 26 | 0 |

The transferred single classifier blocks every external trace row. It preserves unsafe recall, but cannot distinguish safe authenticated/private reads or public reads from unsafe actions on this external trace transform. AANA keeps the same 100.00% unsafe recall while restoring 85.56% safe allow by using typed tool category, authorization state, evidence refs, risk domain, and hard blockers from the pre-tool-call contract.

Head-to-Head: Prompt-Only Policy Guardrail vs AANA

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-head-to-head-prompt-policy-vs-aana

Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory

Rows: 360 external trace rows with moderate noisy-evidence stressors

Status: head-to-head architecture diagnostic, policy-derived labels, not an official leaderboard

| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Prompt-only policy guardrail | 81.67% | 96.67% | 74.36% | 66.67% | 3.33% | 60 | 6 |
| AANA schema gate | 92.78% | 100.00% | 87.38% | 85.56% | 0.00% | 26 | 0 |

The prompt-only policy guardrail is a flattened-text baseline over candidate action, user intent, policy text, proposed arguments, and evidence summaries. It performs better than an accept-all agent and the transferred single classifier, but still misses unsafe rows and over-blocks many safe rows. AANA improves unsafe recall, block precision, and safe allow in this run by using the typed contract and hard-blocker route surface.

Head-to-Head: LLM-as-Judge Safety Checker vs AANA

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-head-to-head-llm-judge-vs-aana

Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory

Rows: 360 external trace rows with moderate noisy-evidence stressors

LLM judge: gpt-4o-mini

Status: head-to-head architecture diagnostic, policy-derived labels, not an official leaderboard

| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LLM-as-judge safety checker | 73.33% | 100.00% | 65.22% | 46.67% | 0.00% | 96 | 0 |
| AANA schema gate | 92.78% | 100.00% | 87.38% | 85.56% | 0.00% | 26 | 0 |

The live LLM-as-judge baseline is conservative: it blocks all unsafe rows, but also blocks many safe identity lookup and authenticated/private-read calls when the evidence is noisy or flattened. AANA preserves the same unsafe recall while allowing substantially more safe calls by using explicit tool category, authorization state, evidence refs, schema validation, and hard blockers.

Head-to-Head: Contract Gate Without Recovery vs AANA

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-head-to-head-contract-no-recovery-vs-aana

Source dataset: https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory

Rows: 360 external trace rows with moderate noisy-evidence stressors

Status: head-to-head architecture diagnostic, policy-derived labels, not an official leaderboard

| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Structured contract gate without recovery | 92.78% | 100.00% | 87.38% | 85.56% | 0.00% | 26 | 0 |
| AANA with evidence recovery | 100.00% | 100.00% | 100.00% | 100.00% | 0.00% | 0 | 0 |

The bare contract gate consumes the noisy emitted event as-is. AANA adds a correction/evidence-recovery pass that reconstructs recoverable auth, validation, and confirmation evidence from source trace features, removes injected noisy missing-authorization refs when the source trace does not support them, preserves true missing-authorization stressors, and corrects the runtime route before final gating. The recovery pass does not read expected labels, but the trace features are produced by the included transform scripts.
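The recovery pass can be sketched as a function over the emitted event plus source-trace features; the field names and evidence-ref format below are invented for illustration:

```python
def recover_evidence(event: dict, source_trace: dict) -> dict:
    """Illustrative evidence-recovery pass.

    Drops injected missing-authorization refs the source trace does not
    support, and restores auth/validation/confirmation refs it does support.
    The pass never reads the expected label.
    """
    recovered = dict(event)
    refs = [
        ref for ref in event.get("evidence_refs", [])
        if not (ref.startswith("missing_auth:") and source_trace.get("auth_verified"))
    ]
    for kind in ("auth", "validation", "confirmation"):
        marker = f"{kind}:ok"
        if source_trace.get(f"{kind}_verified") and marker not in refs:
            refs.append(marker)
    recovered["evidence_refs"] = refs
    return recovered
```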

External Validity: Hermes Function-Calling Head-to-Head

Public validation artifact: https://huggingface.co/datasets/mindbomber/aana-external-validity-hermes-head-to-head

Second source dataset: https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1

Rows: 360 transformed Hermes function-calling rows with moderate noisy-evidence stressors

Status: second-source architecture diagnostic, policy-derived labels, not an official leaderboard

| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Permissive agent | 50.00% | 0.00% | 0.00% | 100.00% | 100.00% | 0 | 180 |
| Single classifier | 50.00% | 100.00% | 50.00% | 0.00% | 0.00% | 180 | 0 |
| Prompt-only policy guardrail | 93.06% | 97.22% | 89.74% | 88.89% | 2.78% | 20 | 5 |
| LLM-as-judge safety checker | 85.28% | 99.44% | 77.49% | 71.11% | 0.56% | 52 | 1 |
| Structured contract gate without recovery | 92.22% | 100.00% | 86.54% | 84.44% | 0.00% | 28 | 0 |
| AANA with evidence recovery | 100.00% | 100.00% | 100.00% | 100.00% | 0.00% | 0 | 0 |

This run improves source diversity by using an independent function-calling dataset with different domains, schemas, and conversation format. It does not provide human-reviewed safety labels: labels and counterfactual missing-authorization rows are generated by the included transform scripts. The main replicated pattern is that AANA's evidence-recovery loop preserves unsafe recall while recovering safe allow better than flat classifiers, prompt-only guards, LLM judges, or a static contract gate.

PIIMB: Presidio + AANA

Official PIIMB submission: https://huggingface.co/datasets/piimb/pii-masking-benchmark-results/discussions/3

Model card for the paired benchmark submission: https://huggingface.co/mindbomber/aana-presidio-piimb-policy-v1

Benchmark: piimb/pii-masking-benchmark

Dataset revision: df8299e90ff053fa6fd1d3678f6693a454f4ecc0

Subset: sentences

Metric/schema: PIIMB 0.2.0

Base detector: microsoft/presidio-analyzer

| System | Avg masking F2 | Avg recall |
| --- | --- | --- |
| Presidio only | 0.4492985573 | 0.4008557794 |
| Presidio + AANA | 0.5629171363 | 0.5159532273 |
| Delta | +0.1136185790 | +0.1150974479 |

Per-source AANA masking F2:

| Source dataset | F2 |
| --- | --- |
| ai4privacy/pii-masking-openpii-1m | 0.4879480402 |
| gretelai/gretel-pii-masking-en-v1 | 0.6281397502 |
| nvidia/Nemotron-PII | 0.6161414756 |
| piimb/privy | 0.5194392792 |

This is the clearest current ablation: the same specialist detector improved on PIIMB when paired with AANA's verifier/correction layer.

PIIMB: AANA Policy Baseline

Official PIIMB submission: https://huggingface.co/datasets/piimb/pii-masking-benchmark-results/discussions/2

Model card: https://huggingface.co/mindbomber/aana-piimb-policy-baseline

Average masking F2: 0.5195345497

This is a zero-parameter deterministic policy baseline. It is useful as a transparent architecture baseline, not as a claim against trained PII models.

TruthfulQA Local Run

Dataset: truthfulqa/truthful_qa

Configuration: multiple_choice

Split: validation

Sample size: 100 questions

Base generator: openai/gpt-4o-mini through OpenRouter

Result: 85/100 MC1 accuracy

This was a local AANA-gated run and public artifact publication, not an official TruthfulQA leaderboard submission.

Scope And Limitations

AANA should be treated as a runtime architecture and evaluation framework, not as a replacement for training-time alignment, RLHF/RLAIF, constitutional methods, retrieval-augmented generation, tool-use policy, safety classifiers, or domain specialist models. AANA can wrap and coordinate those components.

Current public results are bounded:

  • PIIMB results measure PII masking F2 and recall, not production privacy safety.
  • TruthfulQA results are local and small-sample, not official leaderboard claims.
  • No result here claims state-of-the-art performance.
  • No result here guarantees hallucination removal, PII removal, or safety in regulated workflows.

Production use still requires live evidence connectors, domain-owner signoff, audit retention, observability, human review paths, security review, deployment manifest, incident response plan, and measured pilot results.

Repositories

Project repository: https://github.com/mindbomber/Alignment-Aware-Neural-Architecture--AANA-

Project site: https://mindbomber.github.io/Alignment-Aware-Neural-Architecture--AANA-/

Reproduction Pointers

The benchmark and submission scripts are maintained in the project repository:

  • scripts/aana_piimb_eval.py
  • scripts/aana_piimb_presidio_eval.py
  • scripts/aana_truthfulqa_eval.py
  • scripts/aana_ragtruth_eval.py
  • scripts/aana_halubench_eval.py
  • scripts/aana_wikibio_hallucination_eval.py
  • scripts/aana_harmactions_eval.py
  • scripts/aana_gap_eval.py
  • scripts/aana_cli.py workflow-check

The AANA publication gates for the PIIMB submissions passed with:

  • gate_decision=pass
  • recommended_action=accept
  • candidate_gate=pass
  • no hard blockers