pyrrho-nano-g3.1

pyrrho-nano-g3.1 is a small multitask RAG governance co-processor for anti-hallucination and retrieval-quality pipelines. It reads a user question plus retrieved source passages, then returns a calibrated evidence-state decision and auxiliary signals that fitz-sage can use before answer generation.

It is neither an answer generator nor an open-world fact checker. It sits between retrieval and generation, or beside a retrieval package as a fast evidence-quality layer. Compared with pyrrho-nano-g3, this package adds multitask heads for pre-retrieval query-contract classification, semantic route/domain classification, taxonomy-pattern classification, and six scalar governance signals.

Governance Labels

Label	Meaning
`ABSTAIN`	The retrieved sources do not contain enough evidence to answer the question.
`DISPUTED`	The retrieved sources conflict on the answer.
`TRUSTWORTHY`	The retrieved sources consistently support answering the question.
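
As a rough illustration of how a pipeline might act on these labels, the sketch below maps each label to a next step. The action names and the `next_action` helper are hypothetical and not part of the pyrrho package; only the three label strings and the `governance.final_label` field come from this card.

```python
from typing import Any, Dict

# Hypothetical action names; the model card does not define them.
ACTIONS = {
    "TRUSTWORTHY": "generate_answer",
    "DISPUTED": "surface_conflict",
    "ABSTAIN": "decline_or_retry",
}


def next_action(result: Dict[str, Any]) -> str:
    """Map the calibrated governance label to a pipeline action."""
    label = result["governance"]["final_label"]
    # Unknown labels fail closed instead of falling through to generation.
    return ACTIONS.get(label, "decline_or_retry")


example = {"governance": {"final_label": "DISPUTED"}}
print(next_action(example))  # surface_conflict
```

Failing closed on an unrecognized label is a design choice of this sketch: it keeps a schema change from silently enabling answer generation.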

Multitask Heads

Head	Labels / values	Intended use
`governance`	`ABSTAIN`, `DISPUTED`, `TRUSTWORTHY`	Post-retrieval evidence sufficiency and conflict decision.
`query_contract`	`evidence_sufficiency`, `structured_lookup`, `temporal_grounding`, `exhaustive_coverage`, `comparison_coverage`, `representative_overview`	Pre-retrieval routing signal for what kind of evidence the query needs.
`route`	`science_medicine`, `law_policy`, `history_geography`, `technology_computing`, `economics_finance`, `culture_society`, `general_commonsense`	Semantic route/domain signal for retrieval policy and logging.
`taxonomy`	23 fitz-gov taxonomy patterns	Failure/support pattern signal for audit and diagnostics.
`scalars`	`evidence_sufficiency`, `query_evidence_alignment`, `answer_coverage`, `conflict_density`, `retrieval_retry_value`, `false_trustworthy_risk`	Continuous governance signals for retry, ranking, and monitoring.
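
One way the `query_contract` head could feed a retrieval policy is sketched below. The `RetrievalPolicy` type, the `policy_for` helper, and every numeric value are assumptions made for illustration; only the six contract label strings are taken from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalPolicy:
    """Hypothetical retrieval settings chosen before the retriever runs."""
    top_k: int
    require_dates: bool = False


# Illustrative values only; tune them for the corpus and retriever in use.
POLICIES = {
    "evidence_sufficiency": RetrievalPolicy(top_k=5),
    "structured_lookup": RetrievalPolicy(top_k=3),
    "temporal_grounding": RetrievalPolicy(top_k=5, require_dates=True),
    "exhaustive_coverage": RetrievalPolicy(top_k=30),
    "comparison_coverage": RetrievalPolicy(top_k=12),
    "representative_overview": RetrievalPolicy(top_k=8),
}
DEFAULT_POLICY = RetrievalPolicy(top_k=5)


def policy_for(contract_label: str) -> RetrievalPolicy:
    """Look up the policy for a predicted contract, with a safe default."""
    return POLICIES.get(contract_label, DEFAULT_POLICY)


print(policy_for("exhaustive_coverage").top_k)  # 30
```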

Outputs

This is a custom multitask package, not a standard single-head AutoModelForSequenceClassification artifact. The recommended runtime is pyrrho.multitask_inference.PyrrhoMultiTaskPredictor from the pyrrho repository.

The predictor returns a structured object:

Field	Meaning
`governance.final_label`	Final calibrated label after the TRUSTWORTHY threshold rule.
`governance.raw_label`	Highest-probability governance label before threshold calibration.
`governance.probabilities`	Probability distribution over `ABSTAIN`, `DISPUTED`, `TRUSTWORTHY`.
`governance.threshold`	TRUSTWORTHY probability threshold used by the package.
`query_contract.final_label`	Query-only contract prediction.
`route.final_label`	Query-only semantic route/domain prediction.
`taxonomy.final_label`	Query+evidence taxonomy-pattern prediction.
`scalars`	Six bounded scalar governance signals.
`timing_ms`	Local inference timing for the call.
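
This card does not spell out the TRUSTWORTHY threshold rule. The sketch below shows one plausible reading, offered purely as an assumption: when the highest-probability label is `TRUSTWORTHY` but its probability is below the threshold, the final label falls back to the stronger of the other two labels. The function name and the fallback choice are hypothetical, and the packaged predictor may behave differently.

```python
from typing import Dict, Tuple


def apply_trustworthy_threshold(
    probabilities: Dict[str, float], threshold: float
) -> Tuple[str, str, bool]:
    """Return (raw_label, final_label, used_fallback) under the assumed rule."""
    raw_label = max(probabilities, key=probabilities.get)
    if raw_label == "TRUSTWORTHY" and probabilities["TRUSTWORTHY"] < threshold:
        # Assumed fallback: take the stronger of the two non-TRUSTWORTHY labels.
        fallback = max(("ABSTAIN", "DISPUTED"), key=lambda k: probabilities[k])
        return raw_label, fallback, True
    return raw_label, raw_label, False


probs = {"ABSTAIN": 0.33, "DISPUTED": 0.30, "TRUSTWORTHY": 0.37}
print(apply_trustworthy_threshold(probs, threshold=0.39))
# ('TRUSTWORTHY', 'ABSTAIN', True)
```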

Example normalized output shape:

{
  "schema_version": "pyrrho_multitask_prediction_v1",
  "governance": {
    "raw_label": "TRUSTWORTHY",
    "final_label": "TRUSTWORTHY",
    "used_threshold_fallback": false,
    "threshold": 0.39,
    "confidence": 0.84,
    "probabilities": {
      "ABSTAIN": 0.08,
      "DISPUTED": 0.08,
      "TRUSTWORTHY": 0.84
    }
  },
  "query_contract": {
    "final_label": "structured_lookup"
  },
  "route": {
    "final_label": "economics_finance"
  },
  "taxonomy": {
    "final_label": "direct_answer"
  },
  "scalars": {
    "evidence_sufficiency": 0.91,
    "query_evidence_alignment": 0.88,
    "answer_coverage": 0.86,
    "conflict_density": 0.08,
    "retrieval_retry_value": 0.12,
    "false_trustworthy_risk": 0.09
  }
}

The model does not generate answers, citations, source spans, retrieval results, or natural-language explanations. It classifies and scores the (query, retrieved_contexts) evidence state.
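
A consumer of this output may want a light sanity check before acting on it. The sketch below is one such check and is not part of the pyrrho package: `validate_prediction` is a hypothetical helper, and it assumes that "bounded" scalars lie in the closed interval [0, 1] and that the governance probabilities sum to roughly 1.

```python
import math
from typing import Any, Dict, List

GOVERNANCE_LABELS = ("ABSTAIN", "DISPUTED", "TRUSTWORTHY")


def validate_prediction(result: Dict[str, Any], tol: float = 1e-2) -> List[str]:
    """Return a list of problems; an empty list means the shape looks sane."""
    problems: List[str] = []
    gov = result.get("governance", {})
    probs = gov.get("probabilities", {})
    if set(probs) != set(GOVERNANCE_LABELS):
        problems.append("governance.probabilities has unexpected keys")
    elif not math.isclose(sum(probs.values()), 1.0, abs_tol=tol):
        problems.append("governance.probabilities do not sum to 1")
    if gov.get("final_label") not in GOVERNANCE_LABELS:
        problems.append("governance.final_label is not a known label")
    for name, value in result.get("scalars", {}).items():
        # Assumption: "bounded" means the closed interval [0, 1].
        if not 0.0 <= value <= 1.0:
            problems.append(f"scalar {name} outside [0, 1]")
    return problems


sample = {
    "governance": {
        "final_label": "TRUSTWORTHY",
        "probabilities": {"ABSTAIN": 0.08, "DISPUTED": 0.08, "TRUSTWORTHY": 0.84},
    },
    "scalars": {"evidence_sufficiency": 0.91, "conflict_density": 0.08},
}
print(validate_prediction(sample))  # []
```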

Intended Use

Use this model when a RAG or retrieval package needs fast local signals about:

whether retrieved evidence is enough to answer,
whether retrieved evidence conflicts,
what kind of evidence the query needs before retrieval,
which semantic/domain route the query belongs to,
which fitz-gov support/failure pattern is active,
and whether retrieval should retry, broaden, or escalate.

This model is not intended to write answers, verify facts outside the provided sources, replace a retriever, or replace human review in high-stakes settings.
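
The retry, broaden, or escalate decision could be driven by the governance label together with the scalar signals. The sketch below is only one possible policy: the `retrieval_decision` helper, the action names, the cutoff values, and the assumption that higher scalar values mean "more of" the named quantity are all hypothetical and should be tuned against real traffic.

```python
from typing import Dict


def retrieval_decision(
    final_label: str,
    scalars: Dict[str, float],
    risk_cutoff: float = 0.3,   # hypothetical cutoff
    retry_cutoff: float = 0.5,  # hypothetical cutoff
    align_cutoff: float = 0.4,  # hypothetical cutoff
) -> str:
    """Pick a next step from the governance label and scalar signals."""
    if final_label == "TRUSTWORTHY" and scalars["false_trustworthy_risk"] < risk_cutoff:
        return "answer"
    if scalars["retrieval_retry_value"] >= retry_cutoff:
        # The model suggests another retrieval pass is likely to help.
        return "retry"
    if scalars["query_evidence_alignment"] < align_cutoff:
        # Retrieved passages look off-topic, so widen the query.
        return "broaden"
    # Conflicting or risky evidence that retrieval is unlikely to fix.
    return "escalate"


scalars = {
    "query_evidence_alignment": 0.88,
    "retrieval_retry_value": 0.12,
    "false_trustworthy_risk": 0.09,
}
print(retrieval_decision("TRUSTWORTHY", scalars))  # answer
```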

Quick Start

Install the pyrrho package from the repository that contains this runtime, then load the package with the multitask predictor:

from huggingface_hub import snapshot_download

from pyrrho.multitask_inference import PyrrhoMultiTaskPredictor

# Download the package files from the Hugging Face Hub; later calls reuse the local cache.
MODEL_ID = "yafitzdev/pyrrho-nano-g3.1"
PACKAGE_DIR = snapshot_download(MODEL_ID)

# One user question plus the passages returned by the retriever.
query = "Which quarterly report is relevant?"
contexts = [
    "The Q2 report lists revenue, churn, and roadmap changes.",
]

# Load the multitask predictor on CPU and score the (query, contexts) pair.
predictor = PyrrhoMultiTaskPredictor.from_pretrained(PACKAGE_DIR, device="cpu")
result = predictor.predict(query, contexts)

# Print the calibrated governance label, the auxiliary heads, and the scalars.
print(result["governance"]["final_label"])
print(result["query_contract"]["final_label"])
print(result["route"]["final_label"])
print(result["taxonomy"]["final_label"])
print(result["scalars"])

For local package testing:

python scripts/package_multitask_encoder.py verify --package-dir models/pyrrho-nano-g3.1 --device cpu

Release Selection

Seed: 7
TRUSTWORTHY threshold: 0.39
Selection reason: seed 7 had the strongest composite release score while retaining strong governance, query-contract, route, taxonomy, and scalar metrics.

Held-Out Test Metrics

Metric	Result
Governance accuracy	`0.9805`
False-TRUSTWORTHY rate	`0.0095`
Query-contract accuracy	`0.9492`
Query-contract macro F1	`0.9423`
Route accuracy	`0.9296`
Route macro F1	`0.9282`
Taxonomy accuracy	`0.8943`
Taxonomy macro F1	`0.8960`
Scalar MAE	`0.0587`
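
This card does not define how the false-TRUSTWORTHY rate is computed. The sketch below assumes one common definition: among examples whose gold label is not `TRUSTWORTHY`, the share predicted as `TRUSTWORTHY`. The helper name and the toy labels are illustrative only and are not drawn from the held-out test set.

```python
from typing import Sequence


def false_trustworthy_rate(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Share of non-TRUSTWORTHY gold examples predicted as TRUSTWORTHY."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Keep predictions only for examples that should not be TRUSTWORTHY.
    negatives = [p for g, p in zip(gold, pred) if g != "TRUSTWORTHY"]
    if not negatives:
        return 0.0
    return sum(p == "TRUSTWORTHY" for p in negatives) / len(negatives)


# Toy labels for illustration; one of three negatives is a false TRUSTWORTHY.
gold = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY", "ABSTAIN"]
pred = ["ABSTAIN", "TRUSTWORTHY", "TRUSTWORTHY", "ABSTAIN"]
rate = false_trustworthy_rate(gold, pred)
print(round(rate, 4))  # 0.3333
```

If the release instead normalizes by all examples rather than by non-TRUSTWORTHY ones, only the denominator changes.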

Three-seed headline metrics from the local release summary:

Metric	Mean +/- std
Governance accuracy	`97.84 +/- 0.15%`
False-TRUSTWORTHY rate	`0.85 +/- 0.07%`
Query-contract macro F1	`94.24 +/- 0.28%`
Route accuracy	`93.41 +/- 0.32%`
Taxonomy accuracy	`89.26 +/- 0.23%`
Scalar MAE	`0.0592 +/- 0.0005`

Training Data

The model was trained on fitz-gov V8.1-style rows, prepared from the V8.0.1 row set with the mandatory routing.query_contract field added. The release package records the local training configuration in training_config.yaml and detailed metrics in reports/summary.json.

Limitations

This is a governance and routing co-processor, not a generator.
The auxiliary heads are useful signals, not ground-truth explanations.
Query-contract and route predictions are query-only and can be wrong when the user query is underspecified.
Taxonomy and scalar outputs are trained on fitz-gov labels/signals and should be treated as decision-support metadata, not universal factual judgments.
The license is CC BY-NC 4.0. Commercial use requires a separate license.

Model Details

Model size: 0.1B params
Tensor type: F32 (Safetensors)
Base model: answerdotai/ModernBERT-base
