pyrrho-nano-g3.2
pyrrho-nano-g3.2 is a small multitask RAG governance co-processor for anti-hallucination and retrieval-quality pipelines. It reads a user question plus retrieved source passages, then returns a calibrated evidence-state decision and auxiliary signals that fitz-sage can use before answer generation.
It is not an answer generator and not an open-world fact checker. It sits between retrieval and generation, or beside a retrieval package as a fast evidence-quality layer.
Compared with pyrrho-nano-g3.1, this package adds V8.2 retrieval-control heads for retrieval action, gap type, answerability shape, and retrieval modality, plus an additional evidence_failure_severity scalar.
Governance Labels
| Label | Meaning |
|---|---|
| ABSTAIN | The retrieved sources do not contain enough evidence to answer the question. |
| DISPUTED | The retrieved sources conflict on the answer. |
| TRUSTWORTHY | The retrieved sources consistently support answering the question. |
Multitask Heads
| Head | Labels / values | Intended use |
|---|---|---|
| governance | ABSTAIN, DISPUTED, TRUSTWORTHY | Post-retrieval evidence sufficiency and conflict decision. |
| query_contract | evidence_sufficiency, structured_lookup, temporal_grounding, exhaustive_coverage, comparison_coverage, representative_overview | Pre-retrieval routing signal for what kind of evidence the query needs. |
| route | science_medicine, law_policy, history_geography, technology_computing, economics_finance, culture_society, general_commonsense | Semantic route/domain signal for retrieval policy and logging. |
| taxonomy | 23 fitz-gov taxonomy patterns | Failure/support pattern signal for audit and diagnostics. |
| scalars | evidence_sufficiency, query_evidence_alignment, answer_coverage, conflict_density, retrieval_retry_value, false_trustworthy_risk, evidence_failure_severity | Continuous governance signals for retry, ranking, and monitoring. |
| retrieval_action | answer_now, retrieve_more, broaden_search, resolve_conflict, ask_clarifying_question, structured_lookup | Retrieval policy hint for the next pipeline action. |
| gap_type | 12 evidence-gap labels | More specific reason why retrieval is insufficient or conflicting. |
| answerability_shape | 11 answer-shape labels | Query-only hint for the answer shape the evidence must support. |
| retrieval_modality | unstructured_text, structured_table, code, configuration, log_trace, pdf_layout, mixed | Query-only hint for the preferred retrieval substrate. |
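The enumerated label sets in the table above can be captured as constants so that downstream code rejects unexpected labels early. This is a minimal sketch: `HEAD_LABELS` and `is_known_label` are hypothetical names for illustration, not identifiers from the pyrrho package. The taxonomy, gap_type, and answerability_shape vocabularies are left out because this card gives only their sizes.

```python
# Label vocabularies copied from the heads table above.
# HEAD_LABELS and is_known_label are illustrative names (assumptions),
# not part of the pyrrho runtime.
HEAD_LABELS = {
    "governance": ("ABSTAIN", "DISPUTED", "TRUSTWORTHY"),
    "query_contract": (
        "evidence_sufficiency", "structured_lookup", "temporal_grounding",
        "exhaustive_coverage", "comparison_coverage", "representative_overview",
    ),
    "route": (
        "science_medicine", "law_policy", "history_geography",
        "technology_computing", "economics_finance", "culture_society",
        "general_commonsense",
    ),
    "retrieval_action": (
        "answer_now", "retrieve_more", "broaden_search", "resolve_conflict",
        "ask_clarifying_question", "structured_lookup",
    ),
    "retrieval_modality": (
        "unstructured_text", "structured_table", "code", "configuration",
        "log_trace", "pdf_layout", "mixed",
    ),
}


def is_known_label(head: str, label: str) -> bool:
    """Return True if `label` is listed for `head` in the table above."""
    return label in HEAD_LABELS.get(head, ())
```

Note that structured_lookup appears in both the query_contract and retrieval_action vocabularies, so labels should always be checked against their own head.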
Outputs
This is a custom multitask package, not a standard single-head
AutoModelForSequenceClassification artifact. The recommended runtime is
pyrrho.multitask_inference.PyrrhoMultiTaskPredictor from the pyrrho repository.
The predictor returns a structured object:
| Field | Meaning |
|---|---|
| governance.final_label | Final calibrated label after the TRUSTWORTHY threshold rule. |
| governance.raw_label | Highest-probability governance label before threshold calibration. |
| governance.probabilities | Probability distribution over ABSTAIN, DISPUTED, TRUSTWORTHY. |
| governance.threshold | TRUSTWORTHY probability threshold used by the package. |
| query_contract.final_label | Query-only contract prediction. |
| route.final_label | Query-only semantic route/domain prediction. |
| taxonomy.final_label | Query+evidence taxonomy-pattern prediction. |
| scalars | 7 bounded scalar governance signals. |
| retrieval_action.final_label | Retrieval policy hint. |
| gap_type.final_label | Evidence-gap type prediction. |
| answerability_shape.final_label | Query-only answer-shape prediction. |
| retrieval_modality.final_label | Query-only retrieval-modality prediction. |
| timing_ms | Local inference timing for the call. |
Example normalized output shape:
{
  "schema_version": "pyrrho_multitask_prediction_v1",
  "governance": {
    "raw_label": "TRUSTWORTHY",
    "final_label": "TRUSTWORTHY",
    "used_threshold_fallback": false,
    "threshold": 0.34,
    "confidence": 0.84,
    "probabilities": {
      "ABSTAIN": 0.08,
      "DISPUTED": 0.08,
      "TRUSTWORTHY": 0.84
    }
  },
  "query_contract": {
    "final_label": "structured_lookup"
  },
  "route": {
    "final_label": "economics_finance"
  },
  "taxonomy": {
    "final_label": "direct_answer"
  },
  "retrieval_action": {
    "final_label": "answer_now"
  },
  "scalars": {
    "evidence_sufficiency": 0.91,
    "query_evidence_alignment": 0.88,
    "answer_coverage": 0.86,
    "conflict_density": 0.08,
    "retrieval_retry_value": 0.12,
    "false_trustworthy_risk": 0.09,
    "evidence_failure_severity": 0.07
  }
}
The model does not generate answers, citations, source spans, retrieval results,
or natural-language explanations. It classifies and scores the (query, retrieved_contexts) evidence state.
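Because the predictor returns a plain structured object, a caller may want a cheap sanity check before acting on it. The sketch below is one possible check, not part of the pyrrho runtime: `check_prediction` is a hypothetical helper, and the `[0, 1]` range for scalars is an assumption inferred from the example output rather than a documented guarantee.

```python
import math
from typing import List

GOVERNANCE_LABELS = ("ABSTAIN", "DISPUTED", "TRUSTWORTHY")


def check_prediction(result: dict, tol: float = 1e-3) -> List[str]:
    """Return a list of problems found in a prediction dict (empty if none).

    Hypothetical helper; field names follow the output table in this card.
    """
    problems = []
    gov = result.get("governance", {})
    probs = gov.get("probabilities", {})
    if set(probs) != set(GOVERNANCE_LABELS):
        problems.append("governance.probabilities has unexpected keys")
    elif not math.isclose(sum(probs.values()), 1.0, abs_tol=tol):
        problems.append("governance.probabilities do not sum to 1")
    if gov.get("final_label") not in GOVERNANCE_LABELS:
        problems.append("governance.final_label is not a known label")
    for name, value in result.get("scalars", {}).items():
        # Assumption: scalars lie in [0, 1], as in the example output.
        if not 0.0 <= value <= 1.0:
            problems.append(f"scalar {name} out of [0, 1]: {value}")
    return problems


# Subset of the example output shown above.
example = {
    "governance": {
        "final_label": "TRUSTWORTHY",
        "probabilities": {"ABSTAIN": 0.08, "DISPUTED": 0.08, "TRUSTWORTHY": 0.84},
    },
    "scalars": {"evidence_sufficiency": 0.91, "conflict_density": 0.08},
}
print(check_prediction(example))
```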
Intended Use
Use this model when a RAG or retrieval package needs fast local signals about:
- whether retrieved evidence is enough to answer,
- whether retrieved evidence conflicts,
- what kind of evidence the query needs before retrieval,
- which semantic/domain route the query belongs to,
- which fitz-gov support/failure pattern is active,
- what retrieval action and gap type the evidence state suggests,
- whether retrieval should retry, broaden, or escalate.
This model is not intended to write answers, verify facts outside the provided sources, replace a retriever, or replace human review in high-stakes settings.
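One way a pipeline might consume these signals is to gate answer generation on the governance label and otherwise follow the retrieval-action hint. The sketch below is a hypothetical policy, not the fitz-sage integration: `next_step`, the `"generate_answer"` return value, and the fallback choices for DISPUTED and ABSTAIN are assumptions made for illustration.

```python
def next_step(result: dict) -> str:
    """Map a predictor result to a coarse pipeline action (hypothetical policy)."""
    label = result["governance"]["final_label"]
    if label == "TRUSTWORTHY":
        # Evidence is judged sufficient and consistent: proceed to generation.
        return "generate_answer"
    # Otherwise prefer the model's retrieval hint when it suggests an action.
    hint = result.get("retrieval_action", {}).get("final_label")
    if hint and hint != "answer_now":
        return hint
    # Assumed fallbacks when no usable hint is present.
    return "resolve_conflict" if label == "DISPUTED" else "retrieve_more"


print(next_step({
    "governance": {"final_label": "ABSTAIN"},
    "retrieval_action": {"final_label": "broaden_search"},
}))
```

In high-stakes settings the non-TRUSTWORTHY branches would typically route to human review rather than automatic retries.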
Quick Start
Install the pyrrho package from the repository that contains this runtime, then load the package with the multitask predictor:
from huggingface_hub import snapshot_download
from pyrrho.multitask_inference import PyrrhoMultiTaskPredictor
MODEL_ID = "yafitzdev/pyrrho-nano-g3.2"
PACKAGE_DIR = snapshot_download(MODEL_ID)
query = "Which quarterly report is relevant?"
contexts = [
"The Q2 report lists revenue, churn, and roadmap changes.",
]
predictor = PyrrhoMultiTaskPredictor.from_pretrained(PACKAGE_DIR, device="cpu")
result = predictor.predict(query, contexts)
print(result["governance"]["final_label"])
print(result["query_contract"]["final_label"])
print(result["route"]["final_label"])
print(result["taxonomy"]["final_label"])
print(result["retrieval_action"]["final_label"])
print(result["gap_type"]["final_label"])
print(result["scalars"])
For local package testing:
python scripts/package_multitask_encoder.py verify --package-dir models/pyrrho-nano-g3.2 --device cpu
Release Selection
- Seed: 1337
- TRUSTWORTHY threshold: 0.34
- Selection reason: Seed 1337 was selected because it has the best held-out governance result among the completed g3.2 seeds: 97.56% accuracy / 0.89% false-trustworthy at tau 0.34, while also having the strongest retrieval-modality held-out macro F1.
Held-Out Test Metrics
| Metric | Result |
|---|---|
| Governance accuracy | 0.9756 |
| False-TRUSTWORTHY rate | 0.0089 |
| Query-contract accuracy | 0.9492 |
| Query-contract macro F1 | 0.9387 |
| Route accuracy | 0.9248 |
| Route macro F1 | 0.9221 |
| Taxonomy accuracy | 0.8874 |
| Taxonomy macro F1 | 0.8879 |
| Scalar MAE | 0.0634 |
| Retrieval-action macro F1 | 0.8680 |
| Gap-type macro F1 | 0.7537 |
| Answerability-shape macro F1 | 0.7626 |
| Retrieval-modality macro F1 | 0.5810 |
Three-seed headline from the local release summary:
| Metric | Mean +/- std |
|---|---|
| Governance accuracy | 97.29 +/- 0.22% |
| False-TRUSTWORTHY rate | 1.11 +/- 0.17% |
| Query-contract macro F1 | 93.92 +/- 0.22% |
| Route accuracy | 92.59 +/- 0.54% |
| Taxonomy accuracy | 88.80 +/- 0.19% |
| Scalar MAE | 0.0640 +/- 0.0005 |
| Retrieval-action macro F1 | 86.69 +/- 0.23% |
| Gap-type macro F1 | 76.75 +/- 1.56% |
| Answerability-shape macro F1 | 75.53 +/- 0.54% |
| Retrieval-modality macro F1 | 55.75 +/- 1.75% |
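Macro F1 averages per-class F1 with equal weight per class, so a head with sparse, poorly predicted classes can score far below its accuracy. The sketch below illustrates the arithmetic on toy labels. It is not the evaluation code behind the tables above, and the toy data is not drawn from fitz-gov.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: the rare class is never predicted.
y_true = ["unstructured_text"] * 8 + ["pdf_layout"] * 2
y_pred = ["unstructured_text"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, round(macro_f1(y_true, y_pred), 4))
```

Here accuracy is 0.8 while macro F1 is about 0.44, because the missed rare class contributes an F1 of zero with the same weight as the frequent class.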
Training Data
Trained on fitz-gov V8.2.0 rows with mandatory routing.query_contract and routing.retrieval_control fields. The release package records the local training config in training_config.yaml and detailed metrics in reports/summary.json.
Limitations
- This is a governance and routing co-processor, not a generator.
- The auxiliary heads are useful signals, not ground-truth explanations.
- Query-contract and route predictions are query-only and can be wrong when the user query is underspecified.
- Taxonomy and scalar outputs are trained on fitz-gov labels/signals and should be treated as decision-support metadata, not universal factual judgments.
- Retrieval modality is the weakest new head; sparse subclasses such as pdf_layout should be treated as hints, not authoritative decisions.
- The V8.2 retrieval-control labels are mechanically validated, but did not receive a separate independent blind-label QA pass before this local candidate package.
- The license is CC BY-NC 4.0. Commercial use requires a separate license.
Model tree for yafitzdev/pyrrho-nano-g3.2
Base model
answerdotai/ModernBERT-base