Access is provided for research and evaluation use only. Redistribution, commercial use, or publication of model weights is not permitted without written approval from Simple Machine Mind.
Evaluator v2: Auditable AI Decision System (EvaluatorDPT)
Model ID: pcsankar73s/EvaluatorModel
License: CC BY-NC 4.0 (non-commercial; approval required for inference)
Access: Gated (visible to all, usable only with explicit approval)
Author: Sankaranarayanan Palamadai Chandrasekaran · Simple Machine Mind
Overview
Most AI systems are built to always give an answer, even when they shouldn't. EvaluatorDPT is built differently: it reads structured signals, doesn't generate text, and produces a bounded decision of YES, NO, or defer to a human. Because it is signal-based and deterministic, it doesn't hallucinate. When it flags a case as uncertain, it is right to do so 93% of the time (TBD precision: 0.9306). The deferral threshold is tunable at deployment, so teams can steer decisions toward their risk tolerance or business objective without retraining the underlying model.
EvaluatorDPT is a BERT-based multi-task model for auditable decision control under ambiguity. It produces a bounded three-class decision (YES / NO / TBD) alongside structured auxiliary outputs that remain available at inference time as explainability signals and control variables.
Unlike conventional classifiers that force a binary output regardless of evidence quality, EvaluatorDPT treats TBD (defer) as a trained first-class outcome, enabling uncertain cases to be routed to conservative handling without retraining the core model.
The model predicts:
- Decision: YES / NO / TBD (defer)
- Auxiliary Head 1: detects sentiment turbulence, i.e. emotional noise affecting decision clarity (28 labels)
- Auxiliary Head 2: captures semantic value signals, ethical anchors such as fairness or caution (10 labels)
Auxiliary outputs are retained at inference time as structured control variables for downstream steering, thresholding, and reason-code generation.
Input/output contract: a context signal is mapped to a bounded decision, decision confidence, structured reason codes, and reason-code confidence scores.
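As a concrete illustration, one decision event under this contract might serialize as follows. The field names here are assumptions chosen for exposition, not the model's published output schema:

```python
# Hypothetical serialization of one decision event. Field names are
# illustrative assumptions, not the model's actual output schema.
example_output = {
    "decision": "TBD",                # bounded decision: YES / NO / TBD
    "decision_confidence": 0.93,      # softmax confidence of the decision head
    "reason_codes": [                 # structured codes from the auxiliary heads
        "sentiment_turbulence:high",
        "value_signal:caution",
    ],
    "reason_code_confidence": {       # per-code confidence scores
        "sentiment_turbulence:high": 0.81,
        "value_signal:caution": 0.74,
    },
}
```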
Architecture
Backbone: bert-base-uncased (12-layer Transformer)
Heads:
- decision: primary 3-class classifier (YES / NO / TBD) with confidence score
- auxiliary_head_1: multi-label signal layer for sentiment turbulence (28 labels)
- auxiliary_head_2: multi-label signal layer for value alignment (10 labels)
All inputs are tokenized to a maximum sequence length of 128 tokens.
Training recipe: Gradual unfreeze → full unfreeze · LR = 1e-5 · Batch size = 32 · Early stopping (patience = 2) · Threshold sweep · Layer-wise differential learning rates · Cosine decay with warmup ratio 0.1 · Class weights on decision head for imbalance handling
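The layer-wise differential learning rates can be sketched as optimizer parameter groups. The `encoder.layer.N.` name prefix assumes a BERT-style module layout, and the decay factor of 0.9 is an illustrative choice, not the published value:

```python
def layerwise_param_groups(model, base_lr=1e-5, decay=0.9, num_layers=12):
    """Build optimizer parameter groups with per-layer learning rates,
    decaying from the top Transformer layer toward the embeddings so the
    lowest layers change most slowly. The 'encoder.layer.N.' prefix
    assumes a BERT-style module layout; adapt it to your model."""
    groups = []
    for name, param in model.named_parameters():
        lr = base_lr  # non-encoder parameters (heads, embeddings) keep base_lr
        for i in range(num_layers):
            if f"encoder.layer.{i}." in name:
                lr = base_lr * decay ** (num_layers - 1 - i)
                break
        groups.append({"params": [param], "lr": lr})
    return groups
```

The resulting groups plug directly into a torch optimizer, e.g. `torch.optim.AdamW(layerwise_param_groups(model))`; pairing that with a cosine schedule and warmup (e.g. transformers' `get_cosine_schedule_with_warmup`) covers the rest of the recipe above.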
Performance
Trained on 181,000 curated decision events. Evaluated on a stratified held-out test split of 22,748 examples (TBD majority class at 60.3%).
| Method | Accuracy | Macro F1 | Micro F1 | Weighted F1 |
|---|---|---|---|---|
| Majority baseline (always TBD) | 0.6030 | 0.2508 | 0.6030 | 0.4537 |
| EvaluatorDPT | 0.8485 | 0.8215 | 0.8485 | 0.8506 |
Per-class breakdown:
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| YES | 0.7683 | 0.9029 | 0.8302 | 5,871 |
| NO | 0.7164 | 0.7923 | 0.7524 | 3,159 |
| TBD | 0.9306 | 0.8381 | 0.8819 | 13,718 |
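The majority-baseline row above can be sanity-checked directly from the per-class supports, since "always TBD" scores F1 = 0 on YES and NO:

```python
# Recompute the majority-baseline metrics from the test-split supports.
support = {"YES": 5871, "NO": 3159, "TBD": 13718}
total = sum(support.values())              # 22,748 test examples
tbd_share = support["TBD"] / total         # accuracy of "always TBD"
tbd_f1 = 2 * tbd_share / (1 + tbd_share)   # F1 of the TBD class
macro_f1 = tbd_f1 / 3                      # YES and NO contribute F1 = 0
weighted_f1 = tbd_share * tbd_f1           # support-weighted average
```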
Inference latency (NVIDIA Tesla T4 GPU, 200 runs): p50 = 200 ms · p95 = 415 ms
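Because latency figures are hardware-dependent, a small harness along these lines (an assumption about methodology, not the authors' exact benchmark) can re-characterize p50/p95 on target hardware:

```python
import time

def measure_latency(fn, runs=200, warmup=10):
    """Time fn() over `runs` calls and report (p50, p95) in milliseconds.
    Warmup calls are discarded so JIT/cache effects don't skew the tail."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    p50 = samples[len(samples) // 2]
    p95 = samples[int(0.95 * (len(samples) - 1))]
    return p50, p95
```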
Data Processing Modules
| Included for Further Progress | Cited (for Reference / Citation) |
|---|---|
| process_semeval2017_local | process_sentiment140 |
| process_financial_phrasebank | process_imdb |
| process_tweeteval | process_multinli |
| process_goemotions | process_tweeteval_health |
| process_normbank_csv_concatenated | |
| process_mft_from_json | |
| process_meld | |
| process_empathetic_dialogues | |
| process_social_bias_frames | |
| process_ethics_local | |
| process_ethics_virtue | |
Use Cases
Decision gating under ambiguity: route inputs to YES, NO, or deferred handling based on evidence quality without forcing a binary commit.
Auditable AI workflows: every decision ships with a confidence score, value alignment signal, and sentiment turbulence signal that downstream systems can log, inspect, and act on.
Risk-sensitive deployments: use TBD precision (0.9306) and confidence scores to calibrate the YES execution threshold for deployment-specific risk tolerance without retraining.
Reason-code generation: auxiliary outputs provide structured context for human-readable explanations alongside each decision.
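A minimal routing policy over the decision probabilities might look like the following. The label order (YES, NO, TBD) and both threshold values are assumptions to check against the model's config, not shipped defaults:

```python
import math

def route_decision(logits, yes_threshold=0.90, defer_threshold=0.50):
    """Map decision-head logits to a routed outcome. Label order
    (YES=0, NO=1, TBD=2) and the thresholds are illustrative assumptions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
    total = sum(exps)
    p_yes, p_no, p_tbd = (e / total for e in exps)
    if p_tbd >= defer_threshold:
        return "TBD"   # the model itself is uncertain: defer to a human
    if p_yes >= yes_threshold:
        return "YES"   # execute only above the risk-calibrated bar
    if p_no >= p_yes:
        return "NO"
    return "TBD"       # YES leads but misses the bar: defer conservatively
```

Raising `yes_threshold` trades YES recall for precision at deployment time, which is the no-retraining steering knob described above.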
Example Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("pcsankar73s/EvaluatorModel")
model = AutoModelForSequenceClassification.from_pretrained("pcsankar73s/EvaluatorModel")

inputs = tokenizer(
    "Should we proceed given the current context?",
    return_tensors="pt",
    max_length=128,
    truncation=True,
)
with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits holds the decision logits (YES / NO / TBD);
# the confidence score is derived from their softmax.
probs = torch.softmax(outputs.logits, dim=-1)
```
Limitations
- Results are specific to the training distribution; generalization to other domains requires separate validation.
- Class imbalance in the NO class (13.9% of test split) limits NO performance; targeted sampling may improve this.
- Inputs exceeding 128 tokens are truncated; longer documents require chunking or preprocessing.
- Reported latency is hardware-dependent; re-characterize for your inference environment.
- Auxiliary heads provide structured signals, not ground-truth classifiers for values or emotions.
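For documents beyond the 128-token window, one chunking sketch follows; the overlap size and the defer-if-any-chunk-defers policy are assumptions, not part of the released pipeline:

```python
def chunk_token_ids(ids, max_len=128, stride=32):
    """Split a token-id sequence into windows of at most max_len tokens,
    overlapping by `stride` so decision-relevant spans don't fall on a seam."""
    if len(ids) <= max_len:
        return [ids]
    step = max_len - stride
    chunks, start = [], 0
    while start < len(ids):
        chunks.append(ids[start:start + max_len])
        if start + max_len >= len(ids):
            break
        start += step
    return chunks

def aggregate(decisions):
    """Conservative policy: defer the whole document if any chunk defers,
    otherwise take the majority vote over chunk decisions."""
    if "TBD" in decisions:
        return "TBD"
    return max(set(decisions), key=decisions.count)
```

With Hugging Face tokenizers, the same windowing is also available natively via `return_overflowing_tokens=True` together with the `stride` argument.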
Links
- GitHub: pcsankar73/EvaluatorDPT-Publish
- OSF preprint: https://osf.io/ztnya/
- Paper (arXiv): TBD
- Contact: sankar@smsquared.ai
License
Model artifacts: CC BY-NC 4.0 (non-commercial use; contact for commercial licensing). Code and documentation: see repository LICENSE.