
Access is provided for research and evaluation use only. Redistribution, commercial use, or publication of model weights is not permitted without written approval from Simple Machine Mind.


Evaluator v2 — Auditable AI Decision System (EvaluatorDPT)

Model ID: pcsankar73s/EvaluatorModel
License: CC BY-NC 4.0 (non-commercial; approval required for inference)
Access: 🔒 Gated — visible to all, usable only with explicit approval
Author: Sankaranarayanan Palamadai Chandrasekaran · Simple Machine Mind


Overview

Most AI systems are built to always give an answer — even when they shouldn't. EvaluatorDPT is built differently: it reads structured signals, doesn't generate text, and produces a bounded decision of YES, NO, or defer to a human. Because it is signal-based and deterministic, it doesn't hallucinate. When it flags a case as uncertain, it is right to do so 93% of the time (TBD precision: 0.9306). The deferral threshold is tunable at deployment — teams can steer decisions toward their risk tolerance or business objective without retraining the underlying model.

EvaluatorDPT is a BERT-based multi-task model for auditable decision control under ambiguity. It produces a bounded three-class decision (YES / NO / TBD) alongside structured auxiliary outputs that remain available at inference time as explainability signals and control variables.

Unlike conventional classifiers that force a binary output regardless of evidence quality, EvaluatorDPT treats TBD (defer) as a trained first-class outcome — enabling uncertain cases to be routed to conservative handling without retraining the core model.
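As a sketch of what defer-first routing can look like downstream, the following is a minimal, hypothetical router. The label order, handler names, and confidence floor are assumptions for illustration, not part of the published model contract.

```python
# Illustrative routing sketch for the defer-first design; everything
# here (label order, threshold, handler names) is an assumption.
LABELS = ("YES", "NO", "TBD")

def route(probs, min_confidence=0.5):
    """Map a (YES, NO, TBD) probability triple to a handler,
    deferring when TBD wins or when no class is confident enough."""
    top = max(range(len(LABELS)), key=lambda i: probs[i])
    if LABELS[top] == "TBD" or probs[top] < min_confidence:
        return "human_review"
    return "auto_execute" if LABELS[top] == "YES" else "auto_reject"
```

The point of the sketch is that deferral needs no retraining: both the TBD class and the confidence floor gate the same `human_review` path.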

The model predicts:

  • Decision — YES / NO / TBD (defer)
  • Auxiliary Head 1 — detects sentiment turbulence: emotional noise affecting decision clarity (28 labels)
  • Auxiliary Head 2 — captures semantic value signals: ethical anchors such as fairness or caution (10 labels)

Auxiliary outputs are retained at inference time as structured control variables for downstream steering, thresholding, and reason-code generation.

Input/output contract: a context signal is mapped to a bounded decision, decision confidence, structured reason codes, and reason-code confidence scores.
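The contract above could be carried in a record along these lines. This is a hypothetical sketch: the field names are illustrative and are not the checkpoint's actual output schema.

```python
from dataclasses import dataclass, field

# Hypothetical record shape for the stated input/output contract.
@dataclass
class DecisionRecord:
    decision: str                 # bounded: "YES" | "NO" | "TBD"
    confidence: float             # probability of the chosen class
    reason_codes: list = field(default_factory=list)        # e.g. ["caution"]
    reason_confidences: list = field(default_factory=list)  # per-code scores

record = DecisionRecord("TBD", 0.88, ["caution"], [0.91])
```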


Architecture

Backbone: bert-base-uncased (12-layer Transformer)

Heads:

  • decision — primary 3-class classifier (YES / NO / TBD) with confidence score
  • auxiliary_head_1 — multi-label signal layer for sentiment turbulence (28 labels)
  • auxiliary_head_2 — multi-label signal layer for value alignment (10 labels)

All inputs are tokenized to a maximum sequence length of 128 tokens.
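The head layout described above can be sketched as follows. The layer shapes mirror the card (768-dim pooled output for bert-base-uncased; 3, 28, and 10 outputs), but the checkpoint's actual module names and structure may differ.

```python
import torch
import torch.nn as nn

# Sketch of the three-head layout over a BERT pooled output.
# Shapes follow the card; module names are assumptions.
class EvaluatorHeads(nn.Module):
    def __init__(self, hidden_size=768):
        super().__init__()
        self.decision = nn.Linear(hidden_size, 3)           # YES / NO / TBD
        self.auxiliary_head_1 = nn.Linear(hidden_size, 28)  # sentiment turbulence
        self.auxiliary_head_2 = nn.Linear(hidden_size, 10)  # value alignment

    def forward(self, pooled):
        return (
            self.decision(pooled).softmax(dim=-1),    # single-label probabilities
            self.auxiliary_head_1(pooled).sigmoid(),  # multi-label scores
            self.auxiliary_head_2(pooled).sigmoid(),  # multi-label scores
        )

decision, aux1, aux2 = EvaluatorHeads()(torch.zeros(1, 768))
```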

Training recipe: Gradual unfreeze → full unfreeze · LR = 1e-5 · Batch size = 32 · Early stopping (patience = 2) · Threshold sweep · Layer-wise differential learning rates · Cosine decay with warmup ratio 0.1 · Class weights on decision head for imbalance handling


Performance

Trained on 181,000 curated decision events. Evaluated on a stratified held-out test split of 22,748 examples (TBD majority class at 60.3%).

Method                          Accuracy  Macro F1  Micro F1  Weighted F1
Majority baseline (always TBD)  0.6030    0.2508    0.6030    0.4537
EvaluatorDPT                    0.8485    0.8215    0.8485    0.8506

Per-class breakdown:

Class  Precision  Recall  F1      Support
YES    0.7683     0.9029  0.8302  5,871
NO     0.7164     0.7923  0.7524  3,159
TBD    0.9306     0.8381  0.8819  13,718

Inference latency (NVIDIA Tesla T4 GPU, 200 runs): p50 = 200 ms · p95 = 415 ms
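Since latency is hardware-dependent (see Limitations), it can be re-characterized along these lines on your own environment. `run_inference` is a placeholder for a real forward pass, not an actual API of this model.

```python
import statistics
import time

def run_inference():
    time.sleep(0.001)  # stand-in workload; replace with a real forward pass

latencies_ms = []
for _ in range(200):  # 200 runs, matching the card's measurement
    start = time.perf_counter()
    run_inference()
    latencies_ms.append((time.perf_counter() - start) * 1000)

cuts = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
p50, p95 = cuts[49], cuts[94]
```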


Data Processing Modules

Included for Further Progress      Cited (for Reference / Citation)
process_semeval2017_local         process_sentiment140
process_financial_phrasebank      process_imdb
process_tweeteval                 process_multinli
process_goemotions                process_tweeteval_health
process_normbank_csv_concatenated
process_mft_from_json
process_meld
process_empathetic_dialogues
process_social_bias_frames
process_ethics_local
process_ethics_virtue

Use Cases

Decision gating under ambiguity — route inputs to YES, NO, or deferred handling based on evidence quality without forcing a binary commit.

Auditable AI workflows — every decision ships with a confidence score, value alignment signal, and sentiment turbulence signal that downstream systems can log, inspect, and act on.

Risk-sensitive deployments — use TBD precision (0.9306) and confidence scores to calibrate the YES execution threshold for deployment-specific risk tolerance without retraining.
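One way such calibration might look in practice, sketched with toy validation data rather than real model outputs: raise the YES execution threshold until the false-YES rate fits a risk budget.

```python
# Toy stand-ins: (p_yes, true_label) pairs from a hypothetical validation set.
validation = [
    (0.95, "YES"), (0.80, "NO"), (0.70, "YES"), (0.60, "NO"), (0.99, "YES"),
]

def false_yes_rate(threshold):
    """Fraction of auto-executed YES decisions that were wrong."""
    fired = [(p, y) for p, y in validation if p >= threshold]
    if not fired:
        return 0.0
    return sum(1 for _, y in fired if y != "YES") / len(fired)

threshold = 0.50
while false_yes_rate(threshold) > 0.10 and threshold < 1.0:  # 10% risk budget
    threshold += 0.05
```

No retraining happens here; only the execution threshold moves.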

Reason-code generation — auxiliary outputs provide structured context for human-readable explanations alongside each decision.


Example Usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("pcsankar73s/EvaluatorModel")
model = AutoModelForSequenceClassification.from_pretrained("pcsankar73s/EvaluatorModel")

inputs = tokenizer(
    "Should we proceed given the current context?",
    return_tensors="pt",
    max_length=128,
    truncation=True,
)
with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits → decision logits (YES / NO / TBD)
probs = outputs.logits.softmax(dim=-1)        # decision probabilities
confidence, decision_idx = probs.max(dim=-1)  # confidence score of the chosen class

Limitations

  • Results are specific to the training distribution; generalization to other domains requires separate validation.
  • Class imbalance in the NO class (13.9% of test split) limits NO performance; targeted sampling may improve this.
  • Inputs exceeding 128 tokens are truncated; longer documents require chunking or preprocessing.
  • Reported latency is hardware-dependent; re-characterize for your inference environment.
  • Auxiliary heads provide structured signals, not ground-truth classifiers for values or emotions.
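For the truncation limitation above, a minimal chunking scheme might look like this. The stride value and the all-windows-must-agree merge rule are assumptions for illustration, not documented model behavior.

```python
def chunk_ids(token_ids, window=128, stride=96):
    """Split token ids into overlapping windows of at most `window` tokens."""
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break
    return chunks

def merge_decisions(decisions):
    # Conservative merge: commit only when every window agrees, else defer.
    return decisions[0] if len(set(decisions)) == 1 else "TBD"
```

Deferring on disagreement keeps the chunked pipeline consistent with the model's own defer-first design.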

License

Model artifacts: CC BY-NC 4.0 — non-commercial use; contact for commercial licensing. Code and documentation: see repository LICENSE.

