OmniScore DeBERTa-v3

QCRI/OmniScore-deberta-v3 is a multi-output regression model for automatic text quality evaluation. It predicts four scalar scores in the range [1, 5]:

  • informativeness
  • clarity
  • plausibility
  • faithfulness

The model is built on top of microsoft/deberta-v3-base and published with custom model code (AutoModel + trust_remote_code=True).

Model Details

  • Base model: microsoft/deberta-v3-base
  • Architecture: ScorePredictorModel (custom transformers model)
  • Model type: encoder-only text regression
  • Max sequence length: 512
  • Number of outputs: 4
  • Output range: [1, 5] (sigmoid-scaled in model head)
  • Backbone hidden size: 768
  • Saved dtype: float32
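The exact head lives in modeling_score_predictor.py; as an illustration of how a sigmoid-scaled [1, 5] output could work, here is a minimal re-implementation sketch (not the model's actual code), assuming a plain sigmoid mapped linearly onto the range:

```python
import math

def scale_to_score_range(logit: float, low: float = 1.0, high: float = 5.0) -> float:
    """Map a raw regression logit onto [low, high] via a sigmoid.

    Illustrative sketch only; see modeling_score_predictor.py for the
    model's actual head.
    """
    sigmoid = 1.0 / (1.0 + math.exp(-logit))
    return low + (high - low) * sigmoid

# A logit of 0 lands at the midpoint of the range.
print(scale_to_score_range(0.0))  # 3.0
```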

Quick Access

Model page: https://huggingface.co/QCRI/OmniScore-deberta-v3

from transformers import AutoTokenizer, AutoModel

repo_id = "QCRI/OmniScore-deberta-v3"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

What Input To Provide

The model takes a single text string and returns four quality scores. For best results, keep a consistent prompt/input format during inference.

Recommended flat format:

Task: <task_name>
Source: <source text, if available>
Reference: <reference text, if available>
Candidate: <model output being evaluated>
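The flat format above can be assembled with a small helper; build_omniscore_input is a hypothetical name for illustration, not part of the published code:

```python
from typing import Optional

def build_omniscore_input(task: str, candidate: str,
                          source: Optional[str] = None,
                          reference: Optional[str] = None) -> str:
    """Assemble the recommended flat input format, skipping optional
    fields that are not available."""
    lines = [f"Task: {task}"]
    if source is not None:
        lines.append(f"Source: {source}")
    if reference is not None:
        lines.append(f"Reference: {reference}")
    lines.append(f"Candidate: {candidate}")
    return "\n".join(lines)

text = build_omniscore_input(
    task="summarization",
    source="Full article text goes here.",
    candidate="A short summary of the article.",
)
print(text)
```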

Chat-style input can be flattened as:

System: ...
User: ...
Assistant: ...

Usage Examples

Install dependencies:

pip install -U torch transformers sentencepiece

1) Single Text Example

import torch
from transformers import AutoTokenizer, AutoModel

repo_id = "QCRI/OmniScore-deberta-v3"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()

text = """Task: headline_evaluation
Source: Full article text goes here.
Candidate: Microsoft releases detailed model documentation."""

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

scores = {
    name: float(outputs.predictions[0, i])
    for i, name in enumerate(model.config.score_names)
}
print(scores)

2) Batch Example (GPU/CPU)

import torch
from transformers import AutoTokenizer, AutoModel

repo_id = "QCRI/OmniScore-deberta-v3"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).to(device).eval()

texts = [
    "Task: summarization\nSource: ...\nCandidate: ...",
    "Task: translation_evaluation\nSource: ...\nReference: ...\nCandidate: ...",
]

batch = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)
batch = {k: v.to(device) for k, v in batch.items()}

with torch.no_grad():
    pred = model(**batch).predictions

results = []
for row in pred.cpu():
    results.append({name: float(row[i]) for i, name in enumerate(model.config.score_names)})

print(results)

3) Chat Messages Helper

from transformers import AutoTokenizer, AutoModel
import torch

repo_id = "QCRI/OmniScore-deberta-v3"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a concise summary of this article."},
    {"role": "assistant", "content": "Here is a short summary..."},
]

flat_text = "\n".join(f"{m['role'].capitalize()}: {m['content']}" for m in messages)
inputs = tokenizer(flat_text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

scores = {n: float(outputs.predictions[0, i]) for i, n in enumerate(model.config.score_names)}
print(scores)

Programmatic Download (Optional)

from huggingface_hub import snapshot_download

local_dir = snapshot_download("QCRI/OmniScore-deberta-v3")
print(local_dir)

Data and Task Coverage

This checkpoint performs multi-task text quality scoring and was evaluated on a held-out test set covering:

  • Chat evaluation
  • Headline evaluation
  • Paraphrase evaluation
  • QA evaluation
  • Summarization evaluation
  • Translation evaluation

The underlying project data is multilingual and multi-domain.

Intended Use

Use this model to score generated text quality (or response quality) as a supporting signal in:

  • evaluation dashboards
  • ranking experiments
  • offline model comparison
  • human-in-the-loop workflows

It is not intended as the sole decision maker in high-stakes or safety-critical settings.

Limitations

  • Scores are continuous estimates and should not be treated as absolute truth.
  • Performance differs by task, language, and domain.
  • The model can inherit annotation noise and dataset biases.
  • Long inputs beyond 512 tokens are truncated.
  • Low correlation metrics on some dimensions indicate that rank ordering can be weak for certain subsets.
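Because inputs beyond 512 tokens are silently truncated, a lightweight pre-check can flag texts likely to be cut off. The helper below accepts any tokenizer callable; a whitespace split stands in here so the sketch is self-contained, but in practice pass the repo's tokenizer (e.g. `lambda t: tokenizer.tokenize(t)`), which will count more tokens:

```python
from typing import Callable, List

MAX_TOKENS = 512  # the model's maximum sequence length

def will_truncate(text: str,
                  tokenize: Callable[[str], List[str]],
                  max_tokens: int = MAX_TOKENS) -> bool:
    """Return True if `text` exceeds the sequence limit under the
    given tokenizer and will therefore be truncated."""
    return len(tokenize(text)) > max_tokens

# str.split is a rough stand-in for the real subword tokenizer.
print(will_truncate("short input", str.split))  # False
```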

Responsible Use

Recommended:

  • Use as a decision-support signal, not as a sole decision maker.
  • Calibrate thresholds on your own validation set before production use.
  • Monitor by language/task slices for fairness and reliability.
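Threshold calibration can be as simple as sweeping candidate cutoffs over validation pairs of (predicted score, human accept/reject label) and keeping the cutoff that best matches the labels. A minimal sketch with hypothetical names and toy data:

```python
def calibrate_threshold(scores, labels, candidates=None):
    """Pick the score cutoff (score >= cutoff -> accept) that best
    matches binary human labels on a validation set."""
    if candidates is None:
        candidates = [1.0 + 0.1 * i for i in range(41)]  # 1.0 .. 5.0
    best_cutoff, best_acc = candidates[0], -1.0
    for cutoff in candidates:
        correct = sum((s >= cutoff) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_cutoff, best_acc = cutoff, acc
    return best_cutoff, best_acc

# Toy validation data: outputs scoring above ~3.5 were accepted by humans.
scores = [2.1, 3.0, 3.6, 4.2, 4.8]
labels = [0, 0, 1, 1, 1]
cutoff, acc = calibrate_threshold(scores, labels)
print(cutoff, acc)
```

In practice, calibrate one cutoff per score dimension and per language/task slice, since the Limitations section notes that rank ordering can be weak for some subsets.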

Not recommended:

  • High-stakes automated decisions without human oversight.
  • Out-of-domain deployment without re-validation.

Reproducibility Notes

Published artifacts include:

  • model.safetensors
  • config.json
  • configuration_score_predictor.py
  • modeling_score_predictor.py
  • tokenizer files
  • metrics_final.json
  • predictions.jsonl

Load with trust_remote_code=True because the architecture is custom.

Citation

If you use this model, please cite the project/repository and this model URL:

@misc{qcri_omniscore_deberta_v3,
  title        = {OmniScore DeBERTa-v3},
  author       = {QCRI},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/QCRI/OmniScore-deberta-v3}}
}