Model Description

This model is a fine-tuned version of microsoft/deberta-v3-base, optimized for Preference Classification (Reward Modeling). Instead of standard text classification, this model is designed to compare two AI-generated responses to the same prompt and predict which one is higher quality or more "preferred."
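
At inference time the classification head emits three logits, which are mapped with a softmax to the distribution [P(A), P(B), P(Tie)]. The sketch below shows that mapping in plain Python; the input template and the [A, B, Tie] label order are assumptions for illustration, since the card does not document the exact formatting used during fine-tuning.

```python
import math

def softmax(logits):
    """Convert raw model logits to a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def format_input(prompt, response_a, response_b):
    # Hypothetical input template: the actual pairing format used
    # during fine-tuning is not documented in this card.
    return f"Prompt: {prompt}\nResponse A: {response_a}\nResponse B: {response_b}"

# Suppose the classifier head returns these three logits,
# in the (assumed) order [A, B, Tie]:
probs = softmax([2.1, 0.3, -0.5])  # heavily favors Response A
```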

Dataset

The model was fine-tuned on data from the LMSYS LLM Classification Finetuning competition on Kaggle. Each example pairs a prompt with two chatbot responses and a human preference label (A wins, B wins, or Tie), matching the three classes predicted by the model.

Metrics

The model is evaluated using the following criteria, comparing the predicted probability distribution [P(A), P(B), P(Tie)] against the ground truth:

  • Multi-class Log Loss (Primary):

    • Definition: Measures the distance between the predicted probability distribution and the actual labels:

      $$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{i,j} \log(p_{i,j})$$

    • Variables: $N$ is the number of samples, $M = 3$ is the number of classes (Response A, Response B, and Tie), $y_{i,j}$ is 1 if the true label of sample $i$ is class $j$ (and 0 otherwise), and $p_{i,j}$ is the predicted probability of class $j$ for sample $i$.

    • Why: It rewards the model for assigning higher probabilities to the correct outcome and heavily penalizes high-confidence incorrect predictions.

  • Accuracy (Secondary):

    • Definition: The percentage of instances where the class with the highest predicted probability matches the ground truth label.
    • Calculation: Correct Predictions / Total Samples.
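
Both metrics follow directly from the definitions above and can be computed in a few lines of plain Python. This is a generic sketch of the formulas, not the competition's official scorer:

```python
import math

def multiclass_log_loss(y_true, y_pred, eps=1e-15):
    """Multi-class log loss.
    y_true: list of class indices (0 = A, 1 = B, 2 = Tie).
    y_pred: list of [P(A), P(B), P(Tie)] rows.
    """
    total = 0.0
    for label, probs in zip(y_true, y_pred):
        p = min(max(probs[label], eps), 1 - eps)  # clip to avoid log(0)
        total += math.log(p)
    return -total / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of rows where the argmax class matches the true label."""
    correct = sum(
        1 for label, probs in zip(y_true, y_pred)
        if probs.index(max(probs)) == label
    )
    return correct / len(y_true)
```

For example, two samples predicted as `[0.7, 0.2, 0.1]` (true label A) and `[0.1, 0.2, 0.7]` (true label Tie) give an accuracy of 1.0 and a log loss of `-log(0.7) ≈ 0.357`.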

Evaluation Results

The following results were achieved during final evaluation. Note that Accuracy was calculated using a local train/test split, while Log Loss follows the competition's evaluation framework.

| Metric               | Value  | Source/Split              |
|----------------------|--------|---------------------------|
| Multi-class Log Loss | 1.0346 | Kaggle Competition Metric |
| Accuracy             | 48.94% | Local Train/Test Split    |

Note on Performance:

  • Log Loss: This score reflects the model's ability to provide well-calibrated probabilities for the three classes (A, B, and Tie) as required by the Kaggle competition.
  • Accuracy: This was monitored locally to ensure the model was successfully learning the preference patterns beyond a random baseline (33.33%).
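
For context on the log loss figure: a model that always predicts the uniform distribution [1/3, 1/3, 1/3] achieves a log loss of ln(3) ≈ 1.0986 regardless of the labels, so the reported 1.0346 sits modestly below that uninformed baseline:

```python
import math

# A uniform prediction assigns probability 1/3 to the true class on every
# sample, so the loss reduces to -log(1/3) = log(3) per sample.
uniform_baseline = -math.log(1 / 3)
print(round(uniform_baseline, 4))  # 1.0986

reported_log_loss = 1.0346  # value from the table above
print(reported_log_loss < uniform_baseline)  # True
```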

Acknowledgments & Attribution

  • Base Model: This work utilizes DeBERTa-v3-base, developed by Microsoft.
  • Dataset: Training data was provided by the LMSYS LLM Classification Finetuning competition on Kaggle.
  • License Notice: This model is subject to the CC BY-NC 4.0 license due to the underlying dataset. It is intended for non-commercial, research, and educational purposes only.