Score Generation Discrepancy in vectara/hallucination_evaluation_model
I'm seeing unexpected behavior with vectara/hallucination_evaluation_model. When I pass the same sentence as both premise and hypothesis, the model outputs a score of approximately 0.93 instead of the expected 1.0.
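For anyone who wants to reproduce this, here is a minimal sketch of the check described above: each sentence is paired with itself and scored. The helper names (`identity_scores`, `dummy_score_fn`) are made up for illustration, and the dummy scorer simply returns the 0.93 reported here so the snippet runs without downloading the model.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]                          # (premise, hypothesis)
ScoreFn = Callable[[List[Pair]], Sequence[float]]


def identity_scores(score_fn: ScoreFn, sentences: List[str]) -> List[Tuple[str, float]]:
    """Score every sentence against itself and return (sentence, score) tuples."""
    pairs = [(s, s) for s in sentences]         # premise == hypothesis
    scores = score_fn(pairs)
    return [(s, float(x)) for s, x in zip(sentences, scores)]


def dummy_score_fn(pairs: List[Pair]) -> List[float]:
    # Hypothetical placeholder: stands in for the real model and just
    # echoes the score reported in this thread.
    return [0.93 for _ in pairs]


if __name__ == "__main__":
    sentences = ["The cat sat on the mat.", "Paris is the capital of France."]
    for sentence, score in identity_scores(dummy_score_fn, sentences):
        print(f"{score:.3f}  {sentence}")
```

To run it against the real model, replace `dummy_score_fn` with a function that wraps the model's own scoring call. My understanding is that the model card documents a `predict` method that takes a list of (premise, hypothesis) pairs on the model loaded with `trust_remote_code=True`, but please verify that against the current card before relying on it.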
I'm curious about the underlying scoring mechanism and potential reasons for this discrepancy. Any insights into the model's scoring function or potential biases would be greatly appreciated.
Here are some potential areas of exploration:
->How is the similarity between premise and hypothesis calculated?
->Are there any known limitations or biases in the model's scoring system?
->Could there be data-related issues affecting the score?
I'm looking forward to discussing this issue with the community and finding a solution.
Thanks for spotting this. It is generally difficult to explain why a Transformer-based model produces a particular score for a particular input.
One possible reason is that a large portion of the training data comes from the summarization task, where the hypothesis is usually shorter and contains less information than the premise. A hypothesis identical to the premise is therefore unlike most training examples. In this sense, the score of 0.93 may reflect not only the extent of hallucination but also summarization quality.
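One way to probe this explanation is to score progressively longer prefixes of a premise against the full premise and see how the score changes as the hypothesis approaches the premise's length. The sketch below is only an experiment outline, not a description of how the model works: the helper names are hypothetical, the sentence splitter is a naive regex, and the scorer is passed in as a function so the snippet runs without the model.

```python
import re
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]                          # (premise, hypothesis)
ScoreFn = Callable[[List[Pair]], Sequence[float]]


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def prefix_hypotheses(premise: str) -> List[str]:
    """Return hypotheses made of the first 1, 2, ..., n sentences of the premise."""
    sentences = split_sentences(premise)
    return [" ".join(sentences[:k]) for k in range(1, len(sentences) + 1)]


def length_probe(score_fn: ScoreFn, premise: str) -> List[Tuple[int, float]]:
    """Return (hypothesis word count, score) for each prefix hypothesis."""
    hypotheses = prefix_hypotheses(premise)
    scores = score_fn([(premise, h) for h in hypotheses])
    return [(len(h.split()), float(s)) for h, s in zip(hypotheses, scores)]


if __name__ == "__main__":
    premise = "The cat sat on the mat. It was a sunny day. The dog slept nearby."
    # Hypothetical constant scorer; swap in a wrapper around the real model.
    constant_scorer = lambda pairs: [0.5 for _ in pairs]
    for n_words, score in length_probe(constant_scorer, premise):
        print(n_words, score)
```

The last prefix is the premise itself, so it corresponds to the identical-input case from the original question. If scores from the real model drop as the hypothesis gets closer to the full premise, that would be consistent with the summarization-data explanation, though it would not prove it.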