================================================================================
EVALUATION RESULTS
================================================================================
📊 Accuracy Metrics:
Exact Match Accuracy: 47.17% (63805/135256)
VQA Accuracy: 15.72%
📊 ANLS Metrics:
Average ANLS (τ=0.5): 50.18%
ANLS Std Dev: 48.96%
📊 Additional Statistics:
Total samples: 135256
Avg prediction length: 1.13 words
Avg GT length: 1.10 words
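
For reference, the ANLS and exact-match figures above can be reproduced per sample with a short sketch like the one below. It is not the evaluation script that produced this report. It assumes the common ANLS definition (1 minus the length-normalized Levenshtein distance, set to 0 once the normalized distance reaches the threshold τ), a single ground-truth answer per question, and lowercase/whitespace normalization. The helper names `levenshtein`, `normalize`, `anls_score`, and `exact_match` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse whitespace.
    return " ".join(text.lower().split())


def anls_score(prediction: str, ground_truth: str, tau: float = 0.5) -> float:
    """Per-sample ANLS against a single ground-truth answer (assumption)."""
    pred, gt = normalize(prediction), normalize(ground_truth)
    longest = max(len(pred), len(gt))
    if longest == 0:
        return 1.0  # both strings empty: treat as a perfect match
    nl = levenshtein(pred, gt) / longest
    return 1.0 - nl if nl < tau else 0.0


def exact_match(prediction: str, ground_truth: str) -> bool:
    return normalize(prediction) == normalize(ground_truth)


if __name__ == "__main__":
    # Pairs taken from the sample predictions in this report.
    pairs = [("tusks", "tusks"), ("towel", "toy"), ("mirror", "car")]
    for gt, pred in pairs:
        print(f"{gt!r} vs {pred!r}: ANLS={anls_score(pred, gt):.4f}, "
              f"EM={exact_match(pred, gt)}")
```

Under these assumptions, "towel" vs "toy" has an edit distance of 3 over a maximum length of 5, so the normalized distance 0.6 reaches the threshold and the score is 0, in line with the sample listed under the worst predictions. A near miss such as the hypothetical pair "seagull" vs "seagulls" would score 0.875 while still failing exact match, which is why the average ANLS can sit above the exact-match accuracy.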
================================================================================
SAMPLE PREDICTIONS
================================================================================
🏆 Best Predictions (Highest ANLS):
--------------------------------------------------------------------------------
Ground Truth: tusks
Prediction: tusks
ANLS: 1.0000
Exact Match: ✓
Ground Truth: seagull
Prediction: seagull
ANLS: 1.0000
Exact Match: ✓
Ground Truth: bedroom
Prediction: bedroom
ANLS: 1.0000
Exact Match: ✓
Ground Truth: cake
Prediction: cake
ANLS: 1.0000
Exact Match: ✓
Ground Truth: short
Prediction: short
ANLS: 1.0000
Exact Match: ✓
================================================================================
⚠️ Worst Predictions (Lowest ANLS):
--------------------------------------------------------------------------------
Ground Truth: mirror
Prediction: car
ANLS: 0.0000
Exact Match: ✗
Ground Truth: towel
Prediction: toy
ANLS: 0.0000
Exact Match: ✗
Ground Truth: book
Prediction: camera
ANLS: 0.0000
Exact Match: ✗
Ground Truth: usa
Prediction: england
ANLS: 0.0000
Exact Match: ✗
Ground Truth: red and yellow
Prediction: green
ANLS: 0.0000
Exact Match: ✗