
v1.0 Verification Results

Summary

| Metric | Value | Threshold | Result |
|---|---|---|---|
| retrieval_similarity | 0.7239 | >= 0.60 | PASS |
| tIOU | 0.6700 | >= 0.40 | PASS |
| mAP@0.5 | 0.5092 | >= 0.50 | PASS |

Overall: PASS — v1.0 ACCEPTED
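
The acceptance rule implied by the summary can be sketched as follows. This is a minimal illustration, assuming each metric passes when its value is at least its threshold and the overall verdict is PASS only when every metric passes. The names `THRESHOLDS`, `RESULTS`, and `verify` are hypothetical and are not taken from the evaluation code.

```python
# Hypothetical sketch of the pass/fail gate; values are copied from the summary.
THRESHOLDS = {"retrieval_similarity": 0.60, "tIOU": 0.40, "mAP@0.5": 0.50}
RESULTS = {"retrieval_similarity": 0.7239, "tIOU": 0.6700, "mAP@0.5": 0.5092}


def verify(results, thresholds):
    """Return per-metric PASS/FAIL and an overall verdict (all must pass)."""
    per_metric = {
        name: "PASS" if results[name] >= limit else "FAIL"
        for name, limit in thresholds.items()
    }
    overall = "PASS" if all(v == "PASS" for v in per_metric.values()) else "FAIL"
    return per_metric, overall


per_metric, overall = verify(RESULTS, THRESHOLDS)
for name, verdict in per_metric.items():
    print(f"{name}: {RESULTS[name]:.4f} >= {THRESHOLDS[name]:.2f} -> {verdict}")
print("Overall:", overall)
```

Keeping the thresholds in one mapping makes it easy to see which metric has the least headroom when the gate is rerun on a new checkpoint.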

Details

  • Checkpoint: outputs/checkpoint-770 (step 770, eval_loss 0.5272)
  • Predictions: checkpoints/v21_semantic_boundaries/predictions.jsonl (98 samples)
  • Embedder: OpenAI text-embedding-3-small (production-aligned)
  • Date: 2026-03-19

Retrieval Similarity

  • Mean: 0.7239
  • Median: 0.7355
  • Min: 0.4982, Max: 0.9162
  • Excellent (>= 0.80): 22/98
  • Good (>= 0.60): 87/98 (cumulative; includes Excellent)
  • Acceptable (>= 0.40): 98/98 (cumulative; includes Good)
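
The bucket counts above overlap, since every sample that clears a higher cutoff also clears the lower ones. A minimal sketch of that counting, using only the ten retrieval_sim values listed in the per-sample scores of this report (not the full 98), is shown here. The names `BUCKETS` and `bucket_counts` are hypothetical.

```python
# Hypothetical sketch: cumulative bucket counts for similarity scores.
# The ten scores are the retrieval_sim values from the per-sample scores
# in this report; the remaining 88 samples are not listed there.
scores = [0.6549, 0.7739, 0.6433, 0.7473, 0.7774,
          0.7924, 0.7626, 0.6476, 0.7355, 0.8859]

BUCKETS = [("Excellent", 0.80), ("Good", 0.60), ("Acceptable", 0.40)]


def bucket_counts(values, buckets):
    """Count how many values meet each cutoff (buckets overlap by design)."""
    return {label: sum(v >= cutoff for v in values) for label, cutoff in buckets}


counts = bucket_counts(scores, BUCKETS)
for label, cutoff in BUCKETS:
    print(f"{label} (>= {cutoff:.2f}): {counts[label]}/{len(scores)}")
```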

Temporal IoU (tIOU)

  • Mean: 0.6700
  • Median: 0.6703
  • Predictions with timestamps: 98/98
  • References with timestamps: 98/98
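
For reference, temporal IoU between one predicted and one reference interval is the length of their overlap divided by the length of their union. The sketch below assumes intervals are `(start, end)` pairs in seconds. The report does not state how predicted and reference events are paired or aggregated per sample, so the function `temporal_iou` is a hypothetical illustration of the single-pair case only.

```python
def temporal_iou(pred, ref):
    """IoU of two time intervals, each given as (start, end) in seconds."""
    p_start, p_end = pred
    r_start, r_end = ref
    # Overlap is zero when the intervals are disjoint.
    intersection = max(0.0, min(p_end, r_end) - max(p_start, r_start))
    union = (p_end - p_start) + (r_end - r_start) - intersection
    return intersection / union if union > 0 else 0.0


# Overlap of 5 s over a union of 15 s.
print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
# Disjoint intervals have no overlap.
print(temporal_iou((0.0, 4.0), (6.0, 9.0)))
```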

mAP@0.5

  • Mean: 0.5092

Production Embeddings (prior run)

For reference, the prior eval with OpenAI text-embedding-3-small on 20 pairs showed:

  • mean_similarity: 0.6781
  • production_ready: true

Failure Analysis

All three metrics meet their thresholds, so v1.0 is accepted for production. The narrowest margin is mAP@0.5, at 0.5092 against a threshold of 0.50.

Per-Sample Scores (first 10)

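| # | retrieval_sim | tIOU | mAP@0.5 | pred_events | ref_events |
|---|---|---|---|---|---|
| 0 | 0.6549 | 0.5277 | 0.4500 | 5 | 4 |
| 1 | 0.7739 | 0.8080 | 1.0000 | 5 | 5 |
| 2 | 0.6433 | 1.0000 | 1.0000 | 4 | 4 |
| 3 | 0.7473 | 0.6228 | 0.4500 | 5 | 4 |
| 4 | 0.7774 | 0.5406 | 0.2500 | 4 | 4 |
| 5 | 0.7924 | 0.3868 | 0.0500 | 4 | 5 |
| 6 | 0.7626 | 0.7779 | 0.7500 | 4 | 3 |
| 7 | 0.6476 | 0.8060 | 0.7500 | 4 | 3 |
| 8 | 0.7355 | 0.7520 | 0.6400 | 5 | 5 |
| 9 | 0.8859 | 0.6579 | 0.5625 | 4 | 4 |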
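
One way to triage these rows is to compare each per-sample score against the aggregate thresholds from the summary. The sketch below does that. Note the assumption: the thresholds in this report are defined for the mean over all 98 samples, so applying them per sample is only a hypothetical way to spot weak cases, not part of the acceptance rule. The names `ROWS`, `NAMES`, `LIMITS`, and `below_threshold` are illustrative.

```python
# Rows copied from the per-sample scores: (index, retrieval_sim, tIOU, mAP@0.5).
ROWS = [
    (0, 0.6549, 0.5277, 0.4500),
    (1, 0.7739, 0.8080, 1.0000),
    (2, 0.6433, 1.0000, 1.0000),
    (3, 0.7473, 0.6228, 0.4500),
    (4, 0.7774, 0.5406, 0.2500),
    (5, 0.7924, 0.3868, 0.0500),
    (6, 0.7626, 0.7779, 0.7500),
    (7, 0.6476, 0.8060, 0.7500),
    (8, 0.7355, 0.7520, 0.6400),
    (9, 0.8859, 0.6579, 0.5625),
]
NAMES = ("retrieval_sim", "tIOU", "mAP@0.5")
# Aggregate thresholds from the summary, reused per sample as an assumption.
LIMITS = (0.60, 0.40, 0.50)


def below_threshold(rows):
    """Map sample index -> list of metrics that fall below their limit."""
    flagged = {}
    for idx, *values in rows:
        misses = [n for n, v, lim in zip(NAMES, values, LIMITS) if v < lim]
        if misses:
            flagged[idx] = misses
    return flagged


flagged = below_threshold(ROWS)
for idx, misses in flagged.items():
    print(f"sample {idx}: below threshold on {', '.join(misses)}")
```

On these ten rows the check flags samples 0, 3, and 4 on mAP@0.5 and sample 5 on both tIOU and mAP@0.5; no row falls below the retrieval similarity limit.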