
BGE-M3 Efficiency Guide

This guide explains how to measure and optimize the efficiency of the BAAI/bge-m3 model used in the IRIS project.

1. Key Metrics

Performance (Infrastructure)

  • Latency: Time taken to generate an embedding for a single text. Critical for real-time search.
  • Throughput: Number of documents processed per second. Important for batch processing (e.g., initial profile indexing).
  • VRAM/RAM Usage: Memory footprint of the model. BGE-M3 is ~2.2GB in FP32.
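Latency and throughput can be measured with a small harness like the one below. This is a sketch: `encode` stands in for whatever wrapper you use around the model's encode call, and the batch size and percentile choices are illustrative.

```python
import time
from statistics import mean, quantiles

def measure(encode, texts, batch_size=32):
    """Measure per-text latency and batch throughput of an encode() callable.

    `encode` is any function mapping a list of texts to a list of vectors
    (e.g. a thin wrapper around your BGE-M3 model's encode method).
    """
    # Single-text latency: encode each text on its own, as a real-time
    # search request would.
    latencies = []
    for t in texts:
        start = time.perf_counter()
        encode([t])
        latencies.append(time.perf_counter() - start)

    # Throughput: encode everything in batches and divide by wall time.
    start = time.perf_counter()
    for i in range(0, len(texts), batch_size):
        encode(texts[i:i + batch_size])
    elapsed = time.perf_counter() - start

    return {
        "mean_latency_s": mean(latencies),
        "p95_latency_s": quantiles(latencies, n=20)[18],  # ~95th percentile
        "throughput_docs_per_s": len(texts) / elapsed,
    }
```

Report p95 latency rather than the mean for real-time search, since tail latency is what users notice.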

Retrieval Quality (Accuracy)

  • Precision@K: The proportion of relevant candidates in the top K results.
    • Example: If you return 10 candidates and 3 are actually qualified, Precision@10 = 30%.
  • Recall@K: The proportion of all relevant candidates that are captured in the top K (in recruiting terms, how many qualified people actually get a "callback").
    • Example: If there are 5 qualified candidates in the database and your search finds 4 of them in the top 10, Recall@10 = 80%.
  • MRR (Mean Reciprocal Rank): Measures how high the first relevant candidate is ranked, averaged over queries.
    • Formula: $\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}$. If the best candidate is at position #1, that query scores 1.0; if at #2, it scores 0.5.
  • NDCG (Normalized Discounted Cumulative Gain): Measures the overall quality of the ranking order, giving more weight to highly relevant results at the very top.
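These metrics are straightforward to compute from a ranked result list. A minimal sketch (function and variable names are illustrative, not part of IRIS):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved ids that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant ids that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1/rank of the first relevant id, or 0.0 if none is found."""
    for i, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / i
    return 0.0

def mrr(results):
    """Mean reciprocal rank over a list of (retrieved, relevant) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in results) / len(results)
```

With the examples above: 3 relevant ids in a top-10 list gives Precision@10 = 0.3, and 4 of 5 relevant ids found gives Recall@10 = 0.8.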

2. BGE-M3 Specific Features

BGE-M3 is a "multi-function" model. You can measure efficiency across three modes:

  1. Dense Retrieval: Standard 1024d vectors. Fast and semantic.
  2. Sparse Retrieval (Lexical): A learned analogue of BM25. Better at exact keyword matching (skills, acronyms, IDs) than dense vectors.
  3. Multi-Vector (ColBERT style): Most accurate but highest storage and latency cost.
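To build intuition for how the three modes score a query-document pair, here is a toy sketch of each scoring function. This is illustrative only: in practice the model library produces these representations (a dense vector, a token-to-weight map, and per-token vectors) and computes the scores for you.

```python
import math

def dense_score(q, d):
    """Dense mode: cosine similarity between two sentence vectors."""
    dot = sum(a * b for a, b in zip(q, d))
    nq = math.sqrt(sum(a * a for a in q))
    nd = math.sqrt(sum(b * b for b in d))
    return dot / (nq * nd)

def sparse_score(q_weights, d_weights):
    """Sparse (lexical) mode: sum of learned-weight products on shared tokens."""
    return sum(w * d_weights[tok] for tok, w in q_weights.items() if tok in d_weights)

def colbert_score(q_vecs, d_vecs):
    """Multi-vector mode (MaxSim): each query token vector is matched
    against its best-scoring document token vector, then summed."""
    return sum(max(dense_score(qv, dv) for dv in d_vecs) for qv in q_vecs)
```

The per-token loop in `colbert_score` is why the multi-vector mode carries the highest storage and latency cost: every document keeps one vector per token instead of a single pooled vector.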

3. Optimization Techniques

Precision Tuning

  • FP16: Use model.half() on GPU to roughly double speed and halve memory, with negligible accuracy loss.
  • Quantization: Int8 or GGUF formats can cut memory usage by up to 4x relative to FP32.
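Back-of-the-envelope memory arithmetic for these precisions. The ~568M parameter count is an assumption inferred from the ~2.2GB FP32 figure above; check your actual checkpoint.

```python
def model_memory_gb(n_params, bytes_per_param):
    """Approximate weight memory in GiB (ignores activations and KV buffers)."""
    return n_params * bytes_per_param / 1024**3

N = 568_000_000                 # assumed BGE-M3 parameter count
fp32 = model_memory_gb(N, 4)    # ~2.1 GiB, matching the figure above
fp16 = model_memory_gb(N, 2)    # half of FP32
int8 = model_memory_gb(N, 1)    # quarter of FP32
```

Note this covers weights only; activation memory grows with batch size and sequence length on top of this.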

Batching

Using optimal batch sizes (e.g., 16-32) significantly improves throughput compared to single-sentence processing.
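A minimal batching helper (illustrative; `encode` is whatever call wraps the model):

```python
def batched(items, batch_size=32):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Instead of one forward pass per text:
#   vectors = [encode([t]) for t in texts]
# batch the inputs so the GPU is kept busy:
#   vectors = [v for batch in batched(texts, 32) for v in encode(batch)]
```

Tune the batch size empirically: too small underutilizes the GPU, too large risks out-of-memory errors at long sequence lengths.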

4. Measuring Quality in IRIS

To measure quality, build a "Golden Dataset" of (Job Description, Relevant Profiles) pairs and calculate Hit Rate: the fraction of test cases where a known-relevant profile appears in the results.

  1. Fetch the top 10 profiles for a job.
  2. Check whether any of the known-relevant ("ideal") candidates appear in that list.
  3. Average this over ~50 test cases.
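The procedure above can be sketched as follows (`search` is a placeholder for your retrieval call; the golden-dataset shape is an assumption):

```python
def hit_rate(golden, search, k=10):
    """Fraction of test cases where at least one relevant profile is retrieved.

    golden: list of (job_description, set_of_relevant_profile_ids) pairs.
    search: callable (job_description, k) -> ranked list of profile ids.
    """
    hits = 0
    for job, relevant in golden:
        top_k = search(job, k)
        if any(pid in relevant for pid in top_k):
            hits += 1
    return hits / len(golden)
```

Hit Rate is a coarse starting point; once the golden dataset exists, the same loop can also accumulate Recall@K and MRR for a fuller picture.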