BGE-M3 Efficiency Guide
This guide explains how to measure and optimize the efficiency of the BAAI/bge-m3 model used in the IRIS project.
1. Key Metrics
Performance (Infrastructure)
- Latency: Time taken to generate an embedding for a single text. Critical for real-time search.
- Throughput: Number of documents processed per second. Important for batch processing (e.g., initial profile indexing).
- VRAM/RAM Usage: Memory footprint of the model. BGE-M3 is ~2.2GB in FP32.
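The latency metric above can be captured with a small timing harness. This is a sketch: `encode_one` is a stand-in stub for the real BGE-M3 encode call, and the percentile calculation is deliberately simple.

```python
import statistics
import time

def encode_one(text):
    # Stand-in for the real model call, e.g. model.encode(text).
    time.sleep(0.001)  # simulate ~1 ms of inference
    return [0.0] * 1024

# Time 50 single-text calls and report p50/p95 latency in milliseconds.
latencies = []
for _ in range(50):
    t0 = time.perf_counter()
    encode_one("sample profile text")
    latencies.append((time.perf_counter() - t0) * 1000)

latencies.sort()
p50 = statistics.median(latencies)
p95 = latencies[int(0.95 * len(latencies)) - 1]
print(f"p50={p50:.2f} ms  p95={p95:.2f} ms")
```

Swapping the stub for the real encoder gives a direct read on whether single-query latency is acceptable for real-time search.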
Retrieval Quality (Accuracy)
- Precision@K: The proportion of relevant candidates in the top K results.
- Example: If you return 10 candidates and 3 are actually qualified, Precision@10 = 30%.
- Recall@K (Correlation to User's "callback"): The proportion of total relevant candidates that were successfully captured in the top K.
- Example: If there are 5 qualified candidates in the database and your search finds 4 of them in the top 10, Recall@10 = 80%.
- MRR (Mean Reciprocal Rank): Evaluates how high the first relevant candidate is ranked.
- Formula: $1 / Rank$ of the first relevant result, averaged over all queries. If the best candidate is at position #1, the score is 1.0; if at #2, it is 0.5.
- NDCG (Normalized Discounted Cumulative Gain): Measures the overall quality of the ranking order, giving more weight to highly relevant results at the very top.
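All four metrics above are straightforward to implement. A minimal sketch (the profile IDs are made up for illustration; `relevance` for NDCG maps a doc ID to a graded gain):

```python
import math

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs captured in the top-k."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result (0 if none found)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevance, k):
    """relevance: dict mapping doc -> graded gain (absent docs count as 0)."""
    dcg = sum(relevance.get(doc, 0) / math.log2(i + 2)
              for i, doc in enumerate(retrieved[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(i + 2) for i, gain in enumerate(ideal))
    return dcg / idcg if idcg else 0.0

retrieved = ["p3", "p1", "p7", "p2"]   # ranked search results
relevant = {"p1", "p2", "p9"}          # ground-truth relevant profiles
print(precision_at_k(retrieved, relevant, 4))  # 0.5
print(recall_at_k(retrieved, relevant, 4))     # ≈ 0.667
print(reciprocal_rank(retrieved, relevant))    # 0.5
```

Averaging `reciprocal_rank` over a set of queries yields MRR.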
2. BGE-M3 Specific Features
BGE-M3 is a "multi-function" model. You can measure efficiency across three modes:
- Dense Retrieval: Standard 1024d vectors. Fast and semantic.
- Sparse Retrieval (Lexical): Similar to BM25 but learned. More effective for exact keyword matching.
- Multi-Vector (ColBERT style): Most accurate but highest storage and latency cost.
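The three modes differ only in how query-document similarity is scored. A toy sketch with hand-made vectors (small dimensions for readability; real BGE-M3 dense vectors are 1024-d, and its sparse output maps tokens to learned weights):

```python
import numpy as np

def dense_score(q, d):
    """Dense retrieval: cosine similarity between single vectors."""
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

def sparse_score(q_weights, d_weights):
    """Sparse/lexical retrieval: dot product over shared tokens."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def colbert_score(q_vecs, d_vecs):
    """Multi-vector (MaxSim): each query token takes its best-matching
    doc token; the per-token maxima are summed."""
    sim = q_vecs @ d_vecs.T  # (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())

q = np.array([1.0, 0.0, 1.0])
d = np.array([1.0, 1.0, 0.0])
print(dense_score(q, d))  # ≈ 0.5

print(sparse_score({"python": 0.8, "ml": 0.3},
                   {"python": 0.6, "sql": 0.9}))  # ≈ 0.48

q_toks = np.array([[1.0, 0.0], [0.0, 1.0]])  # one row per query token
d_toks = np.array([[0.9, 0.1], [0.2, 0.8]])  # one row per doc token
print(colbert_score(q_toks, d_toks))  # ≈ 1.7
```

The multi-vector mode's storage cost follows directly from the shapes: one vector per token rather than one per document.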
3. Optimization Techniques
Precision Tuning
- FP16: Use `model.half()` if on GPU to double speed and halve memory with negligible accuracy loss.
- Quantization: Int8 or GGUF formats can reduce memory usage by 4x.
Batching
Using optimal batch sizes (e.g., 16-32) significantly improves throughput compared to single-sentence processing.
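Why batching helps can be seen with a stub encoder that models the typical cost structure: a fixed per-call overhead plus a per-document cost (the numbers below are illustrative, not measured):

```python
import time

def encode(batch):
    # Stand-in for the real encoder: fixed per-call overhead plus
    # per-document cost -- the pattern that makes batching pay off.
    time.sleep(0.005 + 0.0005 * len(batch))
    return [[0.0] * 1024 for _ in batch]

docs = ["profile text"] * 128
results = {}
for batch_size in (1, 16, 32):
    t0 = time.perf_counter()
    for i in range(0, len(docs), batch_size):
        encode(docs[i:i + batch_size])
    elapsed = time.perf_counter() - t0
    results[batch_size] = len(docs) / elapsed
    print(f"batch={batch_size:>2}  {results[batch_size]:6.1f} docs/s")
```

With batch size 1, the per-call overhead is paid 128 times; with 32, only 4 times, so throughput rises sharply. On a real GPU the effect is usually larger, since kernels are heavily underutilized at batch size 1.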
4. Measuring Quality in IRIS
To measure quality, create a "Golden Dataset" of (Job Description, Relevant Profiles) and calculate Hit Rate:
- Fetch top 10 profiles for a job.
- Check if the "ideal" candidate is in that list.
- Average this over 50 test cases.
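The three steps above reduce to a few lines. A minimal sketch, where `toy_search` and the profile IDs are hypothetical stand-ins for the real vector search:

```python
def hit_rate(golden_cases, search_fn, k=10):
    """golden_cases: list of (job_description, ideal_profile_id) pairs.
    search_fn(job, k) returns the top-k profile IDs for a job.
    Returns the fraction of cases where the ideal candidate was found."""
    hits = sum(1 for job, ideal in golden_cases if ideal in search_fn(job, k))
    return hits / len(golden_cases)

# Toy index: job description -> ranked profile IDs (stand-in for the
# real embedding search over indexed profiles).
index = {
    "python backend": ["p1", "p4", "p2"],
    "data engineer": ["p7", "p3"],
}

def toy_search(job, k):
    return index.get(job, [])[:k]

cases = [("python backend", "p2"), ("data engineer", "p9")]
print(hit_rate(cases, toy_search))  # 0.5
```

Run the same harness against the real search function with the 50-case golden dataset to track quality across model or indexing changes.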