# BGE-M3 Efficiency Guide

This guide explains how to measure and optimize the efficiency of the BAAI/bge-m3 model used in the IRIS project.
## 1. Key Metrics

### Performance (Infrastructure)

- **Latency**: Time taken to generate an embedding for a single text. Critical for real-time search.
- **Throughput**: Number of documents processed per second. Important for batch processing (e.g., initial profile indexing).
- **VRAM/RAM Usage**: Memory footprint of the model. BGE-M3 is ~2.2GB in FP32.
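Latency and throughput can be measured with a simple timing harness like the sketch below. `dummy_encode` is a stand-in for the real BGE-M3 encode call, not the actual model API:

```python
import time

def measure(encode_fn, texts, batch_size=32):
    """Time an encode function over a corpus and report per-text latency and throughput."""
    start = time.perf_counter()
    for i in range(0, len(texts), batch_size):
        encode_fn(texts[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return {
        "latency_ms": 1000 * elapsed / len(texts),  # average time per text
        "throughput": len(texts) / elapsed,          # texts per second
    }

# Stand-in encoder returning fake 1024-d vectors; swap in the real model call.
dummy_encode = lambda batch: [[0.0] * 1024 for _ in batch]
stats = measure(dummy_encode, ["some profile text"] * 256)
```

Run the same harness once per configuration (FP32 vs. FP16, different batch sizes) to compare settings on your own hardware.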
### Retrieval Quality (Accuracy)

- **Precision@K**: The proportion of relevant candidates in the top K results.
  * *Example*: If you return 10 candidates and 3 are actually qualified, Precision@10 = 30%.
- **Recall@K** (corresponds to the user's "callback" notion): The proportion of all relevant candidates that were successfully captured in the top K.
  * *Example*: If there are 5 qualified candidates in the database and your search finds 4 of them in the top 10, Recall@10 = 80%.
- **MRR (Mean Reciprocal Rank)**: Evaluates how high the first relevant candidate is ranked, averaged over queries.
  * *Formula*: each query scores $1 / \text{rank}$ of its first relevant result, and MRR is the mean of these scores. If the best candidate is at position #1, the query scores 1.0; at #2, it scores 0.5.
- **NDCG (Normalized Discounted Cumulative Gain)**: Measures the overall quality of the ranking order, giving more weight to highly relevant results at the very top.
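The list-based metrics above can be computed directly from ranked result lists. A minimal sketch (the IDs and relevance sets are illustrative, mirroring the examples above):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant items captured in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def mrr(all_retrieved, all_relevant):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit per query."""
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        for rank, d in enumerate(retrieved, start=1):
            if d in relevant:
                total += 1 / rank
                break
    return total / len(all_retrieved)

# 10 results, 3 of them qualified -> Precision@10 = 0.3, as in the example above.
p = precision_at_k(["a", "b", "c", "x", "y", "z", "p", "q", "r", "s"], {"a", "b", "c"}, 10)
```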
## 2. BGE-M3 Specific Features

BGE-M3 is a "multi-function" model. You can measure efficiency across three modes:

1. **Dense Retrieval**: Standard 1024-d vectors. Fast and semantic.
2. **Sparse Retrieval (Lexical)**: Similar to BM25 but learned. More effective for exact keyword matching.
3. **Multi-Vector (ColBERT style)**: Most accurate, but highest storage and latency cost.
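How the three modes score a query-document pair can be sketched with toy values (these are hand-made illustrations, not real model outputs; in practice the model emits a dense sentence vector, per-token lexical weights, and per-token ColBERT vectors):

```python
def dense_score(q_vec, d_vec):
    """Cosine similarity between two sentence vectors (1024-d for BGE-M3)."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(q_vec) * norm(d_vec))

def sparse_score(q_weights, d_weights):
    """Learned lexical matching: sum of weight products over shared tokens."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def colbert_score(q_vecs, d_vecs):
    """MaxSim: each query-token vector takes its best dot product with any doc token."""
    best = [max(sum(a * b for a, b in zip(q, d)) for d in d_vecs) for q in q_vecs]
    return sum(best) / len(best)
```

Note the cost asymmetry this makes visible: dense needs one vector per document, sparse needs a token-weight map, and ColBERT needs one vector per token, which is why the multi-vector mode has the highest storage and latency cost.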
## 3. Optimization Techniques

### Precision Tuning

- **FP16**: Use `model.half()` on GPU to roughly double speed and halve memory with negligible accuracy loss.
- **Quantization**: Int8 or GGUF formats can reduce memory usage by up to 4x.
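The memory figures follow directly from the parameter count. Assuming roughly 568M parameters for BGE-M3 (an estimate based on its XLM-RoBERTa-large backbone, consistent with the ~2.2GB FP32 figure above):

```python
# Rough weight-only footprint estimate; activations and overhead add more in practice.
PARAMS = 568_000_000            # assumed parameter count, not an official figure
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_gb(dtype):
    """Model-weight memory in GiB for a given precision."""
    return PARAMS * BYTES_PER_PARAM[dtype] / 1024**3
```

This is why FP16 halves memory (`4 -> 2` bytes per weight) and Int8 quantization gives the ~4x reduction relative to FP32.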
### Batching

Using optimal batch sizes (e.g., 16-32) significantly improves throughput compared to single-sentence processing.
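Batched encoding is just a chunked loop around the encode call, so the model processes many sentences per forward pass. A minimal sketch, where `encode_fn` stands in for the real model call:

```python
def encode_in_batches(texts, encode_fn, batch_size=32):
    """Encode a corpus in fixed-size chunks instead of one sentence at a time."""
    vectors = []
    for i in range(0, len(texts), batch_size):
        vectors.extend(encode_fn(texts[i:i + batch_size]))  # one forward pass per chunk
    return vectors
```

If you hit out-of-memory errors at batch 32, halve the batch size; throughput usually plateaus well before the largest batch that fits.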
## 4. Measuring Quality in IRIS

To measure quality, create a "Golden Dataset" of (Job Description, Relevant Profiles) pairs and calculate the Hit Rate:

1. Fetch the top 10 profiles for a job.
2. Check whether the "ideal" candidate is in that list.
3. Average this over 50 test cases.
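The steps above can be sketched as a single hit-rate function; `search_fn` is a hypothetical stand-in for the IRIS retrieval call, not an existing API:

```python
def hit_rate(golden_dataset, search_fn, k=10):
    """golden_dataset: list of (job_description, ideal_profile_id) pairs."""
    hits = sum(
        1 for job, ideal_id in golden_dataset
        if ideal_id in search_fn(job)[:k]   # steps 1-2: is the ideal candidate in the top k?
    )
    return hits / len(golden_dataset)       # step 3: average over all test cases
```

With 50 test cases, each hit moves the score by 2 percentage points, so treat small differences between runs as noise.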