# BGE-M3 Efficiency Guide

This guide explains how to measure and optimize the efficiency of the BAAI/bge-m3 model used in the IRIS project.

## 1. Key Metrics

### Performance (Infrastructure)

- **Latency**: Time taken to generate an embedding for a single text. Critical for real-time search.
- **Throughput**: Number of documents processed per second. Important for batch processing (e.g., initial profile indexing).
- **VRAM/RAM Usage**: Memory footprint of the model. BGE-M3 is ~2.2 GB in FP32.

### Retrieval Quality (Accuracy)

- **Precision@K**: The proportion of relevant candidates in the top K results.
  * *Example*: If you return 10 candidates and 3 are actually qualified, Precision@10 = 30%.
- **Recall@K** (correlates with the user's "callback" rate): The proportion of all relevant candidates that are successfully captured in the top K.
  * *Example*: If there are 5 qualified candidates in the database and your search finds 4 of them in the top 10, Recall@10 = 80%.
- **MRR (Mean Reciprocal Rank)**: Evaluates how high the first relevant candidate is ranked.
  * *Formula*: $1 / \text{rank}$. If the best candidate is at position #1, the score is 1.0; at #2, it is 0.5.
- **NDCG (Normalized Discounted Cumulative Gain)**: Measures the overall quality of the ranking order, giving more weight to highly relevant results at the very top.

## 2. BGE-M3 Specific Features

BGE-M3 is a "multi-function" model. You can measure efficiency across three modes:

1. **Dense Retrieval**: Standard 1024-dimensional vectors. Fast and semantic.
2. **Sparse Retrieval (Lexical)**: Similar to BM25, but learned. More effective for exact keyword matching.
3. **Multi-Vector (ColBERT-style)**: Most accurate, but with the highest storage and latency cost.

## 3. Optimization Techniques

### Precision Tuning

- **FP16**: On GPU, use `model.half()` to roughly double speed and halve memory with negligible accuracy loss.
- **Quantization**: Int8 or GGUF formats can reduce memory usage by up to 4x.
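The memory impact of these precision choices can be estimated directly from the vector storage math. A minimal sketch (the 100k-profile corpus size and the `index_size_bytes` helper are illustrative assumptions, not part of IRIS; it covers only the dense index, not the model weights themselves):

```python
def index_size_bytes(num_vectors: int, dim: int = 1024, bytes_per_value: int = 4) -> int:
    """Storage needed for a flat dense-vector index (no compression)."""
    return num_vectors * dim * bytes_per_value

# Hypothetical corpus: 100k candidate profiles, 1024-d BGE-M3 dense vectors
fp32 = index_size_bytes(100_000)                      # 4 bytes per value
fp16 = index_size_bytes(100_000, bytes_per_value=2)   # half the footprint
int8 = index_size_bytes(100_000, bytes_per_value=1)   # quarter of FP32

print(f"FP32: {fp32 / 2**20:.0f} MiB, FP16: {fp16 / 2**20:.0f} MiB, Int8: {int8 / 2**20:.0f} MiB")
```

The same 2x/4x ratios apply to the model weights when using FP16 or Int8 inference, which is where the latency gains come from on GPU.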
### Batching

Using optimal batch sizes (e.g., 16-32) significantly improves throughput compared to single-sentence processing.

## 4. Measuring Quality in IRIS

To measure quality, create a "Golden Dataset" of (Job Description, Relevant Profiles) pairs and calculate the Hit Rate:

1. Fetch the top 10 profiles for a job.
2. Check whether the "ideal" candidate is in that list.
3. Average this over 50 test cases.
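The steps above can be sketched as a small evaluation harness. This is a minimal sketch: the profile IDs and the `cases` data are illustrative stand-ins for the Golden Dataset, and MRR is computed alongside the Hit Rate since both read off the same ranked lists:

```python
def hit_at_k(retrieved_ids: list[str], ideal_id: str, k: int = 10) -> int:
    """1 if the ideal candidate appears in the top-k results, else 0."""
    return int(ideal_id in retrieved_ids[:k])

def reciprocal_rank(retrieved_ids: list[str], ideal_id: str) -> float:
    """1/rank of the ideal candidate, or 0.0 if it was not retrieved."""
    try:
        return 1.0 / (retrieved_ids.index(ideal_id) + 1)
    except ValueError:
        return 0.0

# Golden dataset: each case is (ranked profile ids returned for a job, ideal profile id)
cases = [
    (["p3", "p1", "p7"], "p1"),   # ideal candidate at rank 2
    (["p9", "p4", "p2"], "p2"),   # ideal candidate at rank 3
    (["p5", "p6", "p8"], "p0"),   # ideal candidate missed entirely
]

hit_rate = sum(hit_at_k(r, ideal, k=3) for r, ideal in cases) / len(cases)
mean_rr = sum(reciprocal_rank(r, ideal) for r, ideal in cases) / len(cases)
print(f"HitRate@3 = {hit_rate:.2f}, MRR = {mean_rr:.2f}")
# prints: HitRate@3 = 0.67, MRR = 0.28
```

In production this would use top-10 lists over 50 test cases as described above; the averaging logic is identical.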