# BGE-M3 Efficiency Guide

This guide explains how to measure and optimize the efficiency of the BAAI/bge-m3 model used in the IRIS project.

## 1. Key Metrics

### Performance (Infrastructure)
- **Latency**: Time taken to generate an embedding for a single text. Critical for real-time search.
- **Throughput**: Number of documents processed per second. Important for batch processing (e.g., initial profile indexing).
- **VRAM/RAM Usage**: Memory footprint of the model. BGE-M3 is ~2.2GB in FP32.
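Latency and throughput can be measured with plain wall-clock timing. A minimal sketch, assuming a hypothetical `embed_fn` that maps a list of texts to a list of vectors (swap in the real BGE-M3 encode call):

```python
import time

def measure(embed_fn, texts, batch_size=32):
    """Measure single-text latency and batched throughput for an embedding function."""
    # Latency: average wall-clock time to embed one text at a time.
    start = time.perf_counter()
    for text in texts:
        embed_fn([text])
    latency_ms = (time.perf_counter() - start) / len(texts) * 1000

    # Throughput: documents per second when processing in batches.
    start = time.perf_counter()
    for i in range(0, len(texts), batch_size):
        embed_fn(texts[i:i + batch_size])
    docs_per_sec = len(texts) / (time.perf_counter() - start)
    return latency_ms, docs_per_sec

# Usage with a stand-in embedder (replace with the real model call):
dummy_embed = lambda batch: [[0.0] * 1024 for _ in batch]
latency_ms, docs_per_sec = measure(dummy_embed, ["sample text"] * 100)
```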

### Retrieval Quality (Accuracy)
- **Precision@K**: The proportion of relevant candidates in the top K results. 
    * *Example*: If you return 10 candidates and 3 are actually qualified, Precision@10 = 30%.
- **Recall@K** (the closest proxy for the user-facing "callback" rate): The proportion of all relevant candidates that were successfully captured in the top K.
    * *Example*: If there are 5 qualified candidates in the database and your search finds 4 of them in the top 10, Recall@10 = 80%.
- **MRR (Mean Reciprocal Rank)**: Evaluates how high the first relevant candidate is ranked, averaged over queries.
    * *Formula*: for a single query, the reciprocal rank is $1 / \text{rank}$ of the first relevant result; MRR is the mean of this value across all test queries. If the best candidate is at position #1, the reciprocal rank is 1.0; at #2, it is 0.5.
- **NDCG (Normalized Discounted Cumulative Gain)**: Measures the overall quality of the ranking order, giving more weight to highly relevant results at the very top.
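All four metrics are straightforward to compute. A minimal, self-contained sketch (document IDs such as `p1` are placeholders; NDCG takes graded relevance as a `{doc: gain}` dict):

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items captured in the top k."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant result (0 if none retrieved)."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1 / i
    return 0.0

def ndcg_at_k(ranked, relevance, k):
    """NDCG: DCG of the actual ranking divided by DCG of the ideal ranking."""
    dcg = sum(relevance.get(doc, 0) / math.log2(i + 1)
              for i, doc in enumerate(ranked[:k], start=1))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0

# Matches the Precision@10 example above: 3 relevant in the top 10.
ranked = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8", "p9", "p10"]
relevant = {"p1", "p4", "p7"}
precision_at_k(ranked, relevant, 10)  # 0.3
reciprocal_rank(ranked, relevant)     # 1.0 (first relevant at position 1)
```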

## 2. BGE-M3 Specific Features

BGE-M3 is a "multi-function" model. You can measure efficiency across three modes:
1. **Dense Retrieval**: Standard 1024d vectors. Fast and semantic.
2. **Sparse Retrieval (Lexical)**: Similar to BM25 but learned. More effective for exact keyword matching.
3. **Multi-Vector (ColBERT style)**: Most accurate but highest storage and latency cost.
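The three modes differ in how a query/document pair is scored. A toy sketch of the three scoring functions (the vectors and token weights below are made up; in practice they come from the model's dense, sparse, and ColBERT outputs, which the FlagEmbedding library's `BGEM3FlagModel.encode` can return together):

```python
def dense_score(q, d):
    """Dense: dot product of single (normalized) vectors."""
    return sum(a * b for a, b in zip(q, d))

def sparse_score(q_weights, d_weights):
    """Sparse/lexical: sum of weight products over shared tokens."""
    return sum(w * d_weights[tok] for tok, w in q_weights.items() if tok in d_weights)

def colbert_score(q_vecs, d_vecs):
    """Multi-vector (ColBERT MaxSim): for each query token vector, take the
    max similarity over all document token vectors, then sum."""
    return sum(max(dense_score(qv, dv) for dv in d_vecs) for qv in q_vecs)

# Toy 2-d examples (real BGE-M3 dense vectors are 1024-d):
dense_score([1.0, 0.0], [0.5, 0.5])                      # 0.5
sparse_score({"python": 0.8}, {"python": 0.5, "java": 0.2})
colbert_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]])  # 1.5
```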

## 3. Optimization Techniques

### Precision Tuning
- **FP16**: Use `model.half()` on GPU to roughly double speed and halve memory, with negligible accuracy loss.
- **Quantization**: Int8 or GGUF formats can reduce memory usage by roughly 4x relative to FP32, at a small accuracy cost.
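The savings follow directly from bytes per parameter. A back-of-the-envelope sketch, assuming the commonly cited ~568M-parameter figure for the BGE-M3 backbone (treat the exact count as approximate):

```python
# Approximate parameter count for BGE-M3 (XLM-RoBERTa-large backbone);
# this is an assumption for illustration, not an exact figure.
PARAMS = 568_000_000
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gb = PARAMS * nbytes / 1e9
    print(f"{dtype}: ~{gb:.2f} GB")  # fp32 lands near the ~2.2GB figure above
```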

### Batching
Using optimal batch sizes (e.g., 16-32) significantly improves throughput compared to single-sentence processing.
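A minimal batching sketch (`model.encode` in the comments is a stand-in for the real embedding call):

```python
def batched(items, batch_size=32):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Instead of one call per document:
#   vectors = [model.encode([doc]) for doc in docs]   # slow: no parallelism
# feed batches so the GPU processes many sequences per forward pass:
#   vectors = [v for batch in batched(docs, 32) for v in model.encode(batch)]
```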

## 4. Measuring Quality in IRIS
To measure quality, create a "Golden Dataset" of (Job Description, Relevant Profiles) and calculate Hit Rate:
1. Fetch top 10 profiles for a job.
2. Check if the "ideal" candidate is in that list.
3. Average this over 50 test cases.
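The steps above can be sketched as a small evaluation harness (the `retrieve` function and profile IDs are stand-ins for the real IRIS search call):

```python
def hit_rate(golden, retrieve, k=10):
    """Fraction of test cases where at least one known-relevant profile
    appears in the top-k retrieved results.

    golden: list of (job_description, set_of_relevant_profile_ids)
    retrieve: function(job_description, k) -> ranked list of profile ids
    """
    hits = sum(1 for job, relevant in golden
               if any(p in relevant for p in retrieve(job, k)))
    return hits / len(golden)

# Usage with a stand-in retriever (replace with the real search call):
golden = [("backend engineer", {"p1"}), ("data analyst", {"p9"})]
fake_retrieve = lambda job, k: ["p1", "p2", "p3"][:k]
hit_rate(golden, fake_retrieve)  # 0.5: first case hits, second misses
```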