
BGE-M3 Efficiency Guide

This guide explains how to measure and optimize the efficiency of the BAAI/bge-m3 model used in the IRIS project.

1. Key Metrics

Performance (Infrastructure)

  • Latency: Time taken to generate an embedding for a single text. Critical for real-time search.
  • Throughput: Number of documents processed per second. Important for batch processing (e.g., initial profile indexing).
  • VRAM/RAM Usage: Memory footprint of the model. BGE-M3 is ~2.2GB in FP32.
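Latency and throughput can be measured with a small harness like the one below. This is a sketch: `encode` stands in for whatever wrapper you use around the model's encode call, and the batch size and percentile choices are illustrative.

```python
import time
from statistics import mean, quantiles

def measure(encode, texts, batch_size=32):
    """Measure per-text latency and batch throughput of an encode() callable.

    `encode` is any function mapping a list of texts to a list of vectors
    (e.g. a thin wrapper around your BGE-M3 model's encode method).
    """
    # Single-text latency: encode each text on its own, as a real-time
    # search request would.
    latencies = []
    for t in texts:
        start = time.perf_counter()
        encode([t])
        latencies.append(time.perf_counter() - start)

    # Throughput: encode everything in batches and divide by wall time.
    start = time.perf_counter()
    for i in range(0, len(texts), batch_size):
        encode(texts[i:i + batch_size])
    elapsed = time.perf_counter() - start

    return {
        "mean_latency_s": mean(latencies),
        "p95_latency_s": quantiles(latencies, n=20)[18],  # ~95th percentile
        "throughput_docs_per_s": len(texts) / elapsed,
    }
```

Report p95 latency rather than the mean for real-time search, since tail latency is what users notice.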

Retrieval Quality (Accuracy)

  • Precision@K: The proportion of relevant candidates in the top K results.
    • Example: If you return 10 candidates and 3 are actually qualified, Precision@10 = 30%.
  • Recall@K: The proportion of all relevant candidates that are captured in the top K (in recruiting terms, how many qualified people actually get a "callback").
    • Example: If there are 5 qualified candidates in the database and your search finds 4 of them in the top 10, Recall@10 = 80%.
  • MRR (Mean Reciprocal Rank): Measures how high the first relevant candidate is ranked, averaged over queries.
    • Formula: $\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}$. If the best candidate is at position #1, that query scores 1.0; if at #2, it scores 0.5.
  • NDCG (Normalized Discounted Cumulative Gain): Measures the overall quality of the ranking order, giving more weight to highly relevant results at the very top.
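These metrics are straightforward to compute from a ranked result list. A minimal sketch (function and variable names are illustrative, not part of IRIS):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved ids that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant ids that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1/rank of the first relevant id, or 0.0 if none is found."""
    for i, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / i
    return 0.0

def mrr(results):
    """Mean reciprocal rank over a list of (retrieved, relevant) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in results) / len(results)
```

With the examples above: 3 relevant ids in a top-10 list gives Precision@10 = 0.3, and 4 of 5 relevant ids found gives Recall@10 = 0.8.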

2. BGE-M3 Specific Features

BGE-M3 is a "multi-function" model. You can measure efficiency across three modes:

  1. Dense Retrieval: Standard 1024d vectors. Fast and semantic.
  2. Sparse Retrieval (Lexical): A learned analogue of BM25. Better at exact keyword matching (skills, acronyms, IDs) than dense vectors.
  3. Multi-Vector (ColBERT style): Most accurate but highest storage and latency cost.
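To build intuition for how the three modes score a query-document pair, here is a toy sketch of each scoring function. This is illustrative only: in practice the model library produces these representations (a dense vector, a token-to-weight map, and per-token vectors) and computes the scores for you.

```python
import math

def dense_score(q, d):
    """Dense mode: cosine similarity between two sentence vectors."""
    dot = sum(a * b for a, b in zip(q, d))
    nq = math.sqrt(sum(a * a for a in q))
    nd = math.sqrt(sum(b * b for b in d))
    return dot / (nq * nd)

def sparse_score(q_weights, d_weights):
    """Sparse (lexical) mode: sum of learned-weight products on shared tokens."""
    return sum(w * d_weights[tok] for tok, w in q_weights.items() if tok in d_weights)

def colbert_score(q_vecs, d_vecs):
    """Multi-vector mode (MaxSim): each query token vector is matched
    against its best-scoring document token vector, then summed."""
    return sum(max(dense_score(qv, dv) for dv in d_vecs) for qv in q_vecs)
```

The per-token loop in `colbert_score` is why the multi-vector mode carries the highest storage and latency cost: every document keeps one vector per token instead of a single pooled vector.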

3. Optimization Techniques

Precision Tuning

  • FP16: Use model.half() on GPU to roughly double speed and halve memory, with negligible accuracy loss.
  • Quantization: Int8 or GGUF formats can cut memory usage by up to 4x relative to FP32.
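Back-of-the-envelope memory arithmetic for these precisions. The ~568M parameter count is an assumption inferred from the ~2.2GB FP32 figure above; check your actual checkpoint.

```python
def model_memory_gb(n_params, bytes_per_param):
    """Approximate weight memory in GiB (ignores activations and KV buffers)."""
    return n_params * bytes_per_param / 1024**3

N = 568_000_000                 # assumed BGE-M3 parameter count
fp32 = model_memory_gb(N, 4)    # ~2.1 GiB, matching the figure above
fp16 = model_memory_gb(N, 2)    # half of FP32
int8 = model_memory_gb(N, 1)    # quarter of FP32
```

Note this covers weights only; activation memory grows with batch size and sequence length on top of this.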

Batching

Using optimal batch sizes (e.g., 16-32) significantly improves throughput compared to single-sentence processing.
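A minimal batching helper (illustrative; `encode` is whatever call wraps the model):

```python
def batched(items, batch_size=32):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Instead of one forward pass per text:
#   vectors = [encode([t]) for t in texts]
# batch the inputs so the GPU is kept busy:
#   vectors = [v for batch in batched(texts, 32) for v in encode(batch)]
```

Tune the batch size empirically: too small underutilizes the GPU, too large risks out-of-memory errors at long sequence lengths.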

4. Measuring Quality in IRIS

To measure quality, build a "Golden Dataset" of (Job Description, Relevant Profiles) pairs and calculate Hit Rate: the fraction of test cases where a known-relevant profile appears in the results.

  1. Fetch the top 10 profiles for a job.
  2. Check whether any of the known-relevant ("ideal") candidates appear in that list.
  3. Average this over ~50 test cases.
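The procedure above can be sketched as follows (`search` is a placeholder for your retrieval call; the golden-dataset shape is an assumption):

```python
def hit_rate(golden, search, k=10):
    """Fraction of test cases where at least one relevant profile is retrieved.

    golden: list of (job_description, set_of_relevant_profile_ids) pairs.
    search: callable (job_description, k) -> ranked list of profile ids.
    """
    hits = 0
    for job, relevant in golden:
        top_k = search(job, k)
        if any(pid in relevant for pid in top_k):
            hits += 1
    return hits / len(golden)
```

Hit Rate is a coarse starting point; once the golden dataset exists, the same loop can also accumulate Recall@K and MRR for a fuller picture.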