# BGE-M3 Efficiency Guide

This guide explains how to measure and optimize the efficiency of the BAAI/bge-m3 model used in the IRIS project.
## 1. Key Metrics

### Performance (Infrastructure)

- **Latency**: Time taken to generate an embedding for a single text. Critical for real-time search.
- **Throughput**: Number of documents processed per second. Important for batch processing (e.g., initial profile indexing).
- **VRAM/RAM Usage**: Memory footprint of the model. BGE-M3 is ~2.2GB in FP32.
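Latency and throughput can be measured with a simple timing harness like the sketch below. `dummy_encode` is a stand-in for the real BGE-M3 encode call, not the actual model API:

```python
import time

def measure(encode_fn, texts, batch_size=32):
    """Time an encode function over a corpus and report per-text latency and throughput."""
    start = time.perf_counter()
    for i in range(0, len(texts), batch_size):
        encode_fn(texts[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return {
        "latency_ms": 1000 * elapsed / len(texts),  # average time per text
        "throughput": len(texts) / elapsed,          # texts per second
    }

# Stand-in encoder returning fake 1024-d vectors; swap in the real model call.
dummy_encode = lambda batch: [[0.0] * 1024 for _ in batch]
stats = measure(dummy_encode, ["some profile text"] * 256)
```

Run the same harness once per configuration (FP32 vs. FP16, different batch sizes) to compare settings on your own hardware.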
### Retrieval Quality (Accuracy)

- **Precision@K**: The proportion of relevant candidates in the top K results.
  * *Example*: If you return 10 candidates and 3 are actually qualified, Precision@10 = 30%.
- **Recall@K** (corresponds to the user's "callback" notion): The proportion of all relevant candidates that were successfully captured in the top K.
  * *Example*: If there are 5 qualified candidates in the database and your search finds 4 of them in the top 10, Recall@10 = 80%.
- **MRR (Mean Reciprocal Rank)**: Evaluates how high the first relevant candidate is ranked, averaged over queries.
  * *Formula*: each query scores $1 / \text{rank}$ of its first relevant result, and MRR is the mean of these scores. If the best candidate is at position #1, the query scores 1.0; at #2, it scores 0.5.
- **NDCG (Normalized Discounted Cumulative Gain)**: Measures the overall quality of the ranking order, giving more weight to highly relevant results at the very top.
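The list-based metrics above can be computed directly from ranked result lists. A minimal sketch (the IDs and relevance sets are illustrative, mirroring the examples above):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant items captured in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def mrr(all_retrieved, all_relevant):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit per query."""
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        for rank, d in enumerate(retrieved, start=1):
            if d in relevant:
                total += 1 / rank
                break
    return total / len(all_retrieved)

# 10 results, 3 of them qualified -> Precision@10 = 0.3, as in the example above.
p = precision_at_k(["a", "b", "c", "x", "y", "z", "p", "q", "r", "s"], {"a", "b", "c"}, 10)
```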
## 2. BGE-M3 Specific Features

BGE-M3 is a "multi-function" model. You can measure efficiency across three modes:

1. **Dense Retrieval**: Standard 1024-d vectors. Fast and semantic.
2. **Sparse Retrieval (Lexical)**: Similar to BM25 but learned. More effective for exact keyword matching.
3. **Multi-Vector (ColBERT style)**: Most accurate, but highest storage and latency cost.
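How the three modes score a query-document pair can be sketched with toy values (these are hand-made illustrations, not real model outputs; in practice the model emits a dense sentence vector, per-token lexical weights, and per-token ColBERT vectors):

```python
def dense_score(q_vec, d_vec):
    """Cosine similarity between two sentence vectors (1024-d for BGE-M3)."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(q_vec) * norm(d_vec))

def sparse_score(q_weights, d_weights):
    """Learned lexical matching: sum of weight products over shared tokens."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def colbert_score(q_vecs, d_vecs):
    """MaxSim: each query-token vector takes its best dot product with any doc token."""
    best = [max(sum(a * b for a, b in zip(q, d)) for d in d_vecs) for q in q_vecs]
    return sum(best) / len(best)
```

Note the cost asymmetry this makes visible: dense needs one vector per document, sparse needs a token-weight map, and ColBERT needs one vector per token, which is why the multi-vector mode has the highest storage and latency cost.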
## 3. Optimization Techniques

### Precision Tuning

- **FP16**: Use `model.half()` on GPU to roughly double speed and halve memory with negligible accuracy loss.
- **Quantization**: Int8 or GGUF formats can reduce memory usage by up to 4x.
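The memory figures follow directly from the parameter count. Assuming roughly 568M parameters for BGE-M3 (an estimate based on its XLM-RoBERTa-large backbone, consistent with the ~2.2GB FP32 figure above):

```python
# Rough weight-only footprint estimate; activations and overhead add more in practice.
PARAMS = 568_000_000            # assumed parameter count, not an official figure
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_gb(dtype):
    """Model-weight memory in GiB for a given precision."""
    return PARAMS * BYTES_PER_PARAM[dtype] / 1024**3
```

This is why FP16 halves memory (`4 -> 2` bytes per weight) and Int8 quantization gives the ~4x reduction relative to FP32.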
### Batching

Using optimal batch sizes (e.g., 16-32) significantly improves throughput compared to single-sentence processing.
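Batched encoding is just a chunked loop around the encode call, so the model processes many sentences per forward pass. A minimal sketch, where `encode_fn` stands in for the real model call:

```python
def encode_in_batches(texts, encode_fn, batch_size=32):
    """Encode a corpus in fixed-size chunks instead of one sentence at a time."""
    vectors = []
    for i in range(0, len(texts), batch_size):
        vectors.extend(encode_fn(texts[i:i + batch_size]))  # one forward pass per chunk
    return vectors
```

If you hit out-of-memory errors at batch 32, halve the batch size; throughput usually plateaus well before the largest batch that fits.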
## 4. Measuring Quality in IRIS

To measure quality, create a "Golden Dataset" of (Job Description, Relevant Profiles) pairs and calculate the Hit Rate:

1. Fetch the top 10 profiles for a job.
2. Check whether the "ideal" candidate is in that list.
3. Average this over 50 test cases.
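The steps above can be sketched as a single hit-rate function; `search_fn` is a hypothetical stand-in for the IRIS retrieval call, not an existing API:

```python
def hit_rate(golden_dataset, search_fn, k=10):
    """golden_dataset: list of (job_description, ideal_profile_id) pairs."""
    hits = sum(
        1 for job, ideal_id in golden_dataset
        if ideal_id in search_fn(job)[:k]   # steps 1-2: is the ideal candidate in the top k?
    )
    return hits / len(golden_dataset)       # step 3: average over all test cases
```

With 50 test cases, each hit moves the score by 2 percentage points, so treat small differences between runs as noise.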