---
language:
- vi
- en
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- mathematics
- vietnamese
- exact-chunk-retrieval
- hierarchical-learning
- e5-base
- mrr-optimization
- fine-tuned
- model-comparison
base_model: intfloat/multilingual-e5-base
metrics:
- mean_reciprocal_rank
- recall
---
# E5-Math-Vietnamese: MRR-Optimized with Base Model Comparison
## Model Overview
Fine-tuned E5-base model optimized with **MRR (Mean Reciprocal Rank)** for exact chunk retrieval in Vietnamese mathematics. Includes a comprehensive comparison with the base model.
## Performance Comparison
### Training vs Test Performance
- **Best Validation MRR**: 0.853 (avg rank: 1.173)
- **Test MRR**: 0.844 (avg rank: 1.185)
- **Training Epochs**: 6
### Fine-tuned vs Base Model Comparison
| Metric | Fine-tuned | Base Model | Improvement |
|--------|------------|------------|-------------|
| **MRR** | 0.844 | 0.770 | +0.074 (+9.7%) |
| **Avg Rank** | 1.185 | 1.299 | Better by 0.115 positions |
### Detailed Recall@k Comparison
| Metric | Fine-tuned | Base Model | Improvement |
|--------|------------|------------|-------------|
| Recall@1 | 0.720 | 0.602 | +0.118 |
| Recall@2 | 0.925 | 0.860 | +0.065 |
| Recall@3 | 0.968 | 0.925 | +0.043 |
| Recall@4 | 0.978 | 0.968 | +0.010 |
| Recall@5 | 1.000 | 0.989 | +0.011 |
## Key Improvements from Fine-tuning
- **MRR Boost**: +0.074 improvement in Mean Reciprocal Rank
- **Ranking Quality**: Correct chunks moved up by ~0.115 positions on average
- **Hit Rate**: Higher success rates at every Recall@k cutoff
- **Vietnamese Math**: Specialized for Vietnamese mathematical content
- **Hierarchy**: Maintains Correct > Related > Irrelevant scoring
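The hierarchy property can be sketched as a simple predicate over cosine similarities. A minimal illustration (the similarity values below are made up for demonstration, not model outputs):

```python
# Sketch of the hierarchy check: a query satisfies the hierarchy when the
# correct chunk scores above the related chunk, which in turn scores above
# the irrelevant chunk. The similarity values here are illustrative only.
def satisfies_hierarchy(sim_correct: float, sim_related: float, sim_irrelevant: float) -> bool:
    return sim_correct > sim_related > sim_irrelevant

print(satisfies_hierarchy(0.91, 0.74, 0.32))  # True: hierarchy preserved
print(satisfies_hierarchy(0.74, 0.91, 0.32))  # False: related outranks correct
```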
## Why MRR Matters for Exact Retrieval
```
MRR optimization pushes correct chunks to top positions:
Before (Base Model):
Rank 1: Related chunk (MRR contribution: 0.0)
Rank 2: Irrelevant (MRR contribution: 0.0)
Rank 3: CORRECT chunk (MRR contribution: 0.33)
After (Fine-tuned):
Rank 1: CORRECT chunk (MRR contribution: 1.0) ⭐
Rank 2: Related chunk (MRR contribution: 0.0)
Rank 3: Irrelevant (MRR contribution: 0.0)
Result: 3x higher reciprocal rank for this query; users find answers immediately!
```
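The per-query contributions in the diagram follow directly from the reciprocal-rank definition (1/rank of the first correct chunk); a minimal sketch:

```python
def reciprocal_rank(ranked_labels):
    """Return 1/rank of the first correct chunk, or 0.0 if none retrieved."""
    for rank, label in enumerate(ranked_labels, start=1):
        if label == "correct":
            return 1.0 / rank
    return 0.0

before = ["related", "irrelevant", "correct"]  # base model ordering
after = ["correct", "related", "irrelevant"]   # fine-tuned ordering

print(round(reciprocal_rank(before), 2))  # 0.33
print(reciprocal_rank(after))             # 1.0
```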
## Usage
```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
# Load MRR-optimized model
model = SentenceTransformer('ThanhLe0125/e5-math')
# ⚠️ CRITICAL: Must use E5 prefixes
query = "query: Định nghĩa hàm số đồng biến là gì?"
chunks = [
"passage: Hàm số đồng biến trên khoảng (a;b) là...", # CORRECT
"passage: Ví dụ bài tập về hàm đồng biến...", # RELATED
"passage: Phương trình bậc hai có dạng..." # IRRELEVANT
]
# Get MRR-optimized rankings
query_emb = model.encode([query])
chunk_embs = model.encode(chunks)
similarities = cosine_similarity(query_emb, chunk_embs)[0]
# With fine-tuning, correct chunk should be at rank #1
ranked_indices = similarities.argsort()[::-1]
print(f"Rank 1: {chunks[ranked_indices[0]][:50]}... (Score: {similarities[ranked_indices[0]]:.3f})")
# Expected: Correct chunk at rank #1 with high score
```
## Inference Efficiency
With MRR optimization, you typically only need **top 1-2 chunks**:
```python
# Efficient inference: with MRR optimization the correct chunk is usually
# at rank #1, so a confidence threshold lets us return a single chunk.
def retrieve(chunks, similarities, threshold=0.7):
    best = similarities.argmax()
    if similarities[best] > threshold:           # high confidence: one chunk suffices
        return [chunks[best]]
    top3 = similarities.argsort()[::-1][:3]      # fallback: return top 3
    return [chunks[i] for i in top3]
```
## Evaluation Methodology
- **Training**: train_question + val_question with MRR optimization
- **Validation**: MRR for early stopping, Recall@3/5 monitoring
- **Test**: test_question used once for final comparison
- **Comparison**: Direct evaluation against base E5-multilingual model
- **Metrics**: MRR, Recall@1,2,3,4,5, Hierarchy Rate
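The metrics above can be aggregated from a query-by-chunk similarity matrix with a short routine; a minimal sketch (the embedding step with `query: `/`passage: ` prefixes happens upstream, as in the Usage section, and the toy data below is illustrative):

```python
import numpy as np

def mrr_and_recall(sims, gold_indices, ks=(1, 2, 3, 4, 5)):
    """Compute MRR and Recall@k from similarities and gold chunk indices."""
    ranks = []
    for row, gold in zip(np.asarray(sims), gold_indices):
        order = row.argsort()[::-1]                          # best-first chunk indices
        ranks.append(int(np.flatnonzero(order == gold)[0]) + 1)
    mrr = float(np.mean([1.0 / r for r in ranks]))
    recall = {k: float(np.mean([r <= k for r in ranks])) for k in ks}
    return mrr, recall

# Two toy queries over three chunks: gold chunk at rank 1, then rank 2.
sims = [[0.9, 0.5, 0.1], [0.3, 0.8, 0.6]]
mrr, recall = mrr_and_recall(sims, gold_indices=[0, 2])
print(mrr)        # (1/1 + 1/2) / 2 = 0.75
print(recall[1])  # 0.5
```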
## Perfect For
- 🎯 **Educational Q&A**: Exact answers at rank #1 consistently
- ⚡ **Efficient Systems**: Fewer chunks needed at inference
- 🇻🇳 **Vietnamese Math**: Specialized mathematical terminology
- 📊 **Quality Ranking**: Hierarchical relevance scoring
- 🚀 **Production Ready**: Proven improvement over base model
## Technical Notes
- **Base Model**: intfloat/multilingual-e5-base
- **Fine-tuning**: Hierarchical contrastive learning with MRR optimization
- **Max Sequence**: 256 tokens
- **Training Data**: Vietnamese mathematical content with expert annotations
- **Validation**: Proper train/validation/test split methodology
*Fine-tuned on 25/06/2025 with comprehensive base model comparison.*