---
language:
- vi
- en
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- mathematics
- vietnamese
- exact-chunk-retrieval
- hierarchical-learning
- e5-base
- mrr-optimization
- fine-tuned
- model-comparison
base_model: intfloat/multilingual-e5-base
metrics:
- mean_reciprocal_rank
- recall
---
# E5-Math-Vietnamese: MRR-Optimized with Base Model Comparison
## Model Overview
Fine-tuned E5-base model optimized with **MRR (Mean Reciprocal Rank)** for exact chunk retrieval in Vietnamese mathematics. Includes a comprehensive comparison with the base model.
## Performance Comparison
### Training vs Test Performance
- **Best Validation MRR**: 0.853 (avg rank: 1.173)
- **Test MRR**: 0.844 (avg rank: 1.185)
- **Training Epochs**: 6
### Fine-tuned vs Base Model Comparison
| Metric | Fine-tuned | Base Model | Improvement |
|--------|------------|------------|-------------|
| **MRR** | 0.844 | 0.770 | +0.074 (+9.7%) |
| **Avg Rank** | 1.185 | 1.299 | Better by 0.115 positions |
### Detailed Recall@k Comparison
| Metric | Fine-tuned | Base Model | Improvement |
|--------|------------|------------|-------------|
| Recall@1 | 0.720 | 0.602 | +0.118 |
| Recall@2 | 0.925 | 0.860 | +0.065 |
| Recall@3 | 0.968 | 0.925 | +0.043 |
| Recall@4 | 0.978 | 0.968 | +0.010 |
| Recall@5 | 1.000 | 0.989 | +0.011 |
## Key Improvements from Fine-tuning
- **MRR Boost**: +0.074 improvement in Mean Reciprocal Rank
- **Ranking Quality**: Correct chunks moved up by ~0.115 positions on average
- **Hit Rate**: Higher success rates at every Recall@k cutoff
- **Vietnamese Math**: Specialized for Vietnamese mathematical content
- **Hierarchy**: Maintains Correct > Related > Irrelevant scoring
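The hierarchy property can be sketched as a simple predicate over cosine similarities. A minimal illustration (the similarity values below are made up for demonstration, not model outputs):

```python
# Sketch of the hierarchy check: a query satisfies the hierarchy when the
# correct chunk scores above the related chunk, which in turn scores above
# the irrelevant chunk. The similarity values here are illustrative only.
def satisfies_hierarchy(sim_correct: float, sim_related: float, sim_irrelevant: float) -> bool:
    return sim_correct > sim_related > sim_irrelevant

print(satisfies_hierarchy(0.91, 0.74, 0.32))  # True: hierarchy preserved
print(satisfies_hierarchy(0.74, 0.91, 0.32))  # False: related outranks correct
```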
## Why MRR Matters for Exact Retrieval
```
MRR optimization pushes correct chunks to top positions:
Before (Base Model):
Rank 1: Related chunk (MRR contribution: 0.0)
Rank 2: Irrelevant (MRR contribution: 0.0)
Rank 3: CORRECT chunk (MRR contribution: 0.33)
After (Fine-tuned):
Rank 1: CORRECT chunk (MRR contribution: 1.0) ⭐
Rank 2: Related chunk (MRR contribution: 0.0)
Rank 3: Irrelevant (MRR contribution: 0.0)
Result: 3x higher reciprocal rank for this query; users find answers immediately!
```
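The per-query contributions in the diagram follow directly from the reciprocal-rank definition (1/rank of the first correct chunk); a minimal sketch:

```python
def reciprocal_rank(ranked_labels):
    """Return 1/rank of the first correct chunk, or 0.0 if none retrieved."""
    for rank, label in enumerate(ranked_labels, start=1):
        if label == "correct":
            return 1.0 / rank
    return 0.0

before = ["related", "irrelevant", "correct"]  # base model ordering
after = ["correct", "related", "irrelevant"]   # fine-tuned ordering

print(round(reciprocal_rank(before), 2))  # 0.33
print(reciprocal_rank(after))             # 1.0
```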
## Usage
```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
# Load MRR-optimized model
model = SentenceTransformer('ThanhLe0125/e5-math')
# ⚠️ CRITICAL: Must use E5 prefixes
query = "query: Định nghĩa hàm số đồng biến là gì?"
chunks = [
"passage: Hàm số đồng biến trên khoảng (a;b) là...", # CORRECT
"passage: Ví dụ bài tập về hàm đồng biến...", # RELATED
"passage: Phương trình bậc hai có dạng..." # IRRELEVANT
]
# Get MRR-optimized rankings
query_emb = model.encode([query])
chunk_embs = model.encode(chunks)
similarities = cosine_similarity(query_emb, chunk_embs)[0]
# With fine-tuning, correct chunk should be at rank #1
ranked_indices = similarities.argsort()[::-1]
print(f"Rank 1: {chunks[ranked_indices[0]][:50]}... (Score: {similarities[ranked_indices[0]]:.3f})")
# Expected: Correct chunk at rank #1 with high score
```
## Inference Efficiency
With MRR optimization, you typically only need **top 1-2 chunks**:
```python
# Efficient inference: with MRR optimization the correct chunk is usually
# at rank #1, so a confidence threshold lets us return a single chunk.
def retrieve(chunks, similarities, threshold=0.7):
    best = similarities.argmax()
    if similarities[best] > threshold:           # high confidence: one chunk suffices
        return [chunks[best]]
    top3 = similarities.argsort()[::-1][:3]      # fallback: return top 3
    return [chunks[i] for i in top3]
```
## Evaluation Methodology
- **Training**: train_question + val_question with MRR optimization
- **Validation**: MRR for early stopping, Recall@3/5 monitoring
- **Test**: test_question used once for final comparison
- **Comparison**: Direct evaluation against base E5-multilingual model
- **Metrics**: MRR, Recall@1,2,3,4,5, Hierarchy Rate
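The metrics above can be aggregated from a query-by-chunk similarity matrix with a short routine; a minimal sketch (the embedding step with `query: `/`passage: ` prefixes happens upstream, as in the Usage section, and the toy data below is illustrative):

```python
import numpy as np

def mrr_and_recall(sims, gold_indices, ks=(1, 2, 3, 4, 5)):
    """Compute MRR and Recall@k from similarities and gold chunk indices."""
    ranks = []
    for row, gold in zip(np.asarray(sims), gold_indices):
        order = row.argsort()[::-1]                          # best-first chunk indices
        ranks.append(int(np.flatnonzero(order == gold)[0]) + 1)
    mrr = float(np.mean([1.0 / r for r in ranks]))
    recall = {k: float(np.mean([r <= k for r in ranks])) for k in ks}
    return mrr, recall

# Two toy queries over three chunks: gold chunk at rank 1, then rank 2.
sims = [[0.9, 0.5, 0.1], [0.3, 0.8, 0.6]]
mrr, recall = mrr_and_recall(sims, gold_indices=[0, 2])
print(mrr)        # (1/1 + 1/2) / 2 = 0.75
print(recall[1])  # 0.5
```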
## Perfect For
- 🎯 **Educational Q&A**: Exact answers at rank #1 consistently
- ⚡ **Efficient Systems**: Fewer chunks needed at inference
- 🇻🇳 **Vietnamese Math**: Specialized mathematical terminology
- 📊 **Quality Ranking**: Hierarchical relevance scoring
- 🚀 **Production Ready**: Proven improvement over base model
## Technical Notes
- **Base Model**: intfloat/multilingual-e5-base
- **Fine-tuning**: Hierarchical contrastive learning with MRR optimization
- **Max Sequence**: 256 tokens
- **Training Data**: Vietnamese mathematical content with expert annotations
- **Validation**: Proper train/validation/test split methodology
*Fine-tuned on 25/06/2025 with comprehensive base model comparison.*