---
language:
- vi
- en
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- mathematics
- vietnamese
- exact-chunk-retrieval
- hierarchical-learning
- e5-base
- mrr-optimization
- fine-tuned
- model-comparison
base_model: intfloat/multilingual-e5-base
metrics:
- mean_reciprocal_rank
- recall
---

# E5-Math-Vietnamese: MRR-Optimized with Base Model Comparison

## Model Overview

A fine-tuned E5-base model optimized for **MRR (Mean Reciprocal Rank)** on exact chunk retrieval over Vietnamese mathematics content, including a comprehensive comparison against the base model.

## Performance Comparison

### Training vs Test Performance

- **Best Validation MRR**: 0.8528 (avg rank: 1.173)
- **Test MRR**: 0.8439 (avg rank: 1.185)
- **Training Epochs**: 6

### Fine-tuned vs Base Model Comparison

| Metric | Fine-tuned | Base Model | Improvement |
|--------|------------|------------|-------------|
| **MRR** | 0.8439 | 0.7695 | +0.0744 (+9.7%) |
| **Avg Rank** | 1.185 | 1.299 | 0.115 positions better |

### Detailed Recall@k Comparison

| Metric | Fine-tuned | Base Model | Improvement |
|--------|------------|------------|-------------|
| Recall@1 | 0.720 | 0.602 | +0.118 |
| Recall@2 | 0.925 | 0.860 | +0.065 |
| Recall@3 | 0.968 | 0.925 | +0.043 |
| Recall@4 | 0.978 | 0.968 | +0.011 |
| Recall@5 | 1.000 | 0.989 | +0.011 |

## Key Improvements from Fine-tuning

✅ **MRR Boost**: +0.0744 absolute improvement in Mean Reciprocal Rank
✅ **Ranking Quality**: Correct chunks moved up by 0.115 positions on average
✅ **Hit Rate**: Higher success rates across all Recall@k metrics
✅ **Vietnamese Math**: Specialized for Vietnamese mathematical content
✅ **Hierarchy**: Maintains Correct > Related > Irrelevant scoring

## Why MRR Matters for Exact Retrieval

```
MRR optimization pushes correct chunks to top positions:

Before (Base Model):
Rank 1: Related chunk    (MRR contribution: 0.0)
Rank 2: Irrelevant chunk (MRR contribution: 0.0)
Rank 3: CORRECT chunk    (MRR contribution: 0.33)

After (Fine-tuned):

Rank 1: CORRECT chunk    (MRR contribution: 1.0) ⭐
Rank 2: Related chunk    (MRR contribution: 0.0)
Rank 3: Irrelevant chunk (MRR contribution: 0.0)

Result: 3x better MRR; users find the answer immediately!
```

## Usage

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the MRR-optimized model
model = SentenceTransformer('ThanhLe0125/e5-math')

# ⚠️ CRITICAL: E5 models require "query:" / "passage:" prefixes
query = "query: Định nghĩa hàm số đồng biến là gì?"  # "What is the definition of an increasing function?"
chunks = [
    "passage: Hàm số đồng biến trên khoảng (a;b) là...",  # CORRECT: definition of an increasing function
    "passage: Ví dụ bài tập về hàm đồng biến...",         # RELATED: example exercise on increasing functions
    "passage: Phương trình bậc hai có dạng..."            # IRRELEVANT: quadratic equations
]

# Rank chunks by cosine similarity
query_emb = model.encode([query])
chunk_embs = model.encode(chunks)
similarities = cosine_similarity(query_emb, chunk_embs)[0]

# With fine-tuning, the correct chunk should land at rank #1
ranked_indices = similarities.argsort()[::-1]
print(f"Rank 1: {chunks[ranked_indices[0]][:50]}... (Score: {similarities[ranked_indices[0]]:.3f})")
# Expected: correct chunk at rank #1 with a high score
```

## Inference Efficiency

With MRR optimization, you typically only need the **top 1-2 chunks**:

```python
# Efficient inference: with high probability the correct chunk ranks #1
top_chunk = chunks[similarities.argmax()]
confidence = similarities.max()

if confidence > 0.7:  # high-confidence threshold
    result = top_chunk  # likely the correct answer
else:
    # Fall back to the top 3 chunks
    result = [chunks[i] for i in similarities.argsort()[::-1][:3]]
```

## Evaluation Methodology

- **Training**: train_question + val_question with MRR optimization
- **Validation**: MRR for early stopping; Recall@3/5 monitored
- **Test**: test_question used once for the final comparison
- **Comparison**: Direct evaluation against the base multilingual E5 model
- **Metrics**: MRR; Recall@1, 2, 3, 4, 5; Hierarchy Rate

## Perfect For

🎯 **Educational Q&A**: Exact answers consistently at rank #1
⚡ **Efficient Systems**: Fewer chunks needed at inference time
🇻🇳 **Vietnamese Math**: Specialized mathematical terminology
📊 **Quality Ranking**: Hierarchical relevance scoring
🚀 **Production Ready**: Measured improvement over the base model

## Technical Notes

- **Base Model**: intfloat/multilingual-e5-base
- **Fine-tuning**: Hierarchical contrastive learning with MRR optimization
- **Max Sequence Length**: 256 tokens
- **Training Data**: Vietnamese mathematical content with expert annotations
- **Validation**: Proper train/validation/test split methodology

*Fine-tuned on 25/06/2025 with a comprehensive base model comparison.*
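## Appendix: Computing MRR and Recall@k

The MRR and Recall@k figures reported above reduce to simple functions of the rank at which each query's correct chunk appears. A minimal self-contained sketch (the helper name `mrr_and_recall` and the example ranks are illustrative, not part of the released evaluation code):

```python
def mrr_and_recall(ranks, ks=(1, 2, 3, 4, 5)):
    """Compute MRR and Recall@k from the 1-based rank of each
    query's correct chunk in the retrieved ranking."""
    n = len(ranks)
    # MRR: mean of 1/rank over all queries
    mrr = sum(1.0 / r for r in ranks) / n
    # Recall@k: fraction of queries whose correct chunk is within the top k
    recall = {k: sum(r <= k for r in ranks) / n for k in ks}
    return mrr, recall

# Example: 4 queries whose correct chunks landed at ranks 1, 1, 2, 3
mrr, recall = mrr_and_recall([1, 1, 2, 3])
print(round(mrr, 3))  # (1 + 1 + 0.5 + 1/3) / 4 ≈ 0.708
print(recall[3])      # all correct chunks within the top 3 → 1.0
```

This also shows why a single rank-3-to-rank-1 move triples a query's MRR contribution (1/3 → 1), as illustrated in the diagram above.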