---
language:
- vi
- en
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- embedding
- math
- vietnamese
- multilingual
- e5
base_model: intfloat/multilingual-e5-base
---
# E5-Base-Math: Fine-tuned Vietnamese Math Embedding Model
## Model Description
This is a fine-tuned version of `intfloat/multilingual-e5-base` optimized for Vietnamese mathematics content. The model is specifically trained for embedding mathematical concepts, definitions, and problem-solving content in Vietnamese.
## Training Details
### Base Model
- **Base model**: `intfloat/multilingual-e5-base`
- **Fine-tuning objective**: Information Retrieval / Sentence Embedding
- **Training date**: 2025-06-24
### Training Configuration
- **Batch size**: 4
- **Learning rate**: 2e-05
- **Epochs**: 3
- **Max sequence length**: 256
- **Warmup steps**: 100
### Training Data
- **Domain**: Vietnamese Mathematics
- **Training examples**: 2055
- **Validation examples**: 229
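
The hyperparameters above map directly onto a standard SentenceTransformers fine-tuning loop. The sketch below is illustrative only: the actual training data and loss function are not published with this card, so the `train_examples` placeholder and the choice of `MultipleNegativesRankingLoss` (a common choice for retrieval fine-tuning) are assumptions.

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer('intfloat/multilingual-e5-base')
model.max_seq_length = 256  # max sequence length from the configuration above

# Placeholder: the real 2055 Vietnamese math (query, passage) pairs are not published
train_examples = [
    InputExample(texts=["query: Định nghĩa hàm số đồng biến",
                        "passage: Hàm số đồng biến là..."]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=4)
train_loss = losses.MultipleNegativesRankingLoss(model)  # assumed retrieval loss

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    warmup_steps=100,
    optimizer_params={'lr': 2e-5},
)
```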
## Usage
### Using SentenceTransformers
```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('ThanhLe0125/e5-base-math')

# Encode queries (the "query: " prefix is required for E5-style models)
# Translation: "What is the definition of an increasing function?"
queries = ["query: Định nghĩa hàm số đồng biến là gì?"]
query_embeddings = model.encode(queries)

# Encode passages/documents (use the "passage: " prefix)
# Translation: "An increasing function on the interval (a;b) is one where
# for all x1 < x2, f(x1) < f(x2)"
passages = ["passage: Hàm số đồng biến trên khoảng (a;b) là hàm số mà với mọi x1 < x2 thì f(x1) < f(x2)"]
passage_embeddings = model.encode(passages)

# Calculate similarity
similarity = cosine_similarity(query_embeddings, passage_embeddings)
```
### For RAG Applications
```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('ThanhLe0125/e5-base-math')

# Recommended usage for RAG: always add the E5 "query: " / "passage: " prefixes
def encode_query(query_text):
    return model.encode([f"query: {query_text}"])

def encode_passage(passage_text):
    return model.encode([f"passage: {passage_text}"])

# Example usage ("Definition of an increasing function" / "An increasing function is...")
query_emb = encode_query("Định nghĩa hàm số đồng biến")
passage_emb = encode_passage("Hàm số đồng biến là...")

# Calculate similarity
similarity = cosine_similarity(query_emb, passage_emb)[0][0]
print(f"Similarity: {similarity:.4f}")
```
## Applications
- **Information Retrieval**: Finding relevant mathematical content
- **RAG Systems**: Retrieval-Augmented Generation for math Q&A
- **Semantic Search**: Searching through mathematical documents
- **Content Recommendation**: Suggesting related mathematical concepts
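
For semantic search, retrieval reduces to ranking passages by cosine similarity against the query embedding. The ranking step itself is model-independent; below is a minimal sketch in plain NumPy (the `top_k` helper is illustrative, not part of this repository), with synthetic vectors standing in for real model outputs.

```python
import numpy as np

def top_k(query_emb, passage_embs, k=3):
    """Return indices of the k passages most similar to the query, plus all scores."""
    q = query_emb / np.linalg.norm(query_emb)
    p = passage_embs / np.linalg.norm(passage_embs, axis=1, keepdims=True)
    scores = p @ q  # cosine similarity of each passage against the query
    return np.argsort(-scores)[:k], scores

# In practice, query_emb = model.encode("query: ...") and
# passage_embs = model.encode(["passage: ...", ...]) using the model above.
# Synthetic example:
query_emb = np.array([1.0, 0.0])
passage_embs = np.array([[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]])
indices, scores = top_k(query_emb, passage_embs, k=2)
print(indices)  # passage 0 ranks first, then passage 2
```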
## Performance
This model has been fine-tuned specifically on Vietnamese mathematical content and is expected to outperform the base `intfloat/multilingual-e5-base` on math-related queries in Vietnamese. No quantitative benchmark results are reported in this card.
## Languages
- Vietnamese (primary)
- English (inherited from base model)
## License
This model inherits the license from the base model `intfloat/multilingual-e5-base`.
## Citation
If you use this model, please cite:
```bibtex
@misc{e5-base-math,
  author = {ThanhLe},
  title = {E5-Base-Math: Fine-tuned Vietnamese Math Embedding Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ThanhLe0125/e5-base-math}}
}
```
## Contact
For questions or issues, please contact via the repository discussions.