JonusNattapong committed on
Commit
172b9d4
·
verified ·
1 Parent(s): bf014e8

Update README.md

Files changed (1)
  1. README.md +129 -456
README.md CHANGED
@@ -1,4 +1,5 @@
1
  ---
 
2
  license: cc-by-nc-nd-4.0
3
  language:
4
  - th
@@ -94,525 +95,197 @@ language:
94
  - xh
95
  - yi
96
  - zh
97
- base_model:
98
- - intfloat/multilingual-e5-large
99
  library_name: transformers
100
  pipeline_tag: text-classification
101
- metrics:
102
- - accuracy
103
- - f1
104
- - bertscore
105
  tags:
106
- - sentiment-analysis
107
- - thai
108
- - classification
109
- - fine-tuned
110
- - multilingual
111
- new_version: ZombitX64/MultiSent-E5-Pro
112
  datasets:
113
- - ZombitX64/SEACrowdWongnaiReviews
114
- - ZombitX64/Sentiment-Benchmark
115
- ---
116
-
117
- # ZombitX64-MultiSent-E5-Pro
118
-
119
- <div align="center">
120
- <picture>
121
- <img src="https://cdn-uploads.huggingface.co/production/uploads/673eef9c4edfc6d3b58ba3aa/Gl94xasTswsG1cOjR_076.png" width="40%" alt="MultiSent-E5">
122
- </picture>
123
- </div>
124
-
125
-
126
- A Thai sentiment analysis model fine-tuned from multilingual-e5-large for classifying sentiment in Thai text into positive, negative, neutral, and question categories.
127
-
128
- ## Model Details
129
-
130
- ### Model Description
131
-
132
- This model is a fine-tuned version of intfloat/multilingual-e5-large specifically trained for Thai sentiment analysis. It can classify Thai text into four sentiment categories: positive, negative, neutral, and question. The model demonstrates strong performance on Thai language sentiment classification tasks with high accuracy and good understanding of Thai linguistic nuances including sarcasm and implicit sentiment.
133
-
134
- The model is particularly effective at:
135
- - **Sarcasm Detection**: Understanding when positive words are used in a negative context
136
- - **Cultural Context**: Recognizing Thai-specific expressions and cultural references
137
- - **Implicit Sentiment**: Detecting sentiment even when not explicitly stated
138
- - **Colloquial Language**: Processing informal Thai text from social media and conversations
139
 
140
- * **Developed by:** ZombitX64, Krittanut Janutsaha, Chanyut Saengwichain
141
- * **Model type:** Sequence Classification (Sentiment Analysis)
142
- * **Language(s) (NLP):** Thai (th) - Primary, with limited multilingual capability
143
- * **License:** Creative Commons Attribution-NonCommercial-NoDerivatives 4.0
144
- * **Finetuned from model:** [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large)
145
 
146
- ### Model Sources
147
 
148
- * **Repository:** [https://huggingface.co/ZombitX64/MultiSent-E5-Pro](https://huggingface.co/ZombitX64/MultiSent-E5-Pro)
149
- * **Base Model:** [https://huggingface.co/intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large)
150
 
151
- ## Uses
 
152
 
153
- ### Direct Use
154
 
155
- This model can be directly used for sentiment analysis of Thai text. It's particularly useful for:
156
 
157
- * **Social Media Analysis**: Monitoring sentiment on Thai social platforms like Twitter, Facebook, and Pantip
158
- * **Customer Feedback Analysis**: Processing reviews and feedback in Thai for e-commerce and services
159
- * **Product Review Classification**: Automatically categorizing product reviews by sentiment
160
- * **Opinion Mining**: Extracting sentiment from Thai news articles, blogs, and forums
161
- * **Customer Service**: Categorizing customer inquiries and complaints by sentiment and intent
162
 
163
- ### Downstream Use
164
 
165
- The model can be integrated into larger applications such as:
166
 
167
- * **Customer Service Chatbots**: Automatically routing messages based on sentiment
168
- * **Social Media Analytics Platforms**: Real-time sentiment monitoring dashboards
169
- * **E-commerce Review Systems**: Automated review scoring and categorization
170
- * **Content Moderation Systems**: Identifying potentially problematic content
171
- * **Market Research Tools**: Analyzing consumer sentiment towards brands or products
172
- * **News Analysis Systems**: Tracking public opinion on political or social issues
173
 
174
- ### Out-of-Scope Use
 
 
175
 
176
- This model should not be used for:
177
 
178
- * **Question Classification**: The model has poor performance on question detection due to insufficient training data. Questions are often misclassified with moderate confidence (50-60%). Use a dedicated question classification model instead.
179
- * **Mixed Sentiment Analysis**: Complex texts with both positive and negative elements may be misclassified or produce low confidence scores. Consider using aspect-based sentiment analysis for such cases.
180
- * **Non-Thai Languages**: While it has some multilingual capability, accuracy is significantly lower for languages other than Thai
181
- * **Fine-grained Emotion Detection**: The model only classifies into 4 broad categories, not specific emotions like anger, joy, fear, etc.
182
- * **Clinical Applications**: Should not be used for mental health diagnosis or psychological assessment without proper validation
183
- * **High-stakes Decision Making**: Avoid using for critical decisions affecting individuals without human oversight, especially for predictions with confidence < 60%
184
- * **Legal or Financial Decisions**: The model's predictions should not be the sole basis for legal or financial determinations
185
 
186
- ## 🌐 Multilingual Sentiment Capability
187
 
188
- The `MultiSent-E5` model has been developed as an extension of the `intfloat/multilingual-e5-large` base model, which is a multilingual embedding model supporting over 50 languages. This gives the model some capability for sentiment prediction in multiple languages beyond Thai.
189
 
190
- ### Language Support Details
191
 
192
- * **Primary Language**: Thai - The model has been fine-tuned specifically for Thai and performs best with Thai text
193
- * **Secondary Languages**: The model can provide basic sentiment analysis for other languages such as English, Chinese, Japanese, Indonesian, and other languages supported by the base multilingual model
194
- * **Performance Considerations**: Accuracy for non-Thai languages may be significantly lower and results may be less reliable, depending on the similarity of linguistic structures and vocabulary to Thai
195
 
196
- ### Multilingual Performance Expectations
197
 
198
- | Language Family | Expected Performance | Use Case Recommendation |
199
- |-----------------|---------------------|-------------------------|
200
- | Thai | Excellent (99%+ accuracy) | Primary use case |
201
- | Southeast Asian (Indonesian, Malay, Vietnamese) | Good (70-85% accuracy) | Limited use with validation |
202
- | East Asian (Chinese, Japanese, Korean) | Moderate (60-75% accuracy) | Experimental use only |
203
- | European Languages | Moderate (55-70% accuracy) | Not recommended |
204
- | Other Languages | Poor (40-60% accuracy) | Not recommended |
205
 
206
- ### Recommendations for Multilingual Use
207
 
208
- * **Primary Recommendation**: Use this model primarily for Thai sentiment analysis where it excels
209
- * **Secondary Use**: For other languages, consider using language-specific models for maximum accuracy
210
- * **Validation Required**: Always validate results when using with non-Thai languages
211
- * **Experimental Use**: Multilingual capability can be useful for initial exploration or when Thai-specific models are unavailable
212
 
213
- This multilingual capability makes the model suitable for basic multilingual sentiment classification tasks while maintaining excellent performance for Thai text analysis.
214
 
215
- ## How to Get Started with the Model
216
 
217
- ### Basic Usage
218
 
219
  ```python
220
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
221
  import torch
222
 
223
- # Load the model and tokenizer
224
- model_name = "ZombitX64/MultiSent-E5-Pro"
225
- tokenizer = AutoTokenizer.from_pretrained(model_name)
226
- model = AutoModelForSequenceClassification.from_pretrained(model_name)
227
-
228
- # Example Thai text
229
- text = "ผลิตภัณฑ์นี้ดีมาก ใช้งานง่าย" # "This product is very good, easy to use"
230
-
231
- # Tokenize and predict
232
- inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
233
 
 
 
234
  with torch.no_grad():
235
  outputs = model(**inputs)
236
- predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
237
- predicted_class = torch.argmax(predictions, dim=-1)
238
 
239
- # Label mapping: 0=Question, 1=Negative, 2=Neutral, 3=Positive
240
  labels = ["Question", "Negative", "Neutral", "Positive"]
241
- predicted_label = labels[predicted_class.item()]
242
- confidence = predictions[0][predicted_class.item()].item()
243
-
244
- print(f"Text: {text}")
245
- print(f"Predicted sentiment: {predicted_label} ({confidence:.2%})")
246
- ```
247
-
248
- ### Batch Processing
249
-
250
- ```python
251
- # List of texts to analyze (multilingual examples)
252
- texts = [
253
- "ผลิตภัณฑ์นี้ดีมาก ใช้งานง่าย", # Thai: "This product is very good, easy to use"
254
- "The service was terrible and disappointing", # English
255
- "商品质量还可以", # Chinese: "Product quality is okay"
256
- "บริการแย่มาก ไม่ประทับใจเลย", # Thai: "Service is terrible, not impressed at all"
257
- "Ce produit est excellent", # French: "This product is excellent"
258
- ]
259
-
260
- print("Predicting sentiment for multiple texts:")
261
- for text in texts:
262
- inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
263
-
264
- with torch.no_grad():
265
- outputs = model(**inputs)
266
- predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
267
- predicted_class = torch.argmax(predictions, dim=-1)
268
-
269
- predicted_label = labels[predicted_class.item()]
270
- confidence = predictions[0][predicted_class.item()].item()
271
-
272
- print(f"\nText: \"{text}\"")
273
- print(f"Predicted sentiment: {predicted_label} ({confidence:.2%})")
274
- ```
275
-
276
- ### Pipeline Usage
277
-
278
- ```python
279
- from transformers import pipeline
280
-
281
- # Create a sentiment analysis pipeline
282
- classifier = pipeline("text-classification",
283
- model="ZombitX64/MultiSent-E5-Pro",
284
- tokenizer="ZombitX64/MultiSent-E5-Pro")
285
-
286
- # Analyze sentiment
287
- texts = [
288
- "วันนี้อากาศดีจังเลย", # "The weather is so nice today"
289
- "แย่ที่สุดเท่าที่เคยเจอมา" # "The worst I've ever encountered"
290
- ]
291
-
292
- results = classifier(texts)
293
- for text, result in zip(texts, results):
294
- print(f"Text: {text}")
295
- print(f"Sentiment: {result['label']} (Score: {result['score']:.4f})")
296
  ```
297
 
298
- ## Training Details
299
-
300
- ### Training Data
301
-
302
- The model was trained on a carefully curated Thai sentiment dataset with the following characteristics:
303
-
304
- * **Total samples:** 2,730 (2,729 after data cleaning and filtering)
305
- * **Data Distribution:**
306
- - **Question samples:** Minimal representation (specific count not provided)
307
- - **Negative samples:** 102 (3.7% of dataset)
308
- - **Neutral samples:** 317 (11.6% of dataset)
309
- - **Positive samples:** 2,310 (84.7% of dataset)
310
-
311
- **Data Split Strategy:**
312
- * **Training set:** 2,456 samples (90% of total data)
313
- * **Validation set:** 273 samples (10% of total data)
314
-
315
- **Data Quality and Preprocessing:**
316
- * Data was manually reviewed and cleaned to ensure quality
317
- * Duplicate entries were removed
318
- * Text was normalized for consistent formatting
319
- * Class imbalance was noted but maintained to reflect real-world distribution
320
-
321
- ### Training Procedure
322
-
323
- The model was fine-tuned using state-of-the-art techniques with careful hyperparameter optimization:
324
-
325
- #### Training Hyperparameters
326
-
327
- * **Base Model:** intfloat/multilingual-e5-large (1.02B parameters)
328
- * **Model Architecture:** XLMRobertaForSequenceClassification
329
- * **Training Epochs:** 5 (with early stopping monitoring)
330
- * **Total Training Steps:** 770
331
- * **Batch Size:** 8 (effective batch size with gradient accumulation)
332
- * **Learning Rate:** 2e-5 with linear warmup and decay
333
- * **Weight Decay:** 0.01
334
- * **Warmup Steps:** 77 (10% of total steps)
335
- * **Max Sequence Length:** 512 tokens
336
- * **Optimization:** AdamW optimizer
337
- * **Training Runtime:** 1,633.3 seconds (~27 minutes)
338
- * **Training Samples per Second:** 7.519
339
- * **Training Steps per Second:** 0.471
340
-
341
- #### Training Infrastructure
342
-
343
- * **Hardware:** GPU-accelerated training (specific GPU not specified)
344
- * **Framework:** Hugging Face Transformers 4.x
345
- * **Distributed Training:** Single GPU setup
346
- * **Memory Optimization:** Gradient checkpointing enabled
347
-
348
- #### Training Results
349
-
350
- The model showed excellent convergence with minimal overfitting:
351
-
352
- | Epoch | Training Loss | Validation Loss | Accuracy | Notes |
353
- |-------|---------------|-----------------|----------|--------|
354
- | 1 | 0.0812 | 0.0699 | 98.53% | Strong initial performance |
355
- | 2 | 0.0053 | 0.0527 | 99.27% | Rapid improvement |
356
- | 3 | 0.0041 | 0.0350 | 99.63% | Near-optimal performance |
357
- | 4 | 0.0002 | 0.0384 | 99.63% | Slight validation loss increase |
358
- | 5 | 0.0002 | 0.0410 | 99.63% | Stable performance |
359
-
360
- **Training Observations:**
361
- - Very low training loss achieved by epoch 3
362
- - Validation loss remained stable, indicating minimal overfitting
363
- - Accuracy plateaued at 99.63% from epoch 3 onwards
364
- - Early convergence suggests effective transfer learning from the base model
365
-
366
- ============================================================
367
- Evaluating: ZombitX64/MultiSent-E5-Pro
368
- ============================================================
369
- Loading ZombitX64/MultiSent-E5-Pro...
370
- Predicting 2183 samples...
371
- Predicting: 2183/2183
372
- Accuracy: 0.846
373
- F1-Macro: 0.846
374
- F1-Weighted: 0.847
375
- Avg Confidence: 0.985
376
- Low Confidence %: 1.0%
377
- Error Rate: 0.154
378
-
379
- Sample Errors:
380
- '今天的表现无可挑剔' -> neutral (conf: 1.00) [True: positive]
381
- '这真是个天才的想法,我简直佩服得五体投地' -> positive (conf: 1.00) [True: negative]
382
- '你真是太能干了,把事情搞成这样' -> positive (conf: 1.00) [True: negative]
383
- '这个项目真是太成功了,成功到一塌糊涂' -> positive (conf: 1.00) [True: negative]
384
- '这饭菜做得真是太好吃了,我一点都吃不下' -> positive (conf: 1.00) [True: negative]
385
-
386
- ============================================================
387
- BEST PERFORMING MODEL: ZombitX64/MultiSent-E5-Pro
388
- ============================================================
389
-
390
- Per-Class Performance:
391
- precision recall f1-score support
392
- negative 0.910 0.846 0.877 661.0
393
- neutral 0.719 0.816 0.764 517.0
394
- positive 0.830 0.943 0.883 471.0
395
- question 0.944 0.790 0.860 534.0
396
-
397
- ================================================================================
398
- COMPREHENSIVE MODEL COMPARISON REPORT
399
- Dataset: ZombitX64/Sentiment-Benchmark
400
- ================================================================================
401
-
402
- Ranked by F1-Macro Score:
403
- Model Accuracy F1-Macro F1-Weighted Avg_Confidence Low_Conf_% Error_Rate
404
- ZombitX64/MultiSent-E5-Pro 0.8461 0.8461 0.8475 0.9853 0.9620 0.1539
405
- ZombitX64/MultiSent-E5 0.8062 0.8062 0.8072 0.9708 1.6033 0.1938
406
- ZombitX64/sentiment-103 0.5740 0.4987 0.5020 0.9647 2.2446 0.4260
407
- ZombitX64/Sentiment-03 0.4828 0.4906 0.4856 0.9609 2.7485 0.5172
408
- ZombitX64/Sentiment-02 0.4137 0.3884 0.3910 0.8151 10.0779 0.5863
409
- ZombitX64/Thai-sentiment-e5 0.4961 0.3713 0.3704 0.9874 0.8246 0.5039
410
- nlptown/bert-base-multilingual-uncased-sentiment 0.3587 0.2870 0.2896 0.4103 87.9066 0.6413
411
- ZombitX64/Sentiment-01 0.2712 0.1928 0.1894 0.5085 94.5946 0.7288
412
- SandboxBhh/sentiment-thai-text-model 0.2620 0.1807 0.1982 0.8610 20.2016 0.7380
413
- Thaweewat/wangchanberta-hyperopt-sentiment-01 0.2336 0.1501 0.1655 0.9128 2.9776 0.7664
414
- phoner45/wangchan-sentiment-thai-text-model 0.2203 0.1073 0.1270 0.7123 41.7316 0.7797
415
- poom-sci/WangchanBERTa-finetuned-sentiment 0.2093 0.1061 0.1246 0.7889 14.7045 0.7907
416
- cardiffnlp/twitter-xlm-roberta-base-sentiment 0.0944 0.0848 0.0841 0.6897 32.2492 0.9056
417
-
418
-
419
-
420
- ### Testing Data, Factors & Metrics
421
-
422
- #### Testing Data
423
-
424
- The model was evaluated on a carefully selected validation set with the following characteristics:
425
-
426
- * **Total Samples:** 2183
427
- * **Selection Method:** Stratified random sampling to maintain class distribution
428
- * **Data Quality:** Manually verified and cleaned validation samples
429
- * **Evaluation Period:** Final model checkpoint from epoch 5
430
-
431
- #### Evaluation Metrics
432
-
433
- The model was comprehensively evaluated using multiple metrics:
434
-
435
- * **Primary Metrics:**
436
- - **Accuracy:** Overall classification accuracy across all classes
437
- - **F1-Score:** Both macro and weighted averages
438
- * **Secondary Metrics:**
439
- - **Precision:** Per-class and overall precision scores
440
- - **Recall:** Per-class and overall recall scores
441
- - **Support:** Number of samples per class in validation set
442
-
443
-
444
- #### Known Limitations
445
-
446
- **1. Question Class Performance Issues:**
447
- - **Insufficient Training Data**: The question class has minimal representation in the training dataset
448
- - **Low Confidence Predictions**: Question classification often results in confidence scores below 60%
449
- - **Misclassification**: Questions are frequently classified as positive, negative, or neutral instead
450
- - **Example Issue**: "ลำไยอร่อยดีสดมากและลูกใหญ่ด้วยแต่เน่าไปครึ่งนึ..." (Longans are delicious and fresh, big fruits too, but half are rotten...) → Classified as neutral (97.7% confidence) instead of recognizing mixed sentiment
451
-
452
- **2. Mixed Sentiment Challenges:**
453
- - **Complex Sentiment**: Texts with both positive and negative aspects may be misclassified
454
- - **Moderate Confidence**: Mixed sentiment often results in lower confidence scores (50-60%)
455
- - **Example**: Product reviews mentioning both good and bad aspects tend toward neutral classification
456
-
457
- **3. Class Imbalance Effects:**
458
- - Model may be biased toward positive classifications due to training data imbalance (84.7% positive samples)
459
- - Neutral class performance slightly lower due to limited training examples (11.6% of data)
460
- - Negative class well-represented but still only 3.7% of training data
461
-
462
- **4. Low Confidence Predictions:**
463
- - Predictions with confidence < 60% should be treated with caution
464
- - Common in mixed sentiment, ambiguous language, or question-like texts
465
- - Recommend implementing confidence thresholding for production use
466
-
467
- ## Environmental Impact
468
-
469
- ### Carbon Footprint Considerations
470
-
471
- * **Training Emissions:** Specific carbon emission data not available
472
- * **Efficiency Benefits:** Model was fine-tuned from a pre-trained multilingual model, significantly reducing computational cost compared to training from scratch
473
- * **Resource Usage:** Relatively efficient training with only 27 minutes of GPU time required
474
- * **Deployment Efficiency:** Model can be deployed efficiently for inference with standard hardware
475
-
476
- ### Sustainable AI Practices
477
-
478
- * **Transfer Learning:** Leveraged existing multilingual model to reduce training requirements
479
- * **Efficient Architecture:** Uses proven transformer architecture optimized for efficiency
480
- * **Reusability:** Single model can handle multiple languages, reducing need for separate models
481
-
482
- ## Technical Specifications
483
-
484
- ### Model Architecture and Objective
485
-
486
- * **Architecture:** XLMRobertaForSequenceClassification
487
- * **Base Model:** intfloat/multilingual-e5-large
488
- * **Model Parameters:** ~1.02 billion parameters
489
- * **Classification Head:** Linear layer with 4 output classes
490
- * **Task:** Multi-class text classification (4 classes: Question, Negative, Neutral, Positive)
491
- * **Objective Function:** Cross-entropy loss minimization
492
- * **Activation Function:** Softmax for final predictions
493
- * **Input Processing:** Tokenization with XLM-RoBERTa tokenizer
494
- * **Maximum Input Length:** 512 tokens
495
-
496
- ### Performance Characteristics
497
-
498
- * **Inference Speed:** Fast inference suitable for real-time applications
499
- * **Memory Requirements:** Standard transformer model memory usage
500
- * **Scalability:** Can handle batch processing efficiently
501
- * **Hardware Requirements:** Compatible with CPU and GPU inference
502
-
503
- ### Integration Specifications
504
-
505
- * **Framework Compatibility:**
506
- - Hugging Face Transformers
507
- - PyTorch
508
- - ONNX (convertible)
509
- - TensorFlow (via conversion)
510
- * **API Support:** Compatible with Hugging Face Inference API
511
- * **Deployment Options:**
512
- - Cloud deployment (AWS, GCP, Azure)
513
- - Edge deployment (with optimization)
514
- - Local deployment
515
 
516
- ## Compute Infrastructure
517
 
518
- ### Hardware Requirements
519
 
520
- #### Training Infrastructure
521
- * **GPU:** Modern NVIDIA GPU with sufficient VRAM (16GB+ recommended)
522
- * **Memory:** 32GB+ RAM recommended for training
523
- * **Storage:** SSD storage for fast data loading
524
 
525
- #### Inference Infrastructure
526
- * **Minimum Requirements:**
527
- - CPU: Modern multi-core processor
528
- - RAM: 8GB+ for batch processing
529
- - Storage: 2GB for model files
530
- * **Recommended for Production:**
531
- - GPU: NVIDIA T4 or better
532
- - RAM: 16GB+
533
- - Multiple instances for load balancing
534
 
535
- ### Software Dependencies
536
 
537
- #### Core Requirements
538
- * **Python:** 3.8+
539
- * **PyTorch:** 1.9+
540
- * **Transformers:** 4.15+
541
- * **NumPy:** 1.21+
542
- * **Tokenizers:** 0.11+
543
 
544
- #### Optional Dependencies
545
- * **ONNX:** For model conversion and optimization
546
- * **TensorRT:** For NVIDIA GPU optimization
547
- * **Gradio/Streamlit:** For web interface development
548
 
549
- ## Usage Examples and Best Practices
550
 
551
- ### Best Practices for Implementation
552
 
 
 
 
553
 
554
- ## Citation
555
 
556
- ### Academic Citation
557
 
558
- **BibTeX:**
559
  ```bibtex
560
- @misc{MultiSent-E5-Pro,
561
- title={Thai-sentiment-e5: A Fine-tuned Multilingual Sentiment Analysis Model for Thai Text Classification},
562
- author={ZombitX64 and Janutsaha, Krittanut and Saengwichain, Chanyut},
563
  year={2024},
564
  url={https://huggingface.co/ZombitX64/MultiSent-E5-Pro},
565
- note={Hugging Face Model Repository}
566
  }
567
  ```
568
 
569
- ### Usage in Publications
570
-
571
- If you use this model in your research or applications, please cite both this model and the base model:
572
-
573
- ```bibtex
574
- @article{wang2024multilingual,
575
- title={Multilingual E5 Text Embeddings: A Technical Report},
576
- author={Wang, Liang and Yang, Nan and Huang, Xiaolong and Yang, Linjun and Majumder, Rangan and Wei, Furu},
577
- journal={arXiv preprint arXiv:2402.05672},
578
- year={2024}
579
- }
580
- ```
581
- ## Model Card Authors
582
-
583
- **Primary Contributors:**
584
- - **ZombitX64** - Lead developer and model architect
585
- - **Krittanut Janutsaha** - Data curation and evaluation
586
- - **Chanyut Saengwichain** - Model optimization and documentation
587
-
588
- ## Model Card Contact
589
-
590
- ### Support and Issues
591
-
592
- For questions, issues, or contributions regarding this model, please use the following channels:
593
 
594
- * **Primary Contact:** Hugging Face model repository issues and discussions
595
- * **Repository:** [https://huggingface.co/ZombitX64/MultiSent-E5-Pro](https://huggingface.co/ZombitX64/MultiSent-E5-Pro)
596
- * **Community:** Hugging Face community forums for general questions
597
 
598
- ### Collaboration Opportunities
599
 
600
- We welcome collaboration on:
601
- - Improving the model's performance
602
- - Expanding to other Southeast Asian languages
603
- - Creating domain-specific variants
604
- - Integration into larger NLP systems
605
 
606
- ### Feedback and Improvements
607
 
608
- Your feedback helps improve this model. Please report:
609
- - Performance issues on specific text types
610
- - Suggestions for additional evaluation metrics
611
- - Use cases where the model performs unexpectedly
612
- - Ideas for model enhancements
613
 
614
  ---
615
-
616
- *Last updated: 2024*
617
- *Model version: 1.1*
618
- *Documentation version: 2.0*
 
1
  ---
2
+
3
  license: cc-by-nc-nd-4.0
4
  language:
5
  - th
 
95
  - xh
96
  - yi
97
  - zh
98
+ base_model: intfloat/multilingual-e5-large
 
99
  library_name: transformers
100
  pipeline_tag: text-classification
 
 
 
 
101
  tags:
102
+ - sentiment-analysis
103
+ - thai
104
+ - multilingual
105
+ - fine-tuned
106
+ - transformers
107
+ - southeast-asian
108
  datasets:
109
+ - ZombitX64/SEACrowdWongnaiReviews
110
+ - ZombitX64/Sentiment-Benchmark
111
+ metrics:
112
+ - accuracy
113
+ - f1
114
+ - precision
115
+ - recall
116
+ widget:
117
+ - text: "ผลิตภัณฑ์นี้ดีมาก ใช้งานง่าย"
118
+ example_title: "Thai Positive"
119
+ - text: "บริการแย่มาก ไม่ประทับใจเลย"
120
+ example_title: "Thai Negative"
121
+ - text: "อาหารรสชาติธรรมดา"
122
+ example_title: "Thai Neutral"
123
+ - text: "ราคาเท่าไหร่ครับ?"
124
+ example_title: "Thai Question"
125

126
 
127
+ ---
128
 
129
+ # 🎯 MultiSent-E5-Pro: Advanced Thai Sentiment Classifier
 
130
 
131
+ <div align="center">
132
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/673eef9c4edfc6d3b58ba3aa/Gl94xasTswsG1cOjR_076.png" width="300" alt="MultiSent-E5-Pro Logo">
133
 
134
+ <strong>🇹🇭 State-of-the-art Thai sentiment analysis with multilingual capabilities</strong>
135
 
136
+ <a href="https://creativecommons.org/licenses/by-nc-nd/4.0/"><img src="https://img.shields.io/badge/License-CC_BY--NC--ND_4.0-lightgrey.svg"></a> <a href="https://huggingface.co/ZombitX64/MultiSent-E5-Pro"><img src="https://img.shields.io/badge/🤗%20HF-Model-yellow"></a> <a href="https://huggingface.co/ZombitX64/MultiSent-E5-Pro"><img src="https://img.shields.io/badge/Downloads-1K+-green"></a>
137
 
138
+ </div>
139
 
140
+ ## 📋 Quick Overview
141
 
142
+ **MultiSent-E5-Pro** is a sentiment analysis model fine-tuned from `intfloat/multilingual-e5-large`, optimized specifically for Thai while retaining limited support for multilingual contexts. The model classifies text into four categories: **Positive**, **Negative**, **Neutral**, and **Question**.
143
 
144
+ ### 🎯 Key Features
145
 
146
+ * Handles **Thai-specific expressions**, **colloquialisms**, and **sarcasm** effectively
147
+ * Performs well on **real-world social media & review data**
148
+ * **Multilingual support** for Southeast and East Asian languages
149
 
150
+ ---
151
 
152
+ ## 🏆 Benchmark Summary
153
 
154
+ | Rank | Model | Accuracy | F1-Macro | Notes |
155
+ | ------ | ---------------- | ---------- | ---------- | ----------------- |
156
+ | 🥇 1st | MultiSent-E5-Pro | **84.61%** | **84.61%** | Best overall |
157
+ | 2nd | MultiSent-E5 | 80.62% | 80.62% | Baseline model |
158
+ | 3rd | sentiment-103 | 57.40% | 49.87% | Moderate baseline |
159
 
160
+ ---
161
 
162
+ ## 📊 Detailed Metrics (2,183 samples)
163
 
164
+ | Metric | Score |
165
+ | -------------------------- | ------ |
166
+ | Accuracy | 84.61% |
167
+ | F1-Macro | 84.61% |
168
+ | F1-Weighted | 84.75% |
169
+ | Avg Confidence | 98.53% |
170
+ | Low Confidence Rate (<60%) | 0.96% |
171
 
172
+ ### Per-Class Performance
173
 
174
+ | Class | Precision | Recall | F1 | Notes |
175
+ | -------- | --------- | ------ | ----- | --------- |
176
+ | Negative | 91.0% | 84.6% | 87.7% | Excellent |
177
+ | Positive | 83.0% | 94.3% | 88.3% | Excellent |
178
+ | Neutral | 71.9% | 81.6% | 76.4% | Moderate |
179
+ | Question | 94.4% | 79.0% | 86.0% | Good |
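
The macro and weighted F1 figures reported above follow directly from these per-class scores. A minimal sketch of the aggregation (per-class supports taken from the benchmark's per-class report: negative 661, neutral 517, positive 471, question 534):

```python
# Per-class F1 scores and supports from the benchmark run (2,183 samples total).
f1 = {"negative": 0.877, "neutral": 0.764, "positive": 0.883, "question": 0.860}
support = {"negative": 661, "neutral": 517, "positive": 471, "question": 534}

# Macro F1: unweighted mean over classes.
macro_f1 = sum(f1.values()) / len(f1)

# Weighted F1: mean weighted by the number of samples per class.
total = sum(support.values())
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

print(f"Macro F1: {macro_f1:.4f}, Weighted F1: {weighted_f1:.4f}")
```

Both values reproduce the headline metrics (84.61% macro, 84.75% weighted), which is a useful sanity check that the table and the summary are consistent.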
 
180
 
181
+ ---
182
 
183
+ ## 🌍 Language Support
184
 
185
+ | Region | Languages | Performance |
186
+ | --------- | ---------- | ------------ |
187
+ | Thai | Thai | 🟢 Excellent |
188
+ | SEA | ID, VI, MS | 🟡 Good |
189
+ | East Asia | ZH, JA, KO | 🟠 Moderate |
190
+ | Europe | EN, ES, FR | 🔴 Low |
191
 
192
+ ---
193
 
194
+ ## Quick Start
195
 
196
  ```python
197
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
198
  import torch
199
 
200
+ model_name = "ZombitX64/MultiSent-E5-Pro"
201
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
202
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
203
 
204
+ text = "ผลิตภัณฑ์นี้ดีมาก ใช้งานง่าย"
205
+ inputs = tokenizer(text, return_tensors="pt", truncation=True)
206
  with torch.no_grad():
207
  outputs = model(**inputs)
208
+ probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
209
+ predicted = torch.argmax(probs, dim=-1)
210
 
 
211
  labels = ["Question", "Negative", "Neutral", "Positive"]
212
+ print(f"Sentiment: {labels[predicted.item()]} (Confidence: {probs[0][predicted].item():.2%})")
213
  ```
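
The post-processing in the snippet above (softmax over the logits, argmax, label lookup) can be exercised without downloading the checkpoint. A self-contained, dependency-free sketch using dummy logits in place of `model(**inputs).logits[0]` (the logit values are illustrative, not real model output):

```python
import math

# Label order matches the Quick Start snippet above.
labels = ["Question", "Negative", "Neutral", "Positive"]

def classify(logits):
    """Numerically stable softmax over raw logits, then pick the top label."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Dummy logits standing in for model(**inputs).logits[0].
label, confidence = classify([-1.2, -0.8, 0.3, 2.9])
print(f"Sentiment: {label} (Confidence: {confidence:.2%})")
```

This mirrors what `torch.nn.functional.softmax` plus `torch.argmax` do in the real pipeline, and is handy for unit-testing the label mapping separately from the model.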
214
 
215
+ ---
216
 
217
+ ## 🌟 Use Cases
218
 
219
+ | Application | Suitability |
220
+ | ------------------ | ------------ |
221
+ | Product Reviews | 🟢 Excellent |
222
+ | Social Media | 🟢 Excellent |
223
+ | Customer Support | 🟢 Excellent |
224
+ | Content Moderation | 🟡 Good |
225
+ | Research Analysis | 🟡 Good |
226
 
227
+ ---
228
 
229
+ ## Known Limitations
230
 
231
+ * **Sarcasm Misclassification** (especially in Chinese)
232
+ * **Mixed Sentiments** lean toward Neutral
233
+ * **Low recall** for **Question** class due to limited data
234
+ * **Bias toward Positive** due to class imbalance
235
+ * **Overconfidence** in some multilingual predictions
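
Given these failure modes, gating on prediction confidence is a sensible mitigation in production. A minimal sketch, where the `route` helper and the 60% threshold are illustrative choices (suggested by the benchmark's low-confidence analysis), not part of the model:

```python
def route(label: str, confidence: float, threshold: float = 0.6) -> str:
    """Keep high-confidence predictions; flag the rest for human review."""
    return label if confidence >= threshold else "needs_review"

print(route("Positive", 0.98))  # confident prediction: keep the label
print(route("Question", 0.52))  # below threshold: escalate to a human
```

The same gate can also downweight non-Thai inputs, where the model is known to be overconfident.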
236
 
237
+ ---
238
 
239
+ ## 🛠 Technical Info
240
 
241
+ | Config | Value |
242
+ | ------------- | --------------------- |
243
+ | Base Model | multilingual-e5-large |
244
+ | Params | \~1.02B |
245
+ | Classes | 4 |
246
+ | Max Length | 512 |
247
+ | Training Time | \~27 min |
248
 
249
+ **Data Summary**:
250
 
251
+ * Training: 2,456 samples
252
+ * Validation: 273 samples
253
+ * Evaluation: 2,183 samples
254
 
255
+ ---
256
 
257
+ ## 📄 Citation
258
 
 
259
  ```bibtex
260
+ @misc{MultiSent-E5-Pro-2024,
261
+ title={MultiSent-E5-Pro: Advanced Thai Sentiment Analysis},
262
+ author={ZombitX64 and Janutsaha, Krittanut and Saengwichain, Chanyut},
263
  year={2024},
264
  url={https://huggingface.co/ZombitX64/MultiSent-E5-Pro},
265
+ note={Hugging Face Model Card}
266
  }
267
  ```
268
 
269
+ ---
270
 
271
+ ## 👨‍💼 Authors
272
 
273
+ | Role | Name |
274
+ | -------------- | -------------------- |
275
+ | Lead Dev | ZombitX64 |
276
+ | Data Scientist | Krittanut Janutsaha |
277
+ | Engineer | Chanyut Saengwichain |
278
 
279
+ ---
280
 
281
+ ## 😊 Feedback & Contributions
282
 
283
+ * 💬 [Open Discussion](https://huggingface.co/ZombitX64/MultiSent-E5-Pro/discussions)
284
+ * 🐛 Report issues via the [Community tab](https://huggingface.co/ZombitX64/MultiSent-E5-Pro/discussions)
285
+ * 🌟 Star the repo if useful!
286
 
287
  ---
288
+
289
+ <div align="center">
290
+ Last Updated: Dec 2024 | Version: 1.1 | Docs: v2.0
291
+ </div>