tykea/khmer-fasttext-sentiment-analysis


tykea committed on Dec 18, 2024

Commit 865e91b · verified · 1 Parent(s): d6c765c

Update README.md

Files changed (1)

README.md +62 -3


@@ -1,3 +1,62 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- km
+metrics:
+- accuracy
+base_model:
+- facebook/fasttext-km-vectors
+pipeline_tag: text-classification
+library_name: fasttext
+---
+**This is a fine-tuned version of the FastText KM model for sentiment analysis. It classifies Khmer text into two categories: Positive and Negative.**
+- **Task**: Sentiment analysis (binary classification).
+- **Languages Supported**: Khmer.
+- **Intended Use Cases**:
+  - Analyzing customer reviews.
+  - Social media sentiment detection.
+- **Limitations**:
+  - Performance may degrade on languages or domains not present in the training data.
+  - Does not handle sarcasm or highly ambiguous inputs well.
+
+The model was evaluated on a test set of 400 samples, achieving the following performance:
+- **Test Accuracy**: 81%
+- **Precision**: 81%
+- **Recall**: 81%
+- **F1 Score**: 81%
+Confusion Matrix:
+| Predicted\Actual | Negative | Positive |
+|-------------------|----------|----------|
+| **Negative**      | 165      | 44       |
+| **Positive**      | 31       | 160      |
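
Reviewer note: the reported figures can be cross-checked against the confusion matrix above. The sketch below is not part of the README. It recomputes accuracy and macro-averaged precision, recall, and F1 from the four counts, assuming that rows are predictions and columns are actual labels (as the `Predicted\Actual` header suggests) and that the reported scores are macro averages, which the card does not state.

```python
# Counts copied from the confusion matrix in the model card.
# Keys are (predicted, actual); this orientation is an assumption
# based on the table's "Predicted\Actual" header.
confusion = {
    ("negative", "negative"): 165,
    ("negative", "positive"): 44,
    ("positive", "negative"): 31,
    ("positive", "positive"): 160,
}
labels = ["negative", "positive"]


def per_class_metrics(label):
    """Return (precision, recall, f1) for one class."""
    true_positives = confusion[(label, label)]
    # Everything the model predicted as this label (one table row).
    predicted_total = sum(confusion[(label, actual)] for actual in labels)
    # Everything that actually carries this label (one table column).
    actual_total = sum(confusion[(predicted, label)] for predicted in labels)
    precision = true_positives / predicted_total
    recall = true_positives / actual_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(confusion.values())
accuracy = sum(confusion[(label, label)] for label in labels) / total

# Macro average: unweighted mean of the per-class scores.
per_class = [per_class_metrics(label) for label in labels]
macro_precision = sum(m[0] for m in per_class) / len(labels)
macro_recall = sum(m[1] for m in per_class) / len(labels)
macro_f1 = sum(m[2] for m in per_class) / len(labels)

print(f"samples={total} accuracy={accuracy:.4f}")
print(f"macro precision={macro_precision:.4f} "
      f"recall={macro_recall:.4f} f1={macro_f1:.4f}")
```

Under these assumptions the counts sum to 400 and every metric rounds to 81%, which agrees with the figures in the card.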
+The model supports a maximum sequence length of 512 tokens.
+## How to Use
+```python
+import fasttext
+from khmernltk import word_tokenize
+# Load the model (replace this path with the location of your local copy of sentiment_model.ftz)
+model = fasttext.load_model('/Users/tykea/Desktop/fasttext-finetuned/sentiment_model.ftz')
+def predict(text):
+    # Tokenize the text
+    tokens = word_tokenize(text)
+    # Join tokens back into a single string
+    tokenized_text = ' '.join(tokens)
+    # Make predictions
+    predictions = model.predict(tokenized_text)
+    # Map labels to human-readable format
+    label_mapping = {
+        '__label__0': 'negative',
+        '__label__1': 'positive'
+    }
+    # Get the predicted label
+    predicted_label = predictions[0][0]
+    # Map the predicted label
+    human_readable_label = label_mapping.get(predicted_label, 'unknown')
+    return human_readable_label
+# Print the predicted sentiment for a sample Khmer sentence
+print(predict('នេះគីជាល្បះអវិជ្ជមានសម្រាប់ប្រជាជនខ្មែរ'))