bimobirra committed · Commit 0f446d7 · verified · 1 Parent(s): cd04afa

Update README.md

Files changed (1): README.md (+56 -3)
---
language:
- min
- id
- en
license: mit
library_name: transformers
tags:
- xlm-roberta
- sentiment-analysis
- code-mixing
- explainable-ai
- xai
- low-resource-language
datasets:
- custom-minangkabau-youtube-comments
metrics:
- f1
- accuracy
---

# Explainable XLM-R for Code-Mixed Minangkabau Sentiment Analysis

## Model Description
This model is a fine-tuned version of **XLM-RoBERTa (XLM-R)** designed to handle the complexities of **code-mixed Minangkabau** sentiment analysis. Social media data in Indonesia, particularly from regional areas such as West Sumatra, often features a heavy mix of Minangkabau, Indonesian, and English.

Traditional models often struggle with this "low-resource gap" and with non-standard orthography. This model bridges those gaps while prioritizing **explainability (XAI)** to demystify the "black-box" nature of deep learning.

### Key Features
- **Multilingual Support:** Optimized for Indonesian, Minangkabau, and English code-mixing.
- **Robustness:** Handles informal spelling, unique affixations, and phonetic typing common in YouTube comments.
- **XAI-Ready:** Designed to be interpreted with feature-attribution methods such as **SHAP** or **LIME** to provide local and global explanations.

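The **SHAP**/**LIME** methods mentioned above attribute a prediction to individual input tokens. As a library-free illustration of the underlying idea, here is a minimal occlusion-based attribution sketch, a simplified stand-in rather than this model's actual XAI pipeline: mask one token at a time and measure how much the target-class probability drops. `occlusion_attributions` and `toy_predict` are hypothetical helpers invented for this example.

```python
def occlusion_attributions(predict_fn, tokens, target_class):
    """Score each token by how much masking it lowers the target-class probability."""
    base = predict_fn(" ".join(tokens))[target_class]
    scores = []
    for i in range(len(tokens)):
        # Occlude token i and re-run the classifier.
        masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        scores.append(base - predict_fn(" ".join(masked))[target_class])
    return scores

# Toy classifier standing in for the real model: the Minangkabau word
# "rancak" ("great") drives the positive probability.
def toy_predict(text):
    p_pos = 0.9 if "rancak" in text.lower() else 0.2
    return {"positive": p_pos, "negative": 1.0 - p_pos}

tokens = "rancak bana videonya".split()
scores = occlusion_attributions(toy_predict, tokens, "positive")
# The token "rancak" receives the largest attribution score.
```

Tokens carrying the most sentiment signal get the highest scores; SHAP and LIME refine this idea with principled weighting over many such perturbations.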
## Intended Uses & Limitations
- **Primary Use:** Sentiment classification (Positive, Neutral, Negative) for regional Indonesian languages.
- **Limitations:** Performance may vary on purely formal Minang literature, as the training data is derived from social media (YouTube) contexts.

## How to Use
You can use this model directly with the `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "bimobirra/explainable-xlmr-code-mixed-low-resource-lang"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Code-mixed Minangkabau/Indonesian example:
# "Really great video, but the loading lags a little."
text = "Rancak bana videonya, tapi agak sedikit nge-lag loadingnya."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_id = logits.argmax().item()
print(f"Predicted class ID: {predicted_class_id}")
```
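The snippet above prints a raw class ID; the index-to-label mapping, when set, lives in `model.config.id2label`. If you want class probabilities instead, apply a softmax to the logits. A self-contained sketch with dummy logits (no model download required; the values are invented for illustration):

```python
import torch

# Dummy logits for three sentiment classes, standing in for
# model(**inputs).logits from the example above.
logits = torch.tensor([[-1.2, 0.3, 2.1]])

probs = torch.softmax(logits, dim=-1)    # normalize logits to probabilities
predicted = probs.argmax(dim=-1).item()  # index of the most likely class

print(probs.tolist(), predicted)
```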