---
datasets:
- kornwtp/indonlu-smsa
language:
- id
metrics:
- accuracy
base_model:
- indobenchmark/indobert-base-p1
pipeline_tag: text-classification
---
# Indonesian Text Sentiment Analysis
## Overview
This project fine-tunes **IndoBERT**, a transformer-based model, to classify the sentiment of Indonesian text.
## Data Collection
The dataset used for fine-tuning comes from the **IndoNLU** benchmark, specifically:
[SmSA (IndoNLU) Dataset](https://metatext.io/datasets/smsa-(indonlu))
## Data Preparation
- **Tokenization**:
  - Used the **IndoBERT** tokenizer for text processing.
- **Train-Test Split**:
  - The dataset is already split into train, validation, and test sets.
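For illustration, tokenization with the IndoBERT tokenizer might look like the sketch below; the sample sentence is made up, and `max_length=128` matches the training parameters further down:

```python
from transformers import AutoTokenizer

# Tokenizer of the base model this card lists (indobenchmark/indobert-base-p1)
tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")

# Pad/truncate every example to a fixed length of 128 tokens
enc = tokenizer("aplikasinya bagus sekali", truncation=True,
                padding="max_length", max_length=128)
print(len(enc["input_ids"]))  # 128
```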
## Fine-Tuning & Results
The model was fine-tuned with **Hugging Face Transformers** on the **TensorFlow** backend.
### **Evaluation Metrics**
| **Epoch** | **Train Loss** | **Train Accuracy** | **Eval Loss** | **Eval Accuracy** | **Training Time** | **Validation Time** |
|-----------|----------------|---------------------|---------------|-------------------|-------------------|---------------------|
| **1** | `0.2471` | `88.15%` | `0.2107` | `91.31%` | `7:55 min` | `10 sec` |
| **2** | `0.1844` | `90.41%` | `0.2107` | `92.39%` | `7:50 min` | `10 sec` |
| **3** | `0.1502` | `91.66%` | `0.2135` | `93.14%` | `7:51 min` | `9 sec` |
| **4** | `0.1285` | `92.50%` | `0.2192` | `93.69%` | `7:50 min` | `10 sec` |
| **5** | `0.1101` | `93.13%` | `0.2367` | `94.14%` | `7:48 min` | `9 sec` |
## Training Parameters

```python
epochs = 5
learning_rate = 5e-5
seed_val = 42
max_length = 128
batch_size = 32
eval_batch_size = 32
```
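The exact training script is not reproduced here, but a minimal sketch of how these parameters could be wired into a Keras training setup is shown below; `compile_for_finetuning` is a name introduced for illustration only:

```python
import tensorflow as tf

# Hyperparameters from the Training Parameters section
EPOCHS = 5
LEARNING_RATE = 5e-5
SEED = 42
MAX_LENGTH = 128
BATCH_SIZE = 32

def compile_for_finetuning(model):
    """Compile a TF sequence-classification model with the settings above.

    The model is expected to output raw logits (as
    TFAutoModelForSequenceClassification does), hence from_logits=True.
    """
    tf.keras.utils.set_random_seed(SEED)
    optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    model.compile(optimizer=optimizer, loss=loss,
                  metrics=[tf.keras.metrics.SparseCategoricalAccuracy("accuracy")])
    return model
```

A compiled model would then be trained with something like `model.fit(train_ds, validation_data=val_ds, epochs=EPOCHS)`.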
## How to Use
```python
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer
model_name = "feverlash/Indonesian-SentimentAnalysis-Model"  # Replace with the path to your saved model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)

# Predict the sentiment of a piece of text
def predict(text):
    sentiment_mapping = {
        1: "positive",
        0: "negative",
        2: "neutral"
    }

    # Tokenize the text
    inputs = tokenizer(
        text,
        return_tensors="tf",
        truncation=True,
        padding="max_length",
        max_length=128
    )

    # Run the model
    outputs = model(inputs)
    logits = outputs.logits

    # Convert logits to probabilities
    probabilities = tf.nn.softmax(logits).numpy()

    # Pick the predicted label
    predicted_index = int(tf.argmax(probabilities, axis=1).numpy()[0])
    predicted_label = sentiment_mapping.get(predicted_index, "unknown")

    # Prediction confidence
    confidence = probabilities[0][predicted_index]

    print(f"Text: {text}")
    print(f"Predicted label: {predicted_label} (confidence: {confidence:.2f})")

# Example usage
text = "aku sedang jalan-jalan di Yogyakarta"
predict(text)
```
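For quick experiments, the same model can also be driven through the Transformers `pipeline` API instead of the manual `predict` function above. This is a sketch; note that the label names in the pipeline output depend on the model's `id2label` configuration:

```python
from transformers import pipeline

# Load the model into a TensorFlow text-classification pipeline
classifier = pipeline("text-classification",
                      model="feverlash/Indonesian-SentimentAnalysis-Model",
                      framework="tf")

print(classifier("aku sedang jalan-jalan di Yogyakarta"))
```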