---
datasets:
- kornwtp/indonlu-smsa
language:
- id
metrics:
- accuracy
base_model:
- indobenchmark/indobert-base-p1
pipeline_tag: text-classification
---

# Indonesian Text Sentiment Analysis πŸš€
## πŸ“Œ Overview  
This project fine-tunes **IndoBERT** (`indobenchmark/indobert-base-p1`), a **transformer-based model**, for sentiment analysis of Indonesian text.

## πŸ“₯ Data Collection  
The dataset used for fine-tuning was sourced from **IndoNLU Datasets**, specifically:  
[SmSA (IndoNLU) Dataset](https://metatext.io/datasets/smsa-(indonlu))
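SmSA is distributed as plain tab-separated files: one review per line, the text followed by a `positive`/`neutral`/`negative` label. A minimal sketch of reading that layout with the standard library; the inline rows are made-up stand-ins, not real dataset entries:

```python
import csv
import io

# Two hypothetical rows in SmSA's TSV layout: <text>\t<label>.
sample_tsv = (
    "pelayanannya ramah dan cepat\tpositive\n"
    "produk datang terlambat dan rusak\tnegative\n"
)

# Parse with the csv module so delimiters and quoting are handled consistently.
rows = list(csv.reader(io.StringIO(sample_tsv), delimiter="\t"))
texts = [text for text, label in rows]
labels = [label for text, label in rows]

print(labels)  # ['positive', 'negative']
```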

## πŸ”„ Data Preparation  
- **Tokenization**:
  - Used the **IndoBERT** tokenizer to convert raw text into model inputs.
- **Train-Test Split**:
  - The dataset already comes split into train, validation, and test sets.

## πŸ‹οΈ Fine-Tuning & Results  
The model was fine-tuned using **Hugging Face Transformers** with a **TensorFlow** backend.

### **πŸ“Š Evaluation Metrics**
| **Epoch** | **Train Loss** | **Train Accuracy** | **Eval Loss** | **Eval Accuracy** | **Training Time** | **Validation Time** |
|-----------|----------------|---------------------|---------------|-------------------|-------------------|---------------------|
| **1**     | `0.2471`       | `88.15%`           | `0.2107`      | `91.31%`          | `7:55 min`        | `10 sec`            |
| **2**     | `0.1844`       | `90.41%`           | `0.2107`      | `92.39%`          | `7:50 min`        | `10 sec`            |
| **3**     | `0.1502`       | `91.66%`           | `0.2135`      | `93.14%`          | `7:51 min`        | `9 sec`             |
| **4**     | `0.1285`       | `92.50%`           | `0.2192`      | `93.69%`          | `7:50 min`        | `10 sec`            |
| **5**     | `0.1101`       | `93.13%`           | `0.2367`      | `94.14%`          | `7:48 min`        | `9 sec`             |

## βš™οΈ Training Parameters  
epochs = 5

learning_rate = 5e-5

seed_val = 42

max_length = 128

batch_size = 32

eval_batch_size = 32
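These settings can be sanity-checked against the epoch times in the results table. The IndoNLU paper reports 11,000 training examples for SmSA; treat that figure as an assumption here:

```python
import math

# Assumed SmSA train-split size (per the IndoNLU paper).
train_examples = 11_000
batch_size = 32

# Batches per epoch, rounding up for the final partial batch.
steps_per_epoch = math.ceil(train_examples / batch_size)

# ~7:50 per epoch from the results table works out to ~1.4 s per batch.
epoch_seconds = 7 * 60 + 50
seconds_per_step = epoch_seconds / steps_per_epoch

print(steps_per_epoch, round(seconds_per_step, 2))  # 344 1.37
```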

## πŸ€– How to use

```python
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer
model_name = "feverlash/Indonesian-SentimentAnalysis-Model"  # or the path to a locally saved model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)

# Predict the sentiment of a single piece of text
def predict(text):
    sentiment_mapping = {
        1: "positive",
        0: "negative",
        2: "neutral"
    }

    # Tokenize the text
    inputs = tokenizer(
        text,
        return_tensors="tf",
        truncation=True,
        padding="max_length",
        max_length=128
    )

    # Run the model
    outputs = model(inputs)
    logits = outputs.logits

    # Convert logits to probabilities
    probabilities = tf.nn.softmax(logits).numpy()

    # Pick the predicted label
    predicted_index = int(tf.argmax(probabilities, axis=1).numpy()[0])
    predicted_label = sentiment_mapping.get(predicted_index, "unknown")

    # Confidence of the prediction
    confidence = probabilities[0][predicted_index]

    print(f"Text: {text}")
    print(f"Predicted label: {predicted_label} (Confidence: {confidence:.2f})")

# Example usage
text = "aku sedang jalan-jalan di Yogyakarta"  # "I'm walking around Yogyakarta"
predict(text)
```
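As a sanity check on the confidence score, the softmax-and-argmax step inside `predict` can be reproduced in plain Python. The logits below are made-up values, ordered to match `sentiment_mapping` (index 0 = negative, 1 = positive, 2 = neutral):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one input, index order: [negative, positive, neutral].
logits = [0.3, 2.1, -0.5]
probs = softmax(logits)

predicted_index = max(range(len(probs)), key=probs.__getitem__)
confidence = probs[predicted_index]

print(predicted_index, round(confidence, 2))  # 1 0.81  -> "positive"
```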