How to use teguholix/BERTAUT with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="teguholix/BERTAUT")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("teguholix/BERTAUT")
model = AutoModelForSequenceClassification.from_pretrained("teguholix/BERTAUT")
```
BERTAUT v1
English Version
BERTAUT v1 is a BERT-based text classification model for early detection of categories from the Unified Theory of Acceptance and Use of Technology (UTAUT) and for text filtering. The model classifies user-generated text such as user reviews, open-ended questionnaire responses, public comments, and other unstructured technology-related text.
Created by Teguh Arie Sandy.
Model Information
| Item | Value |
|---|---|
| Model name | BERTAUT v1 |
| Base model | google-bert/bert-base-multilingual-cased |
| Hugging Face repository | teguholix/BERTAUT |
| Revision | main |
| Task | Text classification |
| Number of labels | 6 |
Labels
| Label | Meaning |
|---|---|
| PE | Performance Expectancy |
| EE | Effort Expectancy |
| SI | Social Influence |
| FC | Facilitating Conditions |
| GR | Generic Review |
| SP | Spam |
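The label codes above can be kept in a small lookup table when post-processing predictions. The sketch below is illustrative only: `describe` and `is_utaut_construct` are hypothetical helper names, and the dictionary maps codes to names without assuming anything about the integer ids stored in the model's `id2label` config.

```python
# Label codes and full names, copied from the Labels table.
LABEL_NAMES = {
    "PE": "Performance Expectancy",
    "EE": "Effort Expectancy",
    "SI": "Social Influence",
    "FC": "Facilitating Conditions",
    "GR": "Generic Review",
    "SP": "Spam",
}

# PE, EE, SI, and FC are UTAUT constructs; GR and SP act as filter categories.
UTAUT_CONSTRUCTS = {"PE", "EE", "SI", "FC"}


def describe(code: str) -> str:
    """Return 'CODE (Full Name)' for a known label code (hypothetical helper)."""
    return f"{code} ({LABEL_NAMES[code]})"


def is_utaut_construct(code: str) -> bool:
    """True for PE, EE, SI, FC; False for the GR and SP filter labels."""
    return code in UTAUT_CONSTRUCTS


print(describe("FC"))            # FC (Facilitating Conditions)
print(is_utaut_construct("SP"))  # False
```

Separating the four constructs from GR and SP makes it easy to drop generic and spam text before any UTAUT analysis.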
Label Definitions
PE: Performance Expectancy
Text expressing perceived usefulness, benefits, productivity, effectiveness, or performance improvement gained from technology use.
Example: This application helps me complete my tasks faster.
EE: Effort Expectancy
Text expressing ease of use, difficulty, simplicity, complexity, or the effort required to use a technology.
Example: The application is easy to use and the menu is clear.
SI: Social Influence
Text expressing influence from friends, family, lecturers, colleagues, communities, ratings, reviews, public figures, or other users.
Example: I use this application because my friend recommended it.
FC: Facilitating Conditions
Text expressing supporting conditions such as internet access, devices, system support, infrastructure, login access, server quality, or technical assistance.
Example: The application works well because my internet connection is stable.
GR: Generic Review
Short and general review text that does not clearly indicate PE, EE, SI, or FC.
Example: Good.
SP: Spam
Spam, promotional text, advertisements, irrelevant links, gambling content, or unrelated commercial messages.
Example: Register now and get a big bonus today.
Intended Use
This model can support:
- Early classification of technology acceptance text in Indonesian and other languages covered by mBERT, subject to validation on the target domain.
- Analysis of user reviews and public comments.
- Categorization of open-ended questionnaire responses.
- Automated text labeling before manual review.
- Identification of generic reviews and spam content.
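One way to combine automated labeling with manual review is to accept only high-confidence predictions and queue the rest for a human. The following is a minimal sketch under stated assumptions: `split_for_review` and the 0.80 cutoff are hypothetical, and the scores in the example are made-up values, not outputs of BERTAUT.

```python
from typing import List, Tuple

Prediction = Tuple[str, str, float]  # (text, predicted label, confidence score)

# Hypothetical cutoff; it should be tuned on a validated sample from the target domain.
REVIEW_THRESHOLD = 0.80


def split_for_review(
    predictions: List[Prediction],
    threshold: float = REVIEW_THRESHOLD,
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-accepted and manual-review lists by score."""
    accepted: List[Prediction] = []
    needs_review: List[Prediction] = []
    for item in predictions:
        score = item[2]
        (accepted if score >= threshold else needs_review).append(item)
    return accepted, needs_review


# Illustrative scores only; real values come from the model's softmax output.
example_predictions = [
    ("This application helps me complete my tasks faster.", "PE", 0.97),
    ("The menu is difficult to understand.", "EE", 0.62),
    ("I use this application because my friend recommended it.", "SI", 0.91),
]

accepted, needs_review = split_for_review(example_predictions)
print(len(accepted), "accepted;", len(needs_review), "queued for manual review")
```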
Training Data
This seed version was trained using an early balanced BERTAUT dataset with six categories:
- PE
- EE
- SI
- FC
- GR
- SP
Training Procedure
Epoch-Level Training Log
| Epoch | Training Loss | Validation Loss | Accuracy | Macro Precision | Macro Recall | Macro F1 |
|---|---|---|---|---|---|---|
| 1 | 0.965550 | 0.371673 | 0.873940 | 0.880794 | 0.873952 | 0.873703 |
| 2 | 0.559251 | 0.317058 | 0.891209 | 0.895631 | 0.891218 | 0.891737 |
| 3 | 0.413123 | 0.429413 | 0.889953 | 0.894924 | 0.889946 | 0.890115 |
| 4 | 0.334815 | 0.425564 | 0.907064 | 0.908595 | 0.907070 | 0.907074 |
| 5 | 0.260256 | 0.609765 | 0.895290 | 0.900152 | 0.895304 | 0.894636 |
| 6 | 0.185834 | 0.636706 | 0.903925 | 0.905492 | 0.903926 | 0.904009 |
| 7 | 0.134153 | 0.654471 | 0.911460 | 0.912466 | 0.911461 | 0.911354 |
| 8 | 0.084147 | 0.662096 | 0.911931 | 0.913211 | 0.911935 | 0.912069 |
| 9 | 0.052045 | 0.714938 | 0.913187 | 0.914789 | 0.913191 | 0.913163 |
| 10 | 0.030871 | 0.719515 | 0.915071 | 0.916273 | 0.915074 | 0.915030 |
Training Summary
| Item | Value |
|---|---|
| Total training epochs | 10 |
| Highest validation accuracy | 0.915071 |
| Epoch with highest validation accuracy | 10 |
| Highest validation Macro F1 | 0.915030 |
| Epoch with highest validation Macro F1 | 10 |
| Lowest validation loss | 0.317058 |
| Epoch with lowest validation loss | 2 |
| Validation accuracy at final epoch | 0.915071 |
The highest validation accuracy and Macro F1-score were observed at epoch 10, while the lowest validation loss was observed at epoch 2. The epoch-level log reports validation metrics recorded during training, whereas the final evaluation is computed on a separate holdout test set, so the two should not be interpreted as the same measurement.
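The difference between the two selection criteria can be reproduced directly from the log. This sketch copies the validation loss, accuracy, and Macro F1 columns from the epoch-level table; `EpochLog` is a hypothetical container, and the card does not state which criterion was used to pick the released checkpoint.

```python
from typing import NamedTuple


class EpochLog(NamedTuple):
    epoch: int
    val_loss: float
    accuracy: float
    macro_f1: float


# Values copied from the epoch-level training log.
LOG = [
    EpochLog(1, 0.371673, 0.873940, 0.873703),
    EpochLog(2, 0.317058, 0.891209, 0.891737),
    EpochLog(3, 0.429413, 0.889953, 0.890115),
    EpochLog(4, 0.425564, 0.907064, 0.907074),
    EpochLog(5, 0.609765, 0.895290, 0.894636),
    EpochLog(6, 0.636706, 0.903925, 0.904009),
    EpochLog(7, 0.654471, 0.911460, 0.911354),
    EpochLog(8, 0.662096, 0.911931, 0.912069),
    EpochLog(9, 0.714938, 0.913187, 0.913163),
    EpochLog(10, 0.719515, 0.915071, 0.915030),
]

best_by_loss = min(LOG, key=lambda row: row.val_loss)
best_by_f1 = max(LOG, key=lambda row: row.macro_f1)

print("Lowest validation loss: epoch", best_by_loss.epoch)   # epoch 2
print("Highest validation Macro F1: epoch", best_by_f1.epoch)  # epoch 10
```

Validation loss rising after epoch 2 while accuracy keeps improving is commonly read as the model becoming more confident on the examples it gets wrong, so confidence scores from a late checkpoint may deserve extra calibration checks.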
Final Holdout Test Evaluation
The following metrics were calculated on a separate holdout test set containing 6,370 texts.
| Metric | Score |
|---|---|
| Accuracy | 0.9132 |
| Macro Precision | 0.9145 |
| Macro Recall | 0.9132 |
| Macro F1-score | 0.9131 |
| Weighted F1-score | 0.9131 |
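For reference, the averaged scores are consistent with the usual definitions, sketched here on the assumption that the card follows the common scikit-learn convention (the card does not say which library computed them). With $K = 6$ labels, per-label precision $P_k$, recall $R_k$, support $n_k$, and $N = 6{,}370$ test texts:

$$
F1_k = \frac{2 P_k R_k}{P_k + R_k}, \qquad
\text{Macro F1} = \frac{1}{K} \sum_{k=1}^{K} F1_k, \qquad
\text{Weighted F1} = \sum_{k=1}^{K} \frac{n_k}{N} \, F1_k
$$

Because every label has 1,061 or 1,062 test texts, $n_k / N \approx 1/K$, which is why the macro and weighted F1-scores agree to four decimals. The same reasoning explains why accuracy and Macro Recall coincide: accuracy is the support-weighted mean of per-label recall.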
Per-Label Test Results
| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| PE | 0.90 | 0.94 | 0.92 | 1,062 |
| EE | 0.91 | 0.92 | 0.92 | 1,061 |
| SI | 0.88 | 0.92 | 0.90 | 1,062 |
| FC | 0.89 | 0.94 | 0.92 | 1,062 |
| GR | 0.96 | 0.92 | 0.94 | 1,062 |
| SP | 0.94 | 0.84 | 0.89 | 1,061 |
Confusion Matrix
Rows represent true labels and columns represent predicted labels.
| True label \ Predicted label | PE | EE | SI | FC | GR | SP |
|---|---|---|---|---|---|---|
| PE | 995 | 25 | 31 | 3 | 5 | 3 |
| EE | 32 | 977 | 18 | 21 | 7 | 6 |
| SI | 32 | 7 | 978 | 39 | 0 | 6 |
| FC | 4 | 13 | 33 | 998 | 2 | 12 |
| GR | 25 | 22 | 3 | 6 | 975 | 31 |
| SP | 18 | 26 | 47 | 49 | 27 | 894 |
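The per-label and overall test metrics can be recomputed from this confusion matrix. The sketch below uses plain Python with the counts copied from the table; `per_label_metrics` is a hypothetical helper, and the macro F1 is taken as the mean of per-label F1-scores.

```python
LABELS = ["PE", "EE", "SI", "FC", "GR", "SP"]

# Rows are true labels, columns are predicted labels (copied from the table).
CM = [
    [995, 25, 31, 3, 5, 3],
    [32, 977, 18, 21, 7, 6],
    [32, 7, 978, 39, 0, 6],
    [4, 13, 33, 998, 2, 12],
    [25, 22, 3, 6, 975, 31],
    [18, 26, 47, 49, 27, 894],
]


def per_label_metrics(cm):
    """Return (precision, recall, f1, support) for each label index."""
    size = len(cm)
    results = []
    for k in range(size):
        true_positive = cm[k][k]
        support = sum(cm[k])                          # row sum: texts with true label k
        predicted = sum(cm[i][k] for i in range(size))  # column sum: texts predicted as k
        precision = true_positive / predicted
        recall = true_positive / support
        f1 = 2 * precision * recall / (precision + recall)
        results.append((precision, recall, f1, support))
    return results


metrics = per_label_metrics(CM)
total = sum(sum(row) for row in CM)
accuracy = sum(CM[k][k] for k in range(len(CM))) / total
macro_f1 = sum(m[2] for m in metrics) / len(metrics)

for label, (p, r, f1, support) in zip(LABELS, metrics):
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}")
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f}")  # accuracy=0.9132 macro_f1=0.9131

# Largest off-diagonal cells as (count, true label, predicted label).
confusions = sorted(
    ((CM[i][j], LABELS[i], LABELS[j])
     for i in range(len(CM)) for j in range(len(CM)) if i != j),
    reverse=True,
)
print(confusions[:3])  # [(49, 'SP', 'FC'), (47, 'SP', 'SI'), (39, 'SI', 'FC')]
```

The two largest error cells are spam texts predicted as FC and SI, which matches the comparatively low SP recall of 0.84.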
How to Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "teguholix/BERTAUT"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_ID,
    revision="main"
)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    revision="main"
)

# Run on GPU when available and switch to evaluation mode
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

texts = [
    "This application helps me complete my tasks faster.",
    "The menu is difficult to understand.",
    "I use this application because my friend recommended it."
]

encoded = tokenizer(
    texts,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=128
)
encoded = {
    key: value.to(device)
    for key, value in encoded.items()
}

# Forward pass without gradient tracking
with torch.inference_mode():
    logits = model(**encoded).logits
    probabilities = torch.softmax(logits, dim=-1)

predicted_ids = probabilities.argmax(dim=1).tolist()
predicted_scores = probabilities.max(dim=1).values.tolist()

for text, label_id, score in zip(
    texts,
    predicted_ids,
    predicted_scores
):
    label = model.config.id2label[label_id]
    print("Text :", text)
    print("Label:", label)
    print("Score:", round(score, 4))
    print("-" * 50)
```
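The printed score is the largest softmax probability over the six labels. As a small worked illustration in plain Python, the sketch below applies softmax to one row of logits; the logit values and the label order are hypothetical, since the real order is defined by `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1 (numerically stable form)."""
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical label order and logits for a single text (not real model output).
labels = ["PE", "EE", "SI", "FC", "GR", "SP"]
logits = [2.0, 0.5, 0.1, -1.0, -0.5, -2.0]

probs = softmax(logits)
best_index = max(range(len(probs)), key=lambda i: probs[i])
best_label = labels[best_index]

print("Label:", best_label)
print("Score:", round(probs[best_index], 4))
```

A score close to 1/6 means the model sees the six labels as almost equally likely, so such predictions are good candidates for manual review.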
Google Colab Use Case
An end-to-end Google Colab notebook is available for automated BERTAUT labeling.
The notebook supports:
- CSV and Excel upload.
- Manual selection of the text column.
- Batch prediction with progress bar.
- Confidence score extraction.
- Label distribution analysis.
- Confidence visualization.
- Wordcloud visualization by label.
- ZIP download and Google Drive export.
Open the BERTAUT Google Colab use case
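The batch prediction and label distribution steps listed for the notebook can be sketched in plain Python. This is not the notebook's code: `batched`, `label_all`, and `label_distribution` are hypothetical helpers, and `fake_classify_batch` is a stand-in that should be replaced by a function wrapping the tokenizer and model call from How to Use.

```python
from collections import Counter
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Result = Tuple[str, float]  # (label, confidence score)


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def label_all(
    texts: Sequence[str],
    classify_batch: Callable[[Sequence[str]], List[Result]],
    batch_size: int = 32,
) -> List[Result]:
    """Classify texts batch by batch and return results in input order."""
    results: List[Result] = []
    for batch in batched(texts, batch_size):
        results.extend(classify_batch(batch))
    return results


def label_distribution(results: Sequence[Result]) -> Dict[str, float]:
    """Return the share of each predicted label."""
    if not results:
        return {}
    counts = Counter(label for label, _ in results)
    return {label: count / len(results) for label, count in counts.items()}


def fake_classify_batch(batch: Sequence[str]) -> List[Result]:
    """Stand-in classifier for illustration only: very short texts become GR."""
    return [("GR" if len(text.split()) <= 2 else "PE", 0.5) for text in batch]


texts = [
    "Good.",
    "Nice app",
    "This application helps me complete my tasks faster.",
    "Bagus.",
    "It saves me a lot of time every day.",
]
results = label_all(texts, fake_classify_batch, batch_size=2)
print(label_distribution(results))
```

Passing the classifier as a callable keeps the batching logic independent of the model, so the same loop works on CPU, on GPU, or with a progress bar wrapped around `batched`.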
Limitations
This is a seed model. The training dataset is still limited and may be improved with additional independently labeled data in future versions. Model performance can vary across languages and domains that differ from the training data.
The Behavioral Intention (BI) label is not included in this v1 release because the seed dataset does not yet contain sufficient BI examples.
Future development may include:
- BI: Behavioral Intention.
- AU: Actual Use.
- Multi-label classification.
- Sentiment-aware UTAUT classification.
- Larger balanced datasets.
- More multilingual training data.
Attribution
BERTAUT v1 was created by Teguh Arie Sandy.
Indonesian Version
BERTAUT v1 is a BERT-based text classification model for detecting early UTAUT categories and filtering certain types of text. The model classifies user reviews, public comments, open-ended questionnaire responses, and other unstructured text related to technology use.
Created by Teguh Arie Sandy.
Model Information
| Item | Value |
|---|---|
| Model name | BERTAUT v1 |
| Base model | google-bert/bert-base-multilingual-cased |
| Hugging Face repository | teguholix/BERTAUT |
| Revision | main |
| Task | Text classification |
| Number of labels | 6 |
Labels
| Label | Meaning |
|---|---|
| PE | Performance Expectancy |
| EE | Effort Expectancy |
| SI | Social Influence |
| FC | Facilitating Conditions |
| GR | Generic Review |
| SP | Spam |
Label Definitions
PE: Performance Expectancy
Text indicating benefits, usefulness, productivity, effectiveness, or performance improvement after using a technology.
Example: "Aplikasi ini membantu saya menyelesaikan tugas lebih cepat." (This application helps me complete my tasks faster.)
EE: Effort Expectancy
Text indicating ease, difficulty, simplicity, complexity, or user effort when using a technology.
Example: "Aplikasi ini mudah digunakan dan menunya jelas." (This application is easy to use and its menu is clear.)
SI: Social Influence
Text indicating influence from friends, family, lecturers, colleagues, communities, reviews, ratings, public figures, or other users.
Example: "Saya menggunakan aplikasi ini karena direkomendasikan teman." (I use this application because a friend recommended it.)
FC: Facilitating Conditions
Text indicating supporting conditions, such as internet access, devices, system support, infrastructure, login access, server quality, or technical assistance.
Example: "Aplikasi ini berjalan lancar karena jaringan internet saya stabil." (This application runs smoothly because my internet connection is stable.)
GR: Generic Review
Short, general review text that does not clearly point to PE, EE, SI, or FC.
Example: "Bagus." (Good.)
SP: Spam
Spam, promotions, advertisements, irrelevant links, online gambling, or commercial messages unrelated to technology reviews.
Example: "Daftar sekarang dan dapatkan bonus besar hari ini." (Register now and get a big bonus today.)
Intended Use
This model can be used for:
- Early classification of technology acceptance text in Indonesian and other languages supported by mBERT, while still validating on the target domain.
- Analysis of user reviews and public comments.
- Categorization of open-ended questionnaire responses.
- Automated text labeling before manual review.
- Identification of generic reviews and spam text.
Training Data
This initial version was trained on a balanced BERTAUT dataset with six categories:
- PE
- EE
- SI
- FC
- GR
- SP
Training Procedure
Epoch-Level Training Log
| Epoch | Training Loss | Validation Loss | Accuracy | Macro Precision | Macro Recall | Macro F1 |
|---|---|---|---|---|---|---|
| 1 | 0.965550 | 0.371673 | 0.873940 | 0.880794 | 0.873952 | 0.873703 |
| 2 | 0.559251 | 0.317058 | 0.891209 | 0.895631 | 0.891218 | 0.891737 |
| 3 | 0.413123 | 0.429413 | 0.889953 | 0.894924 | 0.889946 | 0.890115 |
| 4 | 0.334815 | 0.425564 | 0.907064 | 0.908595 | 0.907070 | 0.907074 |
| 5 | 0.260256 | 0.609765 | 0.895290 | 0.900152 | 0.895304 | 0.894636 |
| 6 | 0.185834 | 0.636706 | 0.903925 | 0.905492 | 0.903926 | 0.904009 |
| 7 | 0.134153 | 0.654471 | 0.911460 | 0.912466 | 0.911461 | 0.911354 |
| 8 | 0.084147 | 0.662096 | 0.911931 | 0.913211 | 0.911935 | 0.912069 |
| 9 | 0.052045 | 0.714938 | 0.913187 | 0.914789 | 0.913191 | 0.913163 |
| 10 | 0.030871 | 0.719515 | 0.915071 | 0.916273 | 0.915074 | 0.915030 |
Training Summary
| Item | Value |
|---|---|
| Total training epochs | 10 |
| Highest validation accuracy | 0.915071 |
| Epoch with highest validation accuracy | 10 |
| Highest validation Macro F1 | 0.915030 |
| Epoch with highest validation Macro F1 | 10 |
| Lowest validation loss | 0.317058 |
| Epoch with lowest validation loss | 2 |
| Validation accuracy at final epoch | 0.915071 |
The highest validation accuracy and Macro F1 were obtained at epoch 10. The lowest validation loss was obtained at epoch 2. The training log and the final evaluation metrics come from different stages and must not be treated as the same measurement.
Final Evaluation on the Holdout Test Set
The following metrics were computed on a separate holdout test set of 6,370 texts.
| Metric | Score |
|---|---|
| Accuracy | 0.9132 |
| Macro Precision | 0.9145 |
| Macro Recall | 0.9132 |
| Macro F1-score | 0.9131 |
| Weighted F1-score | 0.9131 |
Per-Label Results on the Test Set
| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| PE | 0.90 | 0.94 | 0.92 | 1,062 |
| EE | 0.91 | 0.92 | 0.92 | 1,061 |
| SI | 0.88 | 0.92 | 0.90 | 1,062 |
| FC | 0.89 | 0.94 | 0.92 | 1,062 |
| GR | 0.96 | 0.92 | 0.94 | 1,062 |
| SP | 0.94 | 0.84 | 0.89 | 1,061 |
Confusion Matrix
Rows show true labels and columns show predicted labels.
| True label \ Predicted label | PE | EE | SI | FC | GR | SP |
|---|---|---|---|---|---|---|
| PE | 995 | 25 | 31 | 3 | 5 | 3 |
| EE | 32 | 977 | 18 | 21 | 7 | 6 |
| SI | 32 | 7 | 978 | 39 | 0 | 6 |
| FC | 4 | 13 | 33 | 998 | 2 | 12 |
| GR | 25 | 22 | 3 | 6 | 975 | 31 |
| SP | 18 | 26 | 47 | 49 | 27 | 894 |
How to Use the Model
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "teguholix/BERTAUT"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_ID,
    revision="main"
)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    revision="main"
)

# Run on GPU when available and switch to evaluation mode
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

# Indonesian input texts
texts = [
    "Aplikasi ini membantu saya menyelesaikan tugas lebih cepat.",
    "Menu aplikasi ini sulit dipahami.",
    "Saya memakai aplikasi ini karena direkomendasikan teman."
]

encoded = tokenizer(
    texts,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=128
)
encoded = {
    key: value.to(device)
    for key, value in encoded.items()
}

# Forward pass without gradient tracking
with torch.inference_mode():
    logits = model(**encoded).logits
    probabilities = torch.softmax(logits, dim=-1)

predicted_ids = probabilities.argmax(dim=1).tolist()
predicted_scores = probabilities.max(dim=1).values.tolist()

for text, label_id, score in zip(
    texts,
    predicted_ids,
    predicted_scores
):
    label = model.config.id2label[label_id]
    print("Text :", text)
    print("Label:", label)
    print("Score:", round(score, 4))
    print("-" * 50)
```
Google Colab Use Case
A ready-to-use Google Colab notebook is available for automated labeling with BERTAUT.
The notebook supports:
- CSV and Excel upload.
- Manual selection of the text column.
- Batch prediction with a progress bar.
- Confidence score extraction.
- Label distribution analysis.
- Confidence visualization.
- Wordcloud visualization per label.
- ZIP download and export to Google Drive.
Open the BERTAUT use case in Google Colab
Limitations
This model is a seed model. The training dataset is still limited and can be improved by adding independently labeled data in later versions. Model performance may differ on languages and domains that are not similar to the training data.
The Behavioral Intention (BI) label is not yet included in the v1 release because the seed dataset does not yet contain enough BI examples.
Future development may include:
- BI: Behavioral Intention.
- AU: Actual Use.
- Multi-label classification.
- Sentiment-based UTAUT classification.
- Balanced datasets with more data.
- Broader multilingual training data.
Attribution
BERTAUT v1 was created by Teguh Arie Sandy.