putraharifin/tubes_deep_learning


putraharifin committed 27 days ago

Commit bf85809 · verified · 1 Parent(s): 91013b8

Add Model Card

Files changed (1)

README.md ADDED (+43 -0)

	@@ -0,0 +1,43 @@

+---
+language: id
+license: apache-2.0
+tags:
+  - indobert
+  - text-classification
+  - plagiarism-detection
+  - indonesian
+  - fine-tuned
+pipeline_tag: text-classification
+---
+# IndoBERT Plagiarism Detector
+An **IndoBERT-base-p1** model fine-tuned for **Indonesian text similarity detection** (3 classes):
+- `LABEL_0` → 🟢 Not Similar (Non-Duplicate)
+- `LABEL_1` → 🟡 Paraphrase (similar in meaning)
+- `LABEL_2` → 🔴 Plagiarism (very similar / literal copy-paste)
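The class list above can be turned into a small lookup for readable output. This is a minimal sketch, not part of the model card: `LABEL_NAMES` and `top_label` are hypothetical names, and the input is assumed to be a list of `{"label": ..., "score": ...}` dicts for one example, the shape a text-classification pipeline returns when all scores are requested. The scores in the usage note are made up for illustration and are not model output.

```python
# Hypothetical mapping from the raw label ids in the card to readable names.
LABEL_NAMES = {
    "LABEL_0": "Not similar (non-duplicate)",
    "LABEL_1": "Paraphrase",
    "LABEL_2": "Plagiarism",
}


def top_label(scores):
    """Return (readable name, score) for the highest-scoring entry.

    `scores` is assumed to be a list of {"label": str, "score": float} dicts
    covering the classes for a single input.
    """
    best = max(scores, key=lambda entry: entry["score"])
    return LABEL_NAMES[best["label"]], best["score"]
```

For example, `top_label([{"label": "LABEL_0", "score": 0.1}, {"label": "LABEL_2", "score": 0.9}])` picks the `LABEL_2` entry and reports it as "Plagiarism".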
+### Dataset
+- Total: 3,000 balanced examples (1,000 per class)
+- Source: Quora Duplicate Questions Indonesia + synthetic augmentation for the plagiarism class (Jaccard ≥ 0.95)
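The Jaccard ≥ 0.95 criterion for the synthetic plagiarism class can be sketched as below. The card does not say how the texts were tokenized, so this sketch assumes lowercase whitespace tokens; `jaccard_similarity` and `is_near_copy` are hypothetical helper names, not the author's code.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over lowercase whitespace tokens.

    The tokenization is an assumption; the model card does not specify it.
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_near_copy(a: str, b: str, threshold: float = 0.95) -> bool:
    """True if the pair meets the Jaccard threshold named in the card."""
    return jaccard_similarity(a, b) >= threshold
```

With word-level sets, short sentences only reach 0.95 when they are nearly token-for-token identical, which fits the "literal copy-paste" description of the class.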
+### Performance (Test Set)
+- **Accuracy**: 78.33%
+- **F1-Weighted**: 78.33%
+- Method: full fine-tuning (3 epochs)
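For readers unfamiliar with the two metrics, here is a minimal pure-Python sketch of accuracy and support-weighted F1. It is not the author's evaluation script, the function names are hypothetical, and the labels in the usage note are toy data rather than the model's test set.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        # F1 = 2TP / (2TP + FP + FN); the denominator is at least n >= 1 here.
        f1 = 2 * tp / (2 * tp + fp + fn)
        total += f1 * n
    return total / len(y_true)
```

On the toy labels `y_true = [0, 0, 1, 1, 2, 2]` and `y_pred = [0, 1, 1, 1, 2, 0]`, accuracy is 4/6 and the per-class F1 scores are 0.5, 0.8 and 2/3, each weighted by a support of 2.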
+### Usage
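+```python
+from transformers import pipeline
+
+# Load the fine-tuned classifier. top_k=None returns the scores for all labels
+# and replaces the deprecated return_all_scores=True.
+detector = pipeline(
+    "text-classification",
+    model="putraharifin/tubes_deep_learning",
+    top_k=None,
+)
+
+# Pass the two texts as a sentence pair. A second positional string is not
+# treated as the paired text by the pipeline.
+result = detector({"text": "Apa pengganti y?", "text_pair": "Apa pengganti y dong"})
+print(result)
+# Example output: Plagiarism with high confidence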