putraharifin committed · Commit bf85809 (verified) · 1 Parent(s): 91013b8

Add Model Card

Files changed (1)
  1. README.md +43 -0
README.md ADDED
@@ -0,0 +1,43 @@
+ ---
+ language: id
+ license: apache-2.0
+ tags:
+ - indobert
+ - text-classification
+ - plagiarism-detection
+ - indonesian
+ - fine-tuned
+ pipeline_tag: text-classification
+ ---
+
+ # IndoBERT Plagiarism Detector
+
+ An **IndoBERT-base-p1** model fine-tuned for **Indonesian text-similarity detection** (3 classes):
+
+ - `LABEL_0` → 🟢 Not similar (non-duplicate)
+ - `LABEL_1` → 🟡 Paraphrase (similar in meaning)
+ - `LABEL_2` → 🔴 Plagiarism (very similar / literal copy-paste)
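
The generic `LABEL_n` ids can be mapped to readable names after inference. A minimal sketch, assuming the pipeline returns a list of `{"label", "score"}` dicts per input; `LABEL_NAMES` and `best_label` are illustrative names that are not part of the model repository, and the scores are made-up values:

```python
# Hypothetical mapping from the model's generic label ids to the class names
# listed above (the English names are this sketch's own wording).
LABEL_NAMES = {
    "LABEL_0": "Not similar",
    "LABEL_1": "Paraphrase",
    "LABEL_2": "Plagiarism",
}


def best_label(scores):
    """Return (readable name, score) for the highest-scoring class.

    `scores` is assumed to be a list of {"label": str, "score": float} dicts.
    """
    top = max(scores, key=lambda item: item["score"])
    return LABEL_NAMES[top["label"]], top["score"]


# Made-up scores, for illustration only (not real model output).
example_scores = [
    {"label": "LABEL_0", "score": 0.05},
    {"label": "LABEL_1", "score": 0.15},
    {"label": "LABEL_2", "score": 0.80},
]
print(best_label(example_scores))
```
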
+
+ ### Dataset
+ - Total: 3,000 balanced samples (1,000 per class)
+ - Source: Quora Duplicate Questions Indonesia, plus synthetic augmentation for the plagiarism class (Jaccard ≥ 0.95)
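
The card does not say how the Jaccard score is computed. One common definition is token-set overlap, |A ∩ B| / |A ∪ B|. The sketch below assumes lowercased whitespace tokens, which may differ from the tokenization actually used to build the dataset; `jaccard` is an illustrative helper name.

```python
def jaccard(text_a: str, text_b: str) -> float:
    """Jaccard similarity over lowercased whitespace tokens (an assumption)."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    if not union:  # both texts empty: treat as identical
        return 1.0
    return len(tokens_a & tokens_b) / len(union)


# Under this definition, a synthetic pair would qualify for the plagiarism
# class when its score reaches the 0.95 threshold quoted above.
print(jaccard("apa kabar hari ini", "Apa kabar hari ini"))  # identical token sets
print(jaccard("apa kabar hari ini", "apa kabar kemarin"))   # partial overlap
```
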
+
+ ### Performance (Test Set)
+ - **Accuracy**: 78.33%
+ - **F1-Weighted**: 78.33%
+ - Method: full fine-tuning (3 epochs)
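
For reference, accuracy and weighted F1 can be computed from per-class counts as sketched below. The labels are toy values invented for illustration, not the model's test-set predictions, and `accuracy` and `weighted_f1` are hypothetical helpers (scikit-learn's `f1_score(..., average="weighted")` computes the same quantity).

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        n_pred = sum(p == cls for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += f1 * n_true
    return total / len(y_true)


# Toy labels (0 = not similar, 1 = paraphrase, 2 = plagiarism), invented here.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2]
print(f"accuracy={accuracy(y_true, y_pred):.4f}")
print(f"weighted_f1={weighted_f1(y_true, y_pred):.4f}")
```
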
+
+ ### Usage
+
+ ```python
+ from transformers import pipeline
+
+ detector = pipeline(
+     "text-classification",
+     model="putraharifin/tubes_deep_learning",
+     top_k=None,  # return scores for all classes (replaces the deprecated return_all_scores=True)
+ )
+
+ # A sentence pair is passed as a dict with "text" and "text_pair" keys.
+ result = detector({"text": "Apa pengganti y?", "text_pair": "Apa pengganti y dong"})
+ print(result)
+ # Example output: Plagiarism (LABEL_2) with high confidence
+ ```