Commit be0bc47 by AnhNguyen2299 (verified) · 1 parent: fce3b0d

Update README.md

  1. README.md +36 -0
README.md CHANGED
@@ -5,5 +5,41 @@ language:
  metrics:
  - accuracy
  pipeline_tag: text-classification
+ datasets:
+ - ICCIES-2025-DetectAI/vietnamese_news_human_ai
  ---

+ # Multilingual-E5 with RoBERTa-base for AI-Generated Vietnamese News Detection
+
+ ## Overview
+
+ This repository hosts the implementation of a **hybrid model** that combines **Multilingual-E5 embeddings** with a **RoBERTa-base classification head** to distinguish between **human-authored** and **AI-generated** Vietnamese news articles.
+
+ Developed as part of the research published in *Computational Intelligence in Engineering Science (Springer CCIS, vol. 2587)*, the model achieves a classification accuracy of **over 99%**, offering a reliable tool for combating misinformation and enhancing journalistic integrity in the Vietnamese context.
+
+ By leveraging the **semantic richness** of Multilingual-E5 and the **optimized pre-training** of RoBERTa-base, the model effectively captures subtle linguistic and stylistic differences. Training was performed on a **balanced dataset of 200,000 articles**:
+
+ - 100,000 human-written texts sourced from reputable outlets (*Thanh Niên*, *VnExpress*)
+ - 100,000 AI-generated texts produced by advanced large language models (LLMs) such as **GPT-4o Mini, Gemini Flash 1.5, Llama 3.3, and DeepSeek**
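Editorial note: the card reports accuracy as its metric and describes a balanced two-class corpus, so a small illustration of what that figure measures may be useful. The sketch below is not taken from the paper or from this repository. It computes accuracy and the "always predict one class" baseline on a toy set of hand-made labels. The label encoding (0 = human-written, 1 = AI-generated) and the function names `accuracy` and `majority_baseline` are assumptions made purely for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(y_true)
    return max(counts.values()) / len(y_true)


# Toy, hand-made labels (hypothetical encoding: 0 = human, 1 = AI-generated).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1]

print(accuracy(y_true, y_pred))     # 0.875 (7 of 8 correct)
print(majority_baseline(y_true))    # 0.5 (classes are balanced)
```

When the evaluation split is balanced like the 100,000/100,000 corpus described above, the majority-class baseline is 50%, so a high accuracy cannot be explained by class imbalance alone.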
+
+ ---
+
+ ## Citation
+
+ If you use this model or dataset, please cite the following paper:
+
+ ```bibtex
+ @InProceedings{10.1007/978-3-031-98170-8_11,
+   author    = {Huynh, Minh-Phuc and Nguyen, Hoang-Anh and Le, Anh-Cuong and Truong, Dinh-Tu},
+   title     = {Detecting AI-Generated Vietnamese News Articles with Multilingual-E5 and BERT},
+   booktitle = {Computational Intelligence in Engineering Science},
+   year      = {2026},
+   publisher = {Springer Nature Switzerland},
+   address   = {Cham},
+   pages     = {130--144},
+   isbn      = {978-3-031-98170-8}
+ }
+ ```
+
+ ## Contact
+
+ For questions or clarifications regarding the dataset or evaluation procedure, please contact Lê Anh Cường at leanhcuong@tdtu.edu.vn.