saiteja33 committed
Commit 5e2358d · verified · 1 Parent(s): 14bdc62

Update README.md

Files changed (1): README.md (+19 −19)
This repository contains a **token-classification model** trained on the **DAMASHA-MAS** benchmark, introduced in:

> **DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution**

The model aims to **segment mixed human–AI text** at the *token level* – i.e., to decide for each token whether it was written by a *human* or an *LLM*, even under **syntactic adversarial attacks**.

- **Base encoders:**
  - [`FacebookAI/roberta-base`](https://huggingface.co/FacebookAI/roberta-base)
  - [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base)
- **Architecture (high level):** RoBERTa + ModernBERT feature fusion → BiGRU + CRF with the **Info-Mask** gating mechanism from the paper.
- **Task:** Token classification (binary authorship: human vs AI).
- **Language:** English
- **License (this model):** MIT
- **Training data license:** CC-BY-4.0 via the DAMASHA dataset.

If you use this model, **please also cite the DAMASHA paper and dataset** (see the Citation section).
## 1. Model Highlights

- **Fine-grained mixed-authorship detection**
  Predicts authorship **per token**, allowing reconstruction of human vs AI **spans** in long documents.

- **Adversarially robust**
  Trained and evaluated on **syntactically attacked texts** (misspelling, Unicode substitutions, invisible characters, punctuation swaps, case perturbations, and “all-mixed” attacks).

- **Human-interpretable Info-Mask**
  The architecture incorporates **stylometric features** (perplexity, POS density, punctuation density, lexical diversity, readability) via an **Info-Mask** module that gates token representations in an interpretable way.

- **Strong reported performance (from the paper)**
  On DAMASHA-MAS, the **RMC\*** model (RoBERTa + ModernBERT + CRF + Info-Mask) achieves:
  - **Token-level:** Accuracy / Precision / Recall / F1 ≈ **0.98**
  - **Span-level (strict):** SBDA ≈ **0.45**, SegPre ≈ **0.41**
  - **Span-level (relaxed, IoU ≥ 0.5):** ≈ **0.82**
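The relaxed span-level criterion can be illustrated with a small self-contained sketch (an illustration, not the paper's evaluation code): a gold span counts as recovered if some predicted span overlaps it with IoU ≥ 0.5.

```python
def span_iou(a, b):
    """IoU of two half-open token spans given as (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def relaxed_span_match(pred_spans, gold_spans, iou_threshold=0.5):
    """Fraction of gold spans matched by some prediction with IoU >= threshold."""
    matched = sum(
        any(span_iou(g, p) >= iou_threshold for p in pred_spans)
        for g in gold_spans
    )
    return matched / len(gold_spans) if gold_spans else 0.0
```

With a gold span `(0, 8)` and a prediction `(2, 10)`, the IoU is 6/10 = 0.6, so the span counts as matched under the relaxed criterion even though it would fail a strict exact-boundary check.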
- Automated “cheating detector” / plagiarism court.
- High-stakes decisions affecting people’s livelihood, grades, or reputation **without human review**.
- Non-English or heavily code-mixed text (training data is English-centric).

Use this model as a **signal**, not a judge.
The model is trained on the **MAS** benchmark released with the DAMASHA paper and hosted as the Hugging Face dataset:

- **Dataset:** [`saiteja33/DAMASHA`](https://huggingface.co/datasets/saiteja33/DAMASHA)

### 3.1 What’s in MAS?

MAS consists of **mixed human–AI texts with explicit span tags**:

- Human text comes from several corpora for **domain diversity**, including:
  - Reddit (M4-Reddit)

- AI text is generated by multiple modern LLMs:
  - **DeepSeek-V3-671B** (open-source)
  - **GPT-4o, GPT-4.1, GPT-4.1-mini** (closed-source)

### 3.2 Span tagging

Authorship is marked using **explicit tags** around AI spans:

- `<AI_Start>` … `</AI_End>` denote AI-generated segments within otherwise human text.
- The dataset stores text in a `hybrid_text` column, plus metadata such as `has_pair`, and adversarial variants include `attack_name`, `tag_count`, and `attacked_text`.
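As a minimal sketch (not the official preprocessing), the tags can be projected onto whitespace tokens to produce binary labels; it assumes well-formed, non-nested tags, whereas the real pipeline labels subword tokens.

```python
import re

TAG_RE = re.compile(r"<AI_Start>(.*?)</AI_End>", re.DOTALL)

def tokens_with_labels(hybrid_text):
    """Split a tagged text into (token, label) pairs: 1 = AI, 0 = human.

    Assumes non-nested, well-formed <AI_Start>...</AI_End> tags and
    whitespace tokenization (the real pipeline uses subword tokenizers).
    """
    out, pos = [], 0
    for m in TAG_RE.finditer(hybrid_text):
        out += [(tok, 0) for tok in hybrid_text[pos:m.start()].split()]
        out += [(tok, 1) for tok in m.group(1).split()]
        pos = m.end()
    out += [(tok, 0) for tok in hybrid_text[pos:].split()]
    return out
```

For example, `tokens_with_labels("I wrote this. <AI_Start>The model wrote this.</AI_End>")` labels the first three tokens 0 (human) and the last four tokens 1 (AI).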
### 3.3 Adversarial attacks

MAS includes multiple **syntactic attacks** applied to the mixed text:

- Misspelling
- Unicode character substitution
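To make this attack family concrete, here is a toy Unicode-substitution sketch; the homoglyph table is illustrative only and is not the mapping used in the DAMASHA attack suite (the actual attack code lives in the project GitHub).

```python
# Toy Unicode-substitution attack: swap some Latin letters for visually
# similar Cyrillic homoglyphs. Illustrative mapping, not the DAMASHA table.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440"}

def unicode_attack(text):
    """Return `text` with every mapped character replaced by its homoglyph;
    the result renders almost identically but breaks naive string matching."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)
```

`unicode_attack("apple")` looks like "apple" on screen but no longer equals it byte-for-byte, which is why such perturbations make tokenization brittle for detectors.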
### 4.1 Architecture (conceptual)

The model follows the **Info-Mask RMC\*** architecture described in the DAMASHA paper:

1. **Dual encoders**
   - RoBERTa-base and ModernBERT-base encode the same input sequence.
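The fusion stage can be sketched at shape level in PyTorch; the hidden sizes are assumptions, and the CRF decoder and Info-Mask gating from the paper are omitted, so this is an illustration rather than the released implementation.

```python
import torch
import torch.nn as nn

class FusionBiGRU(nn.Module):
    """Shape-level sketch: concatenate the two encoders' token features,
    then run a BiGRU to get per-token emission scores. CRF decoding and
    Info-Mask gating are omitted; sizes are illustrative assumptions."""

    def __init__(self, enc_dim=768, hidden=256, num_labels=2):
        super().__init__()
        self.bigru = nn.GRU(2 * enc_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.emit = nn.Linear(2 * hidden, num_labels)

    def forward(self, roberta_feats, modernbert_feats):
        # (batch, seq, enc_dim) + (batch, seq, enc_dim) -> (batch, seq, 2*enc_dim)
        fused = torch.cat([roberta_feats, modernbert_feats], dim=-1)
        out, _ = self.bigru(fused)   # (batch, seq, 2*hidden)
        return self.emit(out)        # (batch, seq, num_labels)
```

Feeding two dummy feature tensors of shape `(batch, seq_len, 768)` yields per-token emission scores of shape `(batch, seq_len, 2)`, which a CRF layer would then decode into contiguous human/AI spans.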
### 4.2 Training setup (from the paper)

Key hyperparameters used for the Info-Mask models on MAS:

- **Number of labels:** 2
- **Max sequence length:** 512
- **Warmup ratio:** 0.1
- **Early stopping patience:** 2

**Hardware & compute** (as reported):

- AWS EC2 g6e.xlarge, NVIDIA L40S (48GB) GPU, Ubuntu 24.04
- ≈ 400 GPU hours for experiments.

> The exact training script used for this checkpoint is available in the project GitHub:
> <https://github.com/saitejalekkala33/DAMASHA>
 