---
license: mit
datasets:
- saiteja33/DAMASHA
language:
- en
base_model:
- FacebookAI/roberta-base
- answerdotai/ModernBERT-base
pipeline_tag: token-classification
---

# DAMASHA-MAS: Mixed-Authorship Adversarial Segmentation (Token Classification)

This repository contains a **token-classification model** trained on the **DAMASHA-MAS** benchmark, introduced in:

> **DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution**  

The model **segments mixed human–AI text** at the *token level*: for each token, it decides whether that token was written by a *human* or an *LLM*, even under **syntactic adversarial attacks**.

- **Base encoders:**  
  - [`FacebookAI/roberta-base`](https://huggingface.co/FacebookAI/roberta-base) 
  - [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base) 
- **Architecture (high level):** RoBERTa + ModernBERT feature fusion → BiGRU + CRF with the **Info-Mask** gating mechanism from the paper.  
- **Task:** Token classification (binary authorship: human vs AI).  
- **Language:** English  
- **License (this model):** MIT  
- **Training data license:** CC-BY-4.0 via the DAMASHA dataset.  

If you use this model, **please also cite the DAMASHA paper and dataset** (see Citation section).

---

## 1. Model Highlights

- **Fine-grained mixed-authorship detection**  
  Predicts authorship **per token**, allowing reconstruction of human vs AI **spans** in long documents.  

- **Adversarially robust**  
  Trained and evaluated on **syntactically attacked texts** (misspelling, Unicode substitutions, invisible characters, punctuation swaps, case perturbations, and “all-mixed” attacks).  

- **Human-interpretable Info-Mask**  
  The architecture incorporates **stylometric features** (perplexity, POS density, punctuation density, lexical diversity, readability) via an **Info-Mask** module that gates token representations in an interpretable way. 

- **Strong reported performance (from the paper)**  
  On DAMASHA-MAS, the **RMC\*** model (RoBERTa + ModernBERT + CRF + Info-Mask) achieves:  
  - **Token-level**: Accuracy / Precision / Recall / F1 ≈ **0.98**  
  - **Span-level (strict)**: SBDA ≈ **0.45**, SegPre ≈ **0.41**  
  - **Span-level (relaxed IoU ≥ 0.5)**: ≈ **0.82**  

> ⚠️ The exact numbers for *this* specific checkpoint may differ depending on training run and configuration. The values above are from the paper’s best configuration (RMC\*).
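The per-token outputs above can be collapsed back into contiguous human/AI spans with a short post-processing pass. The helper below is a hypothetical sketch (not part of the released code), assuming labels `0` = human and `1` = AI:

```python
def labels_to_spans(labels):
    """Collapse per-token authorship labels (0 = human, 1 = AI) into
    contiguous (label, start, end) spans, with end exclusive."""
    spans = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current span when the label changes or the input ends.
        if i == len(labels) or labels[i] != labels[start]:
            spans.append((labels[start], start, i))
            start = i
    return spans
```

For example, `labels_to_spans([0, 0, 1, 1, 1, 0])` yields `[(0, 0, 2), (1, 2, 5), (0, 5, 6)]` — a human span, an AI span, and a trailing human span.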

---

## 2. Intended Use

### What this model is for

- **Research on human–AI co-authorship**  
  - Studying where LLMs “take over” in mixed texts.  
  - Analysing robustness of detectors under adversarial perturbations.

- **Tooling / applications (with human oversight)**  
  - Assisting editors, educators, or moderators to **highlight suspicious spans** rather than making final decisions.  
  - Exploring **interpretability overlays** (e.g., heatmaps over tokens) when combined with Info-Mask outputs.

### What this model is *not* for

- Automated “cheating detector” / plagiarism court.  
- High-stakes decisions affecting people’s livelihood, grades, or reputation **without human review**.  
- Non-English or heavily code-mixed text (training data is English-centric). 

Use this model as a **signal**, not a judge.

---

## 3. Data: DAMASHA-MAS

The model is trained on the **MAS** benchmark released with the DAMASHA paper and hosted as the Hugging Face dataset:

- **Dataset:** [`saiteja33/DAMASHA`](https://huggingface.co/datasets/saiteja33/DAMASHA) 

### 3.1 What’s in MAS?

MAS consists of **mixed human–AI texts with explicit span tags**:  

- Human text comes from several corpora for **domain diversity**, including:  
  - Reddit (M4-Reddit)  
  - Yelp & /r/ChangeMyView (MAGE-YELP, MAGE-CMV)  
  - News summaries (XSUM)  
  - Wikipedia (M4-Wiki, MAGE-SQuAD)  
  - ArXiv abstracts (MAGE-SciGen)  
  - QA texts (MAGE-ELI5)

- AI text is generated by multiple modern LLMs:  
  - **DeepSeek-V3-671B** (open-source)  
  - **GPT-4o, GPT-4.1, GPT-4.1-mini** (closed-source)  

### 3.2 Span tagging

Authorship is marked using **explicit tags** around AI spans:  

- `<AI_Start>` … `<AI_End>` pairs delimit AI-generated segments within otherwise human text.  
- The dataset stores text in a `hybrid_text` column, plus metadata such as `has_pair`, and adversarial variants include `attack_name`, `tag_count`, and `attacked_text`.  
- Tags are sentence-level in annotation, but the model is trained to output **token-level** predictions for finer segmentation.

> During training, these tags are converted into **token labels** (2 labels total; see `config.id2label` in the model files).
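The tag-to-label conversion can be sketched roughly as below. This is a hypothetical illustration (function name and naive whitespace tokenization are mine); the real pipeline labels subword tokens from the encoders' tokenizers instead:

```python
import re

AI_SPAN = re.compile(r"<AI_Start>(.*?)<AI_End>", re.DOTALL)

def tags_to_token_labels(tagged_text):
    """Turn tag-annotated text into (tokens, labels), 0 = human, 1 = AI."""
    tokens, labels = [], []
    cursor = 0
    for m in AI_SPAN.finditer(tagged_text):
        for tok in tagged_text[cursor:m.start()].split():  # human prefix
            tokens.append(tok); labels.append(0)
        for tok in m.group(1).split():                     # tagged AI span
            tokens.append(tok); labels.append(1)
        cursor = m.end()
    for tok in tagged_text[cursor:].split():               # human suffix
        tokens.append(tok); labels.append(0)
    return tokens, labels
```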

### 3.3 Adversarial attacks

MAS includes multiple **syntactic attacks** applied to the mixed text:  

- Misspelling  
- Unicode character substitution  
- Invisible characters  
- Punctuation substitution  
- Upper/lower case swapping  
- All-mixed combinations of the above  

These perturbations make tokenization brittle and test the robustness of detectors in realistic settings.
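Two of these attacks can be sketched in a few lines. The character tables below are illustrative examples I chose (Cyrillic homoglyphs, zero-width spaces); the exact substitution tables used to build MAS may differ:

```python
# Latin -> visually identical Cyrillic codepoints (illustrative subset).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def unicode_substitute(text):
    """Unicode character substitution: swap letters for homoglyphs."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

def insert_invisible(text, every=4):
    """Invisible-character attack: inject a zero-width space (U+200B)
    after every `every` characters."""
    zwsp = "\u200b"
    return zwsp.join(text[i:i + every] for i in range(0, len(text), every))
```

The attacked string renders identically to the original but tokenizes differently, which is exactly what makes these perturbations hard for detectors.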

---

## 4. Model Architecture & Training

### 4.1 Architecture (conceptual)

The model follows the **Info-Mask RMC\*** architecture described in the DAMASHA paper:  

1. **Dual encoders**  
   - RoBERTa-base and ModernBERT-base encode the same input sequence.  
2. **Feature fusion**  
   - Hidden states from both encoders are fused into a shared representation.  
3. **Stylometric Info-Mask**  
   - Hand-crafted style features (perplexity, POS density, punctuation density, lexical diversity, readability) are projected, passed through multi-head attention, and turned into a **scalar mask per token**.  
   - This mask gates the fused encoder states, down-weighting style-irrelevant tokens and emphasizing style-diagnostic ones.  
4. **Sequence model + CRF**  
   - A BiGRU layer captures sequential dependencies, followed by a **CRF** layer for structured token labeling with a sequence-level loss.  
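The gating step (3) can be sketched at the shape level in plain Python. This is a deliberate simplification, assuming a sigmoid gate over precomputed per-token style scores; in the full model those scores come from multi-head attention over the projected stylometric features:

```python
import math

def info_mask_gate(fused_states, style_scores):
    """Gate fused token representations with a scalar per-token mask.

    fused_states : one fused vector per token (list of lists of floats)
    style_scores : one raw style score per token (list of floats),
                   taken as given here for illustration.
    """
    # Squash each score into (0, 1) so it acts as a soft gate.
    masks = [1.0 / (1.0 + math.exp(-s)) for s in style_scores]
    # Scale every dimension of a token's vector by its mask value.
    return [[m * x for x in vec] for m, vec in zip(masks, fused_states)]
```

A token with a near-zero gate is effectively suppressed before the BiGRU + CRF stage, while style-diagnostic tokens pass through almost unchanged.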

### 4.2 Training setup (from the paper)

Key hyperparameters used for the Info-Mask models on MAS:  

- **Number of labels:** 2  
- **Max sequence length:** 512  
- **Batch size:** 64  
- **Epochs:** 5  
- **Optimizer:** AdamW (with cosine annealing LR schedule)  
- **Weight decay:** 0.01  
- **Gradient clipping:** 1.0  
- **Dropout:** Dynamic 0.1–0.3 (initial 0.1)  
- **Warmup ratio:** 0.1  
- **Early stopping patience:** 2  
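The schedule implied by these settings (10% linear warmup, then cosine annealing) can be written as a multiplier on the base learning rate. This is my reconstruction of the standard warmup-plus-cosine form; the released training script may implement it differently:

```python
import math

def lr_multiplier(step, total_steps, warmup_ratio=0.1):
    """Linear warmup to 1.0 over the first `warmup_ratio` of training,
    then cosine annealing down to 0 by the final step."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))
```

With `total_steps=1000`, the multiplier rises linearly to 1.0 at step 100, then decays smoothly toward 0 at step 1000.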

**Hardware & compute** (as reported): 

- AWS EC2 g6e.xlarge, NVIDIA L40S (48GB) GPU, Ubuntu 24.04  
- ≈ 400 GPU hours for experiments.

> The exact training script used for this checkpoint is available in the project GitHub:  
> <https://github.com/saitejalekkala33/DAMASHA>
