saiteja33 committed
Commit 14bdc62 · verified · 1 Parent(s): 65e31bc

Update README.md

Files changed (1): README.md (+152 -1)

README.md CHANGED
@@ -9,7 +9,158 @@ base_model:
  - answerdotai/ModernBERT-base
  pipeline_tag: token-classification
  ---
- The model is trained in the DAMASHA-MAS dataset. The paper link: https://arxiv.org/abs/2512.04838. The dataset link: https://huggingface.co/datasets/saiteja33/DAMASHA. Finally the main github link for running the model in the local with this in download is: https://github.com/saitejalekkala33/DAMASHA.
+
+ # DAMASHA-MAS: Mixed-Authorship Adversarial Segmentation (Token Classification)
+
+ This repository contains a **token-classification model** trained on the **DAMASHA-MAS** benchmark, introduced in:
+
+ > **DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution** ([arXiv:2512.04838](https://arxiv.org/abs/2512.04838))
+
+ The model segments **mixed human–AI text** at the *token level*: it decides, for each token, whether it was written by a *human* or an *LLM*, even under **syntactic adversarial attacks**.
+
+ - **Base encoders:**
+   - [`FacebookAI/roberta-base`](https://huggingface.co/FacebookAI/roberta-base)
+   - [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base)
+ - **Architecture (high level):** RoBERTa + ModernBERT feature fusion → BiGRU + CRF with the **Info-Mask** gating mechanism from the paper.
+ - **Task:** Token classification (binary authorship: human vs. AI).
+ - **Language:** English
+ - **License (this model):** MIT
+ - **Training data license:** CC-BY-4.0 via the DAMASHA dataset.
+
+ If you use this model, **please also cite the DAMASHA paper and dataset.**
+
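+ A minimal inference sketch is shown below. It assumes the checkpoint loads with the standard `AutoModelForTokenClassification` head; the full RMC\* architecture (dual encoders + BiGRU + CRF) may instead require the custom code in the GitHub repository, and `model_id` below is a placeholder for this repo's actual id.
+
+ ```python
+ # Sketch: per-token human-vs-AI prediction with the standard HF head.
+ import torch
+ from transformers import AutoTokenizer, AutoModelForTokenClassification
+
+ model_id = "saiteja33/DAMASHA-MAS"  # placeholder -- substitute this repo's actual id
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForTokenClassification.from_pretrained(model_id)
+
+ text = "I drafted the intro myself. The remainder was polished by an assistant."
+ enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+ with torch.no_grad():
+     logits = model(**enc).logits          # shape: (1, seq_len, num_labels=2)
+ preds = logits.argmax(dim=-1)[0]
+
+ tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
+ for tok, p in zip(tokens, preds):
+     print(tok, model.config.id2label[p.item()])
+ ```
+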
+ ---
+
+ ## 1. Model Highlights
+
+ - **Fine-grained mixed-authorship detection**
+   Predicts authorship **per token**, allowing reconstruction of human vs. AI **spans** in long documents (see the span-reconstruction sketch at the end of this section).
+
+ - **Adversarially robust**
+   Trained and evaluated on **syntactically attacked texts** (misspellings, Unicode substitutions, invisible characters, punctuation swaps, case perturbations, and “all-mixed” attacks).
+
+ - **Human-interpretable Info-Mask**
+   The architecture incorporates **stylometric features** (perplexity, POS density, punctuation density, lexical diversity, readability) via an **Info-Mask** module that gates token representations in an interpretable way.
+
+ - **Strong reported performance (from the paper)**
+   On DAMASHA-MAS, the **RMC\*** model (RoBERTa + ModernBERT + CRF + Info-Mask) achieves:
+   - **Token-level:** accuracy / precision / recall / F1 ≈ **0.98**
+   - **Span-level (strict):** SBDA ≈ **0.45**, SegPre ≈ **0.41**
+   - **Span-level (relaxed, IoU ≥ 0.5):** ≈ **0.82**
+
+ > ⚠️ The exact numbers for *this* specific checkpoint may differ depending on the training run and configuration. The values above are from the paper’s best configuration (RMC\*).
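+
+ For the span reconstruction mentioned above, a simple grouping sketch (not the paper's evaluation code) is:
+
+ ```python
+ # Sketch: merge consecutive identical token labels into (label, start, end) spans.
+ def labels_to_spans(labels):
+     """labels: e.g. ["human", "human", "AI", "AI", "human"]; end is exclusive."""
+     spans, start = [], 0
+     for i in range(1, len(labels) + 1):
+         if i == len(labels) or labels[i] != labels[start]:
+             spans.append((labels[start], start, i))
+             start = i
+     return spans
+
+ print(labels_to_spans(["human", "human", "AI", "AI", "human"]))
+ # [('human', 0, 2), ('AI', 2, 4), ('human', 4, 5)]
+ ```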
+
+ ---
+
+ ## 2. Intended Use
+
+ ### What this model is for
+
+ - **Research on human–AI co-authorship**
+   - Studying where LLMs “take over” in mixed texts.
+   - Analysing the robustness of detectors under adversarial perturbations.
+
+ - **Tooling / applications (with human oversight)**
+   - Assisting editors, educators, or moderators by **highlighting suspicious spans** rather than making final decisions.
+   - Exploring **interpretability overlays** (e.g., heatmaps over tokens) when combined with Info-Mask outputs.
+
+ ### What this model is *not* for
+
+ - An automated “cheating detector” or plagiarism court.
+ - High-stakes decisions affecting people’s livelihoods, grades, or reputation **without human review**.
+ - Non-English or heavily code-mixed text (the training data is English-centric).
+
+ Use this model as a **signal**, not a judge.
+
+ ---
+
+ ## 3. Data: DAMASHA-MAS
+
+ The model is trained on the **MAS** benchmark released with the DAMASHA paper and hosted as the Hugging Face dataset:
+
+ - **Dataset:** [`saiteja33/DAMASHA`](https://huggingface.co/datasets/saiteja33/DAMASHA)
+
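+ A loading sketch (split and configuration names are not documented here, so inspect the dataset card first):
+
+ ```python
+ # Sketch: pull the DAMASHA dataset from the Hub and peek at one record.
+ # Split/config names and exact columns should be verified on the dataset card.
+ from datasets import load_dataset
+
+ ds = load_dataset("saiteja33/DAMASHA")
+ print(ds)                                     # available splits and columns
+ split = next(iter(ds.values()))
+ print(split[0].get("hybrid_text", "")[:200])  # column name from Section 3.2
+ ```
+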
+ ### 3.1 What’s in MAS?
+
+ MAS consists of **mixed human–AI texts with explicit span tags**:
+
+ - Human text comes from several corpora for **domain diversity**, including:
+   - Reddit (M4-Reddit)
+   - Yelp & /r/ChangeMyView (MAGE-YELP, MAGE-CMV)
+   - News summaries (XSUM)
+   - Wikipedia (M4-Wiki, MAGE-SQuAD)
+   - ArXiv abstracts (MAGE-SciGen)
+   - QA texts (MAGE-ELI5)
+
+ - AI text is generated by multiple modern LLMs:
+   - **DeepSeek-V3-671B** (open-source)
+   - **GPT-4o, GPT-4.1, GPT-4.1-mini** (closed-source)
+
+ ### 3.2 Span tagging
+
+ Authorship is marked using **explicit tags** around AI spans:
+
+ - `<AI_Start>` … `</AI_End>` delimit AI-generated segments within otherwise human text.
+ - The dataset stores text in a `hybrid_text` column, plus metadata such as `has_pair`; adversarial variants add `attack_name`, `tag_count`, and `attacked_text`.
+ - Annotation is sentence-level, but the model is trained to output **token-level** predictions for finer segmentation.
+
+ > During training, these tags are converted into **token labels** (2 labels in total; see `config.id2label` in the model files).
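+
+ A word-level conversion sketch (the repo's actual preprocessing, including subword alignment, may differ):
+
+ ```python
+ # Sketch: convert <AI_Start>...</AI_End> tags into per-word binary labels.
+ # Label ids are an assumption here; check config.id2label for the real mapping.
+ import re
+
+ def tag_to_labels(tagged_text):
+     words, labels, in_ai = [], [], False
+     for piece in re.split(r"(<AI_Start>|</AI_End>)", tagged_text):
+         if piece == "<AI_Start>":
+             in_ai = True
+         elif piece == "</AI_End>":
+             in_ai = False
+         else:
+             for w in piece.split():
+                 words.append(w)
+                 labels.append(1 if in_ai else 0)  # assumed: 1 = AI, 0 = human
+     return words, labels
+
+ words, labels = tag_to_labels("I wrote this. <AI_Start>This part was generated.</AI_End>")
+ print(list(zip(words, labels)))
+ ```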
+
+ ### 3.3 Adversarial attacks
+
+ MAS includes multiple **syntactic attacks** applied to the mixed text:
+
+ - Misspelling
+ - Unicode character substitution
+ - Invisible characters
+ - Punctuation substitution
+ - Upper/lower-case swapping
+ - All-mixed combinations of the above
+
+ These perturbations make tokenization brittle and test the robustness of detectors in realistic settings.
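+
+ For intuition, here is a toy version of one attack family (homoglyph substitution; illustrative, not the paper's implementation):
+
+ ```python
+ # Sketch: toy Unicode homoglyph attack -- swap Latin letters for Cyrillic
+ # look-alikes, which breaks naive tokenization while staying human-readable.
+ import random
+
+ HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Cyrillic а, е, о
+
+ def unicode_attack(text, rate=0.3, seed=0):
+     rng = random.Random(seed)
+     return "".join(
+         HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
+         for ch in text
+     )
+
+ print(unicode_attack("The model segments mixed-authorship text."))
+ ```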
+
+ ---
+
+ ## 4. Model Architecture & Training
+
+ ### 4.1 Architecture (conceptual)
+
+ The model follows the **Info-Mask RMC\*** architecture described in the DAMASHA paper (a minimal gating sketch follows the list):
+
+ 1. **Dual encoders**
+    - RoBERTa-base and ModernBERT-base encode the same input sequence.
+ 2. **Feature fusion**
+    - Hidden states from both encoders are fused into a shared representation.
+ 3. **Stylometric Info-Mask**
+    - Hand-crafted style features (perplexity, POS density, punctuation density, lexical diversity, readability) are projected, passed through multi-head attention, and turned into a **scalar mask per token**.
+    - This mask gates the fused encoder states, down-weighting style-irrelevant tokens and emphasizing style-diagnostic ones.
+ 4. **Sequence model + CRF**
+    - A BiGRU layer captures sequential dependencies, followed by a **CRF** layer for structured token labeling with a sequence-level loss.
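+
+ A minimal PyTorch sketch of the Info-Mask gating step (dimensions and layer choices are illustrative; the real implementation lives in the GitHub repository):
+
+ ```python
+ # Sketch: stylometric features -> multi-head attention -> scalar mask per token,
+ # which gates the fused encoder states. Sizes here are illustrative only.
+ import torch
+ import torch.nn as nn
+
+ class InfoMask(nn.Module):
+     def __init__(self, hidden=768, n_style=5, n_heads=4):
+         super().__init__()
+         self.proj = nn.Linear(n_style, hidden)            # project style features
+         self.attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
+         self.to_mask = nn.Linear(hidden, 1)               # scalar mask per token
+
+     def forward(self, fused_states, style_feats):
+         # fused_states: (B, T, hidden) from the RoBERTa + ModernBERT fusion
+         # style_feats:  (B, T, n_style), e.g. perplexity, POS/punctuation density
+         s = self.proj(style_feats)
+         s, _ = self.attn(s, s, s)
+         mask = torch.sigmoid(self.to_mask(s))             # (B, T, 1), in [0, 1]
+         return fused_states * mask                        # gate token states
+
+ gate = InfoMask()
+ print(gate(torch.randn(2, 16, 768), torch.randn(2, 16, 5)).shape)  # (2, 16, 768)
+ ```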
+
+ ### 4.2 Training setup (from the paper)
+
+ Key hyperparameters used for the Info-Mask models on MAS:
+
+ - **Number of labels:** 2
+ - **Max sequence length:** 512
+ - **Batch size:** 64
+ - **Epochs:** 5
+ - **Optimizer:** AdamW (with a cosine-annealing LR schedule)
+ - **Weight decay:** 0.01
+ - **Gradient clipping:** 1.0
+ - **Dropout:** dynamic 0.1–0.3 (initial 0.1)
+ - **Warmup ratio:** 0.1
+ - **Early stopping patience:** 2
+
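+ Expressed as `transformers` `TrainingArguments`, these roughly correspond to the sketch below (the actual script, including the CRF loss and dynamic dropout, is in the GitHub repository):
+
+ ```python
+ # Sketch: the reported hyperparameters as Trainer settings. AdamW is the
+ # Trainer default; max sequence length (512) is applied at tokenization time.
+ from transformers import TrainingArguments, EarlyStoppingCallback
+
+ args = TrainingArguments(
+     output_dir="damasha-mas",
+     per_device_train_batch_size=64,
+     num_train_epochs=5,
+     weight_decay=0.01,
+     max_grad_norm=1.0,          # gradient clipping
+     warmup_ratio=0.1,
+     lr_scheduler_type="cosine",
+ )
+ # Early stopping (patience 2) is attached as a callback:
+ # Trainer(..., callbacks=[EarlyStoppingCallback(early_stopping_patience=2)])
+ ```
+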
+ **Hardware & compute** (as reported):
+
+ - AWS EC2 g6e.xlarge, NVIDIA L40S (48 GB) GPU, Ubuntu 24.04
+ - ≈ 400 GPU hours (≈ USD 720) for all experiments
+
+ > The exact training script used for this checkpoint is available in the project GitHub repository:
+ > <https://github.com/saitejalekkala33/DAMASHA>
+
+ ---
 
  ---
  license: mit