---
license: mit
base_model:
- FacebookAI/roberta-base
- answerdotai/ModernBERT-base
pipeline_tag: token-classification
---

# DAMASHA-MAS: Mixed-Authorship Adversarial Segmentation (Token Classification)

This repository contains a **token-classification model** trained on the **DAMASHA-MAS** benchmark, introduced in:

> **DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution**

The model segments **mixed human–AI text** at the *token level*: for each token, it predicts whether that token was written by a *human* or an *LLM*, even under **syntactic adversarial attacks**.

- **Base encoders:**
  - [`FacebookAI/roberta-base`](https://huggingface.co/FacebookAI/roberta-base)
  - [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base)
- **Architecture (high level):** RoBERTa + ModernBERT feature fusion → BiGRU + CRF, with the **Info-Mask** gating mechanism from the paper.
- **Task:** Token classification (binary authorship: human vs. AI).
- **Language:** English
- **License (this model):** MIT
- **Training data license:** CC-BY-4.0 via the DAMASHA dataset.

If you use this model, **please also cite the DAMASHA paper and dataset**.
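
For quick experimentation, the sketch below assumes the checkpoint can be loaded through the standard `transformers` token-classification pipeline; because RMC\* is a custom dual-encoder stack, you may instead need the loading code from the project's GitHub repository. The model id is a placeholder.

```python
# Minimal usage sketch -- assumes this checkpoint works with the standard
# token-classification pipeline; the custom RMC* stack may require the
# project's own loading code instead.
from transformers import pipeline

detector = pipeline(
    "token-classification",
    model="<this-model-id>",        # placeholder: substitute this repo's id
    aggregation_strategy="simple",  # merge consecutive same-label tokens
)

text = "I drafted the opening sentence myself. The rest of this paragraph was generated."
for span in detector(text):
    # Label names come from config.id2label (e.g. human vs. AI).
    print(span["entity_group"], round(span["score"], 2), repr(span["word"]))
```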

---

## 1. Model Highlights

- **Fine-grained mixed-authorship detection**
  Predicts authorship **per token**, allowing reconstruction of human vs. AI **spans** in long documents (see the span-recovery sketch after this list).

- **Adversarially robust**
  Trained and evaluated on **syntactically attacked texts** (misspellings, Unicode substitutions, invisible characters, punctuation swaps, case perturbations, and “all-mixed” attacks).

- **Human-interpretable Info-Mask**
  Incorporates **stylometric features** (perplexity, POS density, punctuation density, lexical diversity, readability) via an **Info-Mask** module that gates token representations in an interpretable way.

- **Strong reported performance (from the paper)**
  On DAMASHA-MAS, the **RMC\*** model (RoBERTa + ModernBERT + CRF + Info-Mask) achieves:
  - **Token-level:** accuracy / precision / recall / F1 ≈ **0.98**
  - **Span-level (strict):** SBDA ≈ **0.45**, SegPre ≈ **0.41**
  - **Span-level (relaxed, IoU ≥ 0.5):** ≈ **0.82**

> ⚠️ The exact numbers for *this* checkpoint may differ depending on the training run and configuration. The values above are from the paper’s best configuration (RMC\*).
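
Because the model emits one label per token, human/AI spans can be recovered by merging runs of identical labels. A minimal sketch (token and label names are illustrative):

```python
# Sketch: collapse per-token predictions into contiguous authorship spans.
from itertools import groupby

def tokens_to_spans(tokens, labels):
    """Group consecutive tokens sharing a label into (label, start, end) spans."""
    spans, i = [], 0
    for label, group in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        n = len(list(group))
        spans.append((label, i, i + n))  # end index is exclusive
        i += n
    return spans

tokens = ["I", "wrote", "this", ".", "The", "rest", "was", "generated", "."]
labels = ["human"] * 4 + ["ai"] * 5
print(tokens_to_spans(tokens, labels))  # [('human', 0, 4), ('ai', 4, 9)]
```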

---

## 2. Intended Use

### What this model is for

- **Research on human–AI co-authorship**
  - Studying where LLMs “take over” in mixed texts.
  - Analysing the robustness of detectors under adversarial perturbations.

- **Tooling / applications (with human oversight)**
  - Helping editors, educators, or moderators **highlight suspicious spans** rather than making final decisions.
  - Exploring **interpretability overlays** (e.g., heatmaps over tokens) when combined with Info-Mask outputs.

### What this model is *not* for

- Automated “cheating detection” or plagiarism adjudication.
- High-stakes decisions affecting people’s livelihoods, grades, or reputation **without human review**.
- Non-English or heavily code-mixed text (the training data is English-centric).

Use this model as a **signal**, not a judge.

---

## 3. Data: DAMASHA-MAS

The model is trained on the **MAS** benchmark released with the DAMASHA paper and hosted as a Hugging Face dataset:

- **Dataset:** [`saiteja33/DAMASHA`](https://huggingface.co/datasets/saiteja33/DAMASHA)
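
A quick way to peek at the data; the split name here is an assumption, and the column names follow this card (check the dataset page for the exact schema):

```python
# Sketch: inspecting MAS with the `datasets` library.
from datasets import load_dataset

mas = load_dataset("saiteja33/DAMASHA", split="train")  # split name assumed
row = mas[0]
print(row["hybrid_text"][:300])  # mixed text with <AI_Start>...</AI_End> tags
```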

### 3.1 What’s in MAS?

MAS consists of **mixed human–AI texts with explicit span tags**:

- Human text comes from several corpora for **domain diversity**, including:
  - Reddit (M4-Reddit)
  - Yelp & /r/ChangeMyView (MAGE-YELP, MAGE-CMV)
  - News summaries (XSUM)
  - Wikipedia (M4-Wiki, MAGE-SQuAD)
  - ArXiv abstracts (MAGE-SciGen)
  - QA texts (MAGE-ELI5)

- AI text is generated by multiple modern LLMs:
  - **DeepSeek-V3-671B** (open-source)
  - **GPT-4o, GPT-4.1, GPT-4.1-mini** (closed-source)

### 3.2 Span tagging

Authorship is marked using **explicit tags** around AI spans:

- `<AI_Start>` … `</AI_End>` delimit AI-generated segments within otherwise human text.
- The dataset stores text in a `hybrid_text` column, plus metadata such as `has_pair`; the adversarial variants add `attack_name`, `tag_count`, and `attacked_text`.
- Tags are sentence-level in the annotation, but the model is trained to output **token-level** predictions for finer segmentation.

> During training, these tags are converted into **token labels** (2 labels in total; see `config.id2label` in the model files), roughly as sketched below.
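
The conversion could look like the following; this is an assumption about the preprocessing, not the project’s exact script, and the 0 = human / 1 = AI mapping is illustrative:

```python
# Sketch: strip <AI_Start>/<AI_End> tags and label every token inside a tagged
# region as AI (1), everything else as human (0). Illustrative, not the
# project's actual preprocessing code.
import re

TAG = re.compile(r"(<AI_Start>|</AI_End>)")

def tags_to_token_labels(hybrid_text):
    tokens, labels, inside_ai = [], [], False
    for piece in TAG.split(hybrid_text):
        if piece == "<AI_Start>":
            inside_ai = True
        elif piece == "</AI_End>":
            inside_ai = False
        else:
            for tok in piece.split():
                tokens.append(tok)
                labels.append(1 if inside_ai else 0)
    return tokens, labels

toks, labs = tags_to_token_labels("I wrote this. <AI_Start>This part is generated.</AI_End>")
print(list(zip(toks, labs)))
```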

### 3.3 Adversarial attacks

MAS includes multiple **syntactic attacks** applied to the mixed text:

- Misspelling
- Unicode character substitution
- Invisible characters
- Punctuation substitution
- Upper/lower-case swapping
- All-mixed combinations of the above

These perturbations make tokenization brittle and test the robustness of detectors in realistic settings; the sketch after this paragraph illustrates the flavor of such attacks.
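
Illustrative only (not the paper’s attack implementation): two of the perturbation types above, homoglyph substitution and invisible-character insertion.

```python
# Sketch: homoglyph and invisible-character perturbations.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Cyrillic look-alikes
ZERO_WIDTH_SPACE = "\u200b"

def unicode_substitution(text):
    """Swap selected Latin letters for visually identical Cyrillic ones."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

def insert_invisible(text, every=4):
    """Insert a zero-width space after every few characters."""
    return ZERO_WIDTH_SPACE.join(text[i:i + every] for i in range(0, len(text), every))

s = "a clean sentence"
print(unicode_substitution(s))     # looks the same, different code points
print(ascii(insert_invisible(s)))  # zero-width characters shown as \u200b
```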

---

## 4. Model Architecture & Training

### 4.1 Architecture (conceptual)

The model follows the **Info-Mask RMC\*** architecture described in the DAMASHA paper:

1. **Dual encoders**
   - RoBERTa-base and ModernBERT-base encode the same input sequence.
2. **Feature fusion**
   - Hidden states from both encoders are fused into a shared representation.
3. **Stylometric Info-Mask**
   - Hand-crafted style features (perplexity, POS density, punctuation density, lexical diversity, readability) are projected, passed through multi-head attention, and turned into a **scalar mask per token**.
   - This mask gates the fused encoder states, down-weighting style-irrelevant tokens and emphasizing style-diagnostic ones.
4. **Sequence model + CRF**
   - A BiGRU layer captures sequential dependencies, followed by a **CRF** layer for structured token labeling with a sequence-level loss.

A schematic sketch of this forward pass is shown below.
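
To make the data flow concrete, here is an untrained PyTorch skeleton. All dimensions, the concat-then-project fusion, and the third-party `pytorch-crf` dependency are assumptions; the real implementation is in the project’s GitHub repository.

```python
# Conceptual RMC*-style forward pass: fuse two encoders, gate with a
# stylometric Info-Mask, then BiGRU + CRF. Dimensions and fusion are assumed.
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class InfoMaskSegmenter(nn.Module):
    def __init__(self, hidden=768, n_style_feats=5, num_labels=2):
        super().__init__()
        self.fuse = nn.Linear(2 * hidden, hidden)        # concat -> shared space
        self.style_proj = nn.Linear(n_style_feats, hidden)
        self.style_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.mask_head = nn.Linear(hidden, 1)            # scalar Info-Mask per token
        self.bigru = nn.GRU(hidden, hidden // 2, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(hidden, num_labels)
        self.crf = CRF(num_labels, batch_first=True)

    def forward(self, roberta_states, modernbert_states, style_feats, labels=None):
        # roberta_states / modernbert_states: (B, T, hidden) from the two encoders
        fused = self.fuse(torch.cat([roberta_states, modernbert_states], dim=-1))
        style = self.style_proj(style_feats)             # (B, T, hidden)
        style, _ = self.style_attn(style, style, style)
        gate = torch.sigmoid(self.mask_head(style))      # (B, T, 1) Info-Mask
        seq, _ = self.bigru(fused * gate)                # gated fusion -> BiGRU
        emissions = self.classifier(seq)                 # (B, T, num_labels)
        if labels is not None:
            return -self.crf(emissions, labels)          # sequence-level NLL loss
        return self.crf.decode(emissions)                # best label path per example
```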

### 4.2 Training setup (from the paper)

Key hyperparameters used for the Info-Mask models on MAS:

- **Number of labels:** 2
- **Max sequence length:** 512
- **Batch size:** 64
- **Epochs:** 5
- **Optimizer:** AdamW (with a cosine-annealing LR schedule)
- **Weight decay:** 0.01
- **Gradient clipping:** 1.0
- **Dropout:** dynamic 0.1–0.3 (initial 0.1)
- **Warmup ratio:** 0.1
- **Early-stopping patience:** 2

A minimal optimizer/scheduler setup matching these values is sketched after this list.
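
The sketch wires the reported values together; the learning rate is not stated in the list above, so the value here is a typical encoder fine-tuning default, not the paper’s:

```python
# Sketch: AdamW + warmup + cosine annealing with the reported hyperparameters.
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(768, 2)      # placeholder for the real model
steps_per_epoch, epochs = 1000, 5    # steps_per_epoch is illustrative
total_steps = steps_per_epoch * epochs

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-5,            # assumption: not reported; typical fine-tuning value
    weight_decay=0.01,  # as reported
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),  # warmup ratio 0.1
    num_training_steps=total_steps,
)

# Inside the training loop:
#   loss.backward()
#   torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clipping at 1.0
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```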

**Hardware & compute** (as reported):

- AWS EC2 g6e.xlarge with an NVIDIA L40S (48 GB) GPU, Ubuntu 24.04
- ≈ 400 GPU hours (≈ USD 720) for the experiments

> The exact training script used for this checkpoint is available in the project’s GitHub repository:
> <https://github.com/saitejalekkala33/DAMASHA>

---