---
license: mit
datasets:
- saiteja33/DAMASHA
language:
- en
base_model:
- FacebookAI/roberta-base
- answerdotai/ModernBERT-base
pipeline_tag: token-classification
---
# DAMASHA-MAS: Mixed-Authorship Adversarial Segmentation (Token Classification)
This repository contains a **token-classification model** trained on the **DAMASHA-MAS** benchmark, introduced in:
> **DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution**
The model **segments mixed human–AI text** at the *token level*: for each token, it decides whether the token was written by a *human* or an *LLM*, even under **syntactic adversarial attacks**.
- **Base encoders:**
- [`FacebookAI/roberta-base`](https://huggingface.co/FacebookAI/roberta-base)
- [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base)
- **Architecture (high level):** RoBERTa + ModernBERT feature fusion → BiGRU + CRF with the **Info-Mask** gating mechanism from the paper.
- **Task:** Token classification (binary authorship: human vs AI).
- **Language:** English
- **License (this model):** MIT
- **Training data license:** CC-BY-4.0 via the DAMASHA dataset.
If you use this model, **please also cite the DAMASHA paper and dataset** (see Citation section).
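A minimal inference sketch follows. It assumes the checkpoint loads with the standard `AutoModelForTokenClassification` head; because the full RMC\* architecture (dual encoders + BiGRU + CRF) is custom, the project repository's own loading code may be required instead. `model_id` is a placeholder for this repository's id.

```python
# Hedged inference sketch: assumes a standard token-classification head.
# The full dual-encoder + CRF model may instead need the project's own code.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "<this-repo-id>"  # placeholder: replace with this repository's id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
model.eval()

text = "I drafted the opening myself. The remainder was completed by an assistant model."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits              # (1, seq_len, num_labels=2)

pred_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, pid in zip(tokens, pred_ids):
    print(f"{tok}\t{model.config.id2label[pid]}")  # per-token human/AI label
```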
---
## 1. Model Highlights
- **Fine-grained mixed-authorship detection**
Predicts authorship **per token**, allowing reconstruction of human vs AI **spans** in long documents.
- **Adversarially robust**
Trained and evaluated on **syntactically attacked texts** (misspelling, Unicode substitutions, invisible characters, punctuation swaps, case perturbations, and “all-mixed” attacks).
- **Human-interpretable Info-Mask**
The architecture incorporates **stylometric features** (perplexity, POS density, punctuation density, lexical diversity, readability) via an **Info-Mask** module that gates token representations in an interpretable way.
- **Strong reported performance (from the paper)**
On DAMASHA-MAS, the **RMC\*** model (RoBERTa + ModernBERT + CRF + Info-Mask) achieves:
- **Token-level**: Accuracy / Precision / Recall / F1 ≈ **0.98**
- **Span-level (strict)**: SBDA ≈ **0.45**, SegPre ≈ **0.41**
- **Span-level (relaxed IoU ≥ 0.5)**: ≈ **0.82**
> ⚠️ The exact numbers for *this* specific checkpoint may differ depending on the training run and configuration. The values above are from the paper’s best configuration (RMC\*).
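For reference, the relaxed criterion counts a predicted span as correct when its overlap with some gold span reaches IoU ≥ 0.5. A minimal sketch of that matching (the paper's exact SBDA/SegPre definitions may differ in detail):

```python
# Relaxed span matching: a predicted span counts as correct if its
# token-overlap IoU with some gold span is >= 0.5 (sketch only).
def span_iou(a, b):
    """IoU of two half-open token spans (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def relaxed_match(pred, gold_spans, thr=0.5):
    return any(span_iou(pred, g) >= thr for g in gold_spans)

print(span_iou((10, 20), (15, 25)))  # 5/15 ≈ 0.33 -> no match at threshold 0.5
```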
---
## 2. Intended Use
### What this model is for
- **Research on human–AI co-authorship**
- Studying where LLMs “take over” in mixed texts.
- Analysing robustness of detectors under adversarial perturbations.
- **Tooling / applications (with human oversight)**
- Assisting editors, educators, or moderators in **highlighting suspicious spans** rather than making final decisions (see the span-merging sketch at the end of this section).
- Exploring **interpretability overlays** (e.g., heatmaps over tokens) when combined with Info-Mask outputs.
### What this model is *not* for
- An automated “cheating detector” or plagiarism adjudicator.
- High-stakes decisions affecting people’s livelihood, grades, or reputation **without human review**.
- Non-English or heavily code-mixed text (training data is English-centric).
Use this model as a **signal**, not a judge.
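For the span-highlighting use case above, per-token predictions can be merged into character-level spans. A minimal sketch, assuming offset mappings from a fast tokenizer (`return_offsets_mapping=True`) and an AI label id of 1 (check `config.id2label` for the actual mapping):

```python
# Sketch: merge contiguous AI-labelled tokens into character spans for
# highlighting. `ai_label_id=1` is an assumption; check config.id2label.
def predictions_to_spans(offsets, pred_ids, ai_label_id=1):
    spans, start, end = [], None, None
    for (tok_start, tok_end), pid in zip(offsets, pred_ids):
        if tok_start == tok_end:              # special tokens have empty offsets
            continue
        if pid == ai_label_id:
            if start is None:
                start = tok_start
            end = tok_end
        elif start is not None:               # an AI run just ended
            spans.append((start, end))
            start = None
    if start is not None:                     # run extends to end of text
        spans.append((start, end))
    return spans
```

The resulting spans can then be rendered as highlights over the original text, leaving the final judgement to a human reviewer.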
---
## 3. Data: DAMASHA-MAS
The model is trained on the **MAS** benchmark released with the DAMASHA paper and hosted as the Hugging Face dataset:
- **Dataset:** [`saiteja33/DAMASHA`](https://huggingface.co/datasets/saiteja33/DAMASHA)
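The dataset can be pulled directly from the Hub with `datasets`; the split name below is an assumption, while the `hybrid_text` column is documented in this card. Check the dataset card for the exact configuration.

```python
# Load DAMASHA from the Hub; inspect the printed splits before indexing.
from datasets import load_dataset

ds = load_dataset("saiteja33/DAMASHA")
print(ds)                       # available splits and columns

example = ds["train"][0]        # "train" is an assumed split name
print(example["hybrid_text"])   # mixed text with <AI_Start>...</AI_End> tags
```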
### 3.1 What’s in MAS?
MAS consists of **mixed human–AI texts with explicit span tags**:
- Human text comes from several corpora for **domain diversity**, including:
- Reddit (M4-Reddit)
- Yelp & /r/ChangeMyView (MAGE-YELP, MAGE-CMV)
- News summaries (XSUM)
- Wikipedia (M4-Wiki, MAGE-SQuAD)
- ArXiv abstracts (MAGE-SciGen)
- QA texts (MAGE-ELI5)
- AI text is generated by multiple modern LLMs:
- **DeepSeek-V3-671B** (open-source)
- **GPT-4o, GPT-4.1, GPT-4.1-mini** (closed-source)
### 3.2 Span tagging
Authorship is marked using **explicit tags** around AI spans:
- `<AI_Start>` … `</AI_End>` denote AI-generated segments within otherwise human text.
- The dataset stores text in a `hybrid_text` column, plus metadata such as `has_pair`, and adversarial variants include `attack_name`, `tag_count`, and `attacked_text`.
- Tags are applied at the sentence level in the annotations, but the model is trained to output **token-level** predictions for finer segmentation.
> During training, these tags are converted into **token labels** (2 labels total; see `config.id2label` in the model files).
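As an illustration of that conversion (the project repository's preprocessing may differ in detail), the tags can be stripped into a clean string plus a character-level mask, which a fast tokenizer's offset mapping then lifts to token labels:

```python
import re

# Illustrative tag-to-label conversion; not the project's exact code.
# <AI_Start>...</AI_End> spans become 1s in a character-level mask.
TAG_RE = re.compile(r"<AI_Start>(.*?)</AI_End>", re.DOTALL)

def strip_tags_with_char_mask(tagged_text):
    """Return (clean_text, char_mask) with char_mask[i] == 1 inside AI spans."""
    clean, mask, cursor = [], [], 0
    for m in TAG_RE.finditer(tagged_text):
        human = tagged_text[cursor:m.start()]
        clean.append(human); mask.extend([0] * len(human))
        ai = m.group(1)
        clean.append(ai); mask.extend([1] * len(ai))
        cursor = m.end()
    tail = tagged_text[cursor:]
    clean.append(tail); mask.extend([0] * len(tail))
    return "".join(clean), mask

text, mask = strip_tags_with_char_mask(
    "I wrote this part. <AI_Start>This part was generated.</AI_End> Then me again."
)
# A fast tokenizer's offset_mapping can now label each token, e.g. label 1
# if any character of the token falls inside an AI span.
```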
### 3.3 Adversarial attacks
MAS includes multiple **syntactic attacks** applied to the mixed text:
- Misspelling
- Unicode character substitution
- Invisible characters
- Punctuation substitution
- Upper/lower case swapping
- All-mixed combinations of the above
These perturbations make tokenization brittle and test the robustness of detectors in realistic settings.
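For intuition, a toy version of the Unicode-substitution attack (not the paper's implementation) shows why tokenization becomes brittle: homoglyph swaps splinter familiar words into unusual subwords.

```python
# Toy Unicode-substitution attack: swap some Latin letters for visually
# similar Cyrillic homoglyphs. Visually identical text, very different tokens.
import random

HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440", "c": "\u0441"}

def unicode_attack(text, rate=0.3, seed=0):
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )

print(unicode_attack("the model processes each token"))
```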
---
## 4. Model Architecture & Training
### 4.1 Architecture (conceptual)
The model follows the **Info-Mask RMC\*** architecture described in the DAMASHA paper:
1. **Dual encoders**
- RoBERTa-base and ModernBERT-base encode the same input sequence.
2. **Feature fusion**
- Hidden states from both encoders are fused into a shared representation.
3. **Stylometric Info-Mask**
- Hand-crafted style features (perplexity, POS density, punctuation density, lexical diversity, readability) are projected, passed through multi-head attention, and turned into a **scalar mask per token**.
- This mask gates the fused encoder states, down-weighting style-irrelevant tokens and emphasizing style-diagnostic ones.
4. **Sequence model + CRF**
- A BiGRU layer captures sequential dependencies, followed by a **CRF** layer for structured token labeling with a sequence-level loss.
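A conceptual PyTorch sketch of this pipeline is given below. Dimensions, the fusion rule, and the style-feature interface are illustrative assumptions, not the official implementation. In particular, the two encoders use different tokenizers, so a real implementation must align their token streams; the sketch assumes that alignment has been done upstream. The CRF comes from the `pytorch-crf` package.

```python
import torch
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF  # pip install pytorch-crf

class InfoMaskRMC(nn.Module):
    """Conceptual sketch of the RMC* pipeline; dimensions, fusion rule,
    and style-feature interface are illustrative assumptions."""
    def __init__(self, num_labels=2, style_dim=5, hidden=256):
        super().__init__()
        self.enc_a = AutoModel.from_pretrained("FacebookAI/roberta-base")
        self.enc_b = AutoModel.from_pretrained("answerdotai/ModernBERT-base")
        fused = self.enc_a.config.hidden_size + self.enc_b.config.hidden_size
        # Stylometric features (perplexity, POS/punctuation density, ...) ->
        # attention -> per-token scalar gate in (0, 1): the Info-Mask.
        self.style_proj = nn.Linear(style_dim, fused)
        self.style_attn = nn.MultiheadAttention(fused, num_heads=4, batch_first=True)
        self.gate_head = nn.Sequential(nn.Linear(fused, 1), nn.Sigmoid())
        self.bigru = nn.GRU(fused, hidden, batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden, num_labels)
        self.crf = CRF(num_labels, batch_first=True)

    def forward(self, ids_a, ids_b, attention_mask, style_feats, labels=None):
        # ids_a / ids_b: the two tokenizations, assumed pre-aligned to length T.
        h_a = self.enc_a(ids_a, attention_mask=attention_mask).last_hidden_state
        h_b = self.enc_b(ids_b, attention_mask=attention_mask).last_hidden_state
        h = torch.cat([h_a, h_b], dim=-1)                   # (B, T, fused)
        style = self.style_proj(style_feats)                # (B, T, fused)
        attn, _ = self.style_attn(style, h, h)
        gate = self.gate_head(attn)                         # (B, T, 1) Info-Mask
        seq, _ = self.bigru(h * gate)                       # gated states -> BiGRU
        emissions = self.emit(seq)
        mask = attention_mask.bool()
        if labels is not None:
            return -self.crf(emissions, labels, mask=mask)  # sequence-level NLL
        return self.crf.decode(emissions, mask=mask)        # best label paths
```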
### 4.2 Training setup (from the paper)
Key hyperparameters used for the Info-Mask models on MAS:
- **Number of labels:** 2
- **Max sequence length:** 512
- **Batch size:** 64
- **Epochs:** 5
- **Optimizer:** AdamW (with cosine annealing LR schedule)
- **Weight decay:** 0.01
- **Gradient clipping:** 1.0
- **Dropout:** Dynamic 0.1–0.3 (initial 0.1)
- **Warmup ratio:** 0.1
- **Early stopping patience:** 2
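These settings translate into a short setup like the following sketch, using transformers' cosine-with-warmup schedule as a stand-in for cosine annealing. The learning rate is not stated in this card, so `2e-5` is a placeholder; `model` and `train_loader` are assumed defined (e.g. the architecture sketch above plus a DataLoader over tokenized MAS examples).

```python
# Sketch of the reported setup: AdamW, weight decay 0.01, 10% warmup into
# a cosine schedule, gradient clipping at 1.0, 5 epochs. lr is a placeholder.
import torch
from transformers import get_cosine_schedule_with_warmup

EPOCHS = 5
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
total_steps = len(train_loader) * EPOCHS
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),   # warmup ratio 0.1
    num_training_steps=total_steps,
)

for epoch in range(EPOCHS):
    for batch in train_loader:                 # simplified training loop
        loss = model(**batch)                  # CRF negative log-likelihood
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```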
**Hardware & compute** (as reported):
- AWS EC2 g6e.xlarge, NVIDIA L40S (48GB) GPU, Ubuntu 24.04
- ≈ 400 GPU hours for experiments.
> The exact training script used for this checkpoint is available in the project GitHub:
> <https://github.com/saitejalekkala33/DAMASHA>
---