---
license: mit
language:
- en
tags:
- text-classification
- ai-generated-text-detection
- machine-generated-text
- llm-detection
- pawn
pipeline_tag: text-classification
metrics:
- accuracy
- f1
- precision
- recall
- roc_auc
base_model:
- meta-llama/Llama-3.2-1B-Instruct
- meta-llama/Llama-3.2-1B
datasets:
- yaful/MAGE
model-index:
- name: PAWN++
  results:
  - task:
      type: text-classification
      name: Machine-Generated Text Detection
    dataset:
      name: MAGE
      type: yaful/MAGE
    metrics:
    - type: accuracy
      value: 0.9515
    - type: f1_macro
      value: 0.9515
    - type: roc_auc
      value: 0.9836
    - type: f1
      value: 0.9506
      name: AI F1
    - type: f1
      value: 0.9523
      name: Human F1
---
# PAWN++
[![GitHub](https://img.shields.io/badge/GitHub-automatic--goggles-181717?logo=github)](https://github.com/HSE-Team-142/automatic-goggles/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/HSE-Team-142/automatic-goggles/blob/main/LICENSE)
[![Task](https://img.shields.io/badge/task-AI%20text%20detection-blue)](#)
[![Accuracy](https://img.shields.io/badge/accuracy-95.15%25-success)](#evaluation)
[![ROC AUC](https://img.shields.io/badge/ROC%20AUC-0.984-success)](#evaluation)
[![Macro F1](https://img.shields.io/badge/macro%20F1-0.951-success)](#evaluation)
📦 **Repository:** [HSE-Team-142/automatic-goggles](https://github.com/HSE-Team-142/automatic-goggles/)

**PAWN++** is a detector for identifying machine-generated (AI) text. It extends the
[PAWN](https://www.sciencedirect.com/science/article/pii/S156625352500538X) architecture, which
predicts authorship from the per-token hidden states and probability metrics of a *frozen* language
model. PAWN++ adds an optional second frozen language model, cross-model metrics (including a
Binoculars-style cross-perplexity score), second-model token metrics, hidden-state fusion, and
aggregated sequence-level features that modulate the representation through FiLM.
This card describes the best-performing configuration
(`mage_llama_instruct_llama_base_metrics_xppl_hs_uniform_agg_metrics_full`, checkpoint `checkpoint-39884`).
## Model Description
PAWN++ does **not** fine-tune the backbone LLMs. The two language models are frozen and used only as
feature extractors; the only trained parameters are three lightweight MLP heads and a FiLM
conditioning layer.
- **Model type:** Frozen-LLM feature extractor + gated MLP classifier (binary)
- **Task:** Binary classification: `human` (0) vs. `ai` (1)
- **Primary model (frozen):** `meta-llama/Llama-3.2-1B-Instruct`
- **Second model (frozen):** `meta-llama/Llama-3.2-1B`
- **Language:** English
- **Max sequence length:** 512 tokens
- **License:** MIT
### Architecture
For each input the feature extractor runs both frozen LLMs and produces:
1. **Per-token metrics** for each model (`entropy`, `max_log_probs`, `next_token_log_probs`,
   `rank`, `top_p`), plus the **cross-perplexity (xppl)** between the two models.
2. **Hidden states** from both models, fused across layers with `uniform` fusion.
3. **Aggregated sequence-level features**, computed per model (`energy`, `mean`, `std`, `var`, `skew`,
   `kurtosis`, `mean_diff`, `std_diff`, `var_2nd`, `entropy_2nd`, `autocorr_2nd`) and **cross-model**
   (`cov`, `corr`, `cos_sim`, `binoculars_score`).
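
As a rough illustration of item 1, the sketch below computes several of the per-token metrics from raw next-token logits and a Binoculars-style ratio from two models' logits. It is not the repository's feature extractor: all function names are hypothetical, `top_p` is left out because this card does not define it, and which model supplies the perplexity versus the reference distribution is an assumption.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_metrics(logits, next_token_id):
    """Per-token metrics from next-token logits and the observed next token."""
    log_probs = log_softmax(logits)
    observed = log_probs[next_token_id]
    return {
        "entropy": -sum(math.exp(lp) * lp for lp in log_probs),
        "max_log_probs": max(log_probs),
        "next_token_log_probs": observed,
        # Rank 1 means the observed token was the model's most likely token.
        "rank": 1 + sum(lp > observed for lp in log_probs),
    }

def cross_entropy(logits_a, logits_b):
    """Expected negative log-probability under model B, weighted by model A's distribution."""
    log_p_a, log_p_b = log_softmax(logits_a), log_softmax(logits_b)
    return -sum(math.exp(a) * b for a, b in zip(log_p_a, log_p_b))

def binoculars_style_score(seq_logits_a, seq_logits_b, next_token_ids):
    """Mean log-perplexity of model A divided by the mean A-vs-B cross-entropy.

    Assumption: model A plays both the perplexity and the weighting role.
    """
    n = len(next_token_ids)
    log_ppl = -sum(
        log_softmax(la)[t] for la, t in zip(seq_logits_a, next_token_ids)
    ) / n
    x_ppl = sum(cross_entropy(la, lb) for la, lb in zip(seq_logits_a, seq_logits_b)) / n
    return log_ppl / x_ppl
```

Each `seq_logits_*` argument is a list with one logit vector per position, and `next_token_ids[i]` is the token actually observed after position `i`. When the two models agree exactly and the text is as predictable as the model expects, the ratio is close to 1.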
Three MLP heads and a FiLM layer process these signals, in the following order:
- **`metrics_nn`** maps the per-token metric vector to a 256-dim feature space.
- **`gate_nn`** takes the concatenated current/next hidden states of both models plus a positional
scalar and produces 256 gate logits per token; a softmax over the sequence axis yields an
attention-style weighting that aggregates the token metric features into a single vector.
- The aggregated vector is modulated by a **FiLM** layer (`gamma`, `beta`) conditioned on the
normalized sequence-level aggregate features.
- **`aggregate_nn`** maps the result to a single logit.
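
The pooling and modulation steps above can be sketched in plain Python. This is a simplified illustration, not the trained model: the function names are hypothetical, the one-gate-per-feature pairing is an assumption based on `metric_features` and `gates` both being 256, and the FiLM form `gamma * x + beta` is the common parameterization rather than one confirmed by this card.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gated_pool(metric_feats, gate_logits):
    """Pool [T][D] token features into a single [D] vector.

    Each channel d gets its own softmax over the T token positions, so the
    weights of one channel sum to 1 across the sequence.
    """
    num_tokens, dim = len(metric_feats), len(metric_feats[0])
    pooled = []
    for d in range(dim):
        weights = softmax([gate_logits[t][d] for t in range(num_tokens)])
        pooled.append(sum(w * metric_feats[t][d] for t, w in enumerate(weights)))
    return pooled

def film(vector, gamma, beta):
    """FiLM: per-channel scale and shift; gamma and beta are predicted from
    the sequence-level aggregate features."""
    return [g * v + b for v, g, b in zip(vector, gamma, beta)]

# Two tokens, two channels, equal gate logits: pooling reduces to the mean.
pooled = gated_pool([[1.0, 10.0], [3.0, 30.0]], [[0.0, 0.0], [0.0, 0.0]])
print(pooled)                                 # [2.0, 20.0]
print(film(pooled, [1.0, 0.5], [0.0, 1.0]))   # [2.0, 11.0]
```

Because the softmax runs over the sequence axis, a token with a much larger gate logit dominates its channel, which is what makes the weighting attention-like.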
The output is a single logit. With the label encoding above (`ai` = 1), `sigmoid(logit)` is the
probability of the `ai` class, so the prediction is `ai` when `logit >= 0`.

| Hyperparameter | Value |
|---|---|
| `metric_features` | 256 |
| `gates` | 256 |
| `mlp_hidden_features` | 256 |
| `mlp_hidden_layers` | 3 |
| `mlp_dropout` | 0.0 |
| `token_dropout` | 0.15 |
| `residual` | true |
| `hidden_state_fusion` | uniform |
## Intended Use
- **Primary use:** Research on machine-generated-text detection and AI-text classification of English
passages.
- **Out of scope:** High-stakes decisions (academic misconduct, hiring, moderation) without human
review; non-English text; short texts; and detecting generators or domains far from the training
distribution. As with all detectors, predictions should be treated as a signal, not proof.
## Training Data
Trained and evaluated on the **MAGE** benchmark for machine-generated text detection, which spans
multiple domains and many generator models, framed as a binary human-vs-AI task.
## Training Procedure
- Backbones frozen; only the MLP heads and FiLM layer are trained.
- **Objective:** Binary cross-entropy with `label_smoothing = 0.2` and `pos_weight = 0.413`.
- **Optimizer:** AdamW, `learning_rate = 1e-3`, `weight_decay = 1e-2`, `max_grad_norm = 1.0`.
- **Schedule:** up to 5 epochs (`max_steps = 49855`), batch size 32, early stopping (patience 5),
seed 42.
- **Model selection:** best checkpoint by validation AUROC (`checkpoint-39884`, epoch 4, validation
  AUROC ≈ **0.9933**).
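
To make the objective concrete, here is a minimal per-example sketch of binary cross-entropy on a logit with label smoothing and a positive-class weight. How the training code applies smoothing is an assumption (targets are pulled symmetrically toward 0.5), the `pos_weight` term follows the usual convention of scaling the positive-class (`ai`, label 1) part of the loss, and the function names are hypothetical.

```python
import math

def softplus(x):
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def smoothed_bce_with_logits(logit, label, label_smoothing=0.2, pos_weight=0.413):
    """Weighted BCE on a logit with a smoothed 0/1 target."""
    # With smoothing 0.2 the hard targets 0 and 1 become 0.1 and 0.9.
    target = label * (1.0 - label_smoothing) + 0.5 * label_smoothing
    log_sig = -softplus(-logit)           # log(sigmoid(logit))
    log_one_minus_sig = -softplus(logit)  # log(1 - sigmoid(logit))
    return -(pos_weight * target * log_sig + (1.0 - target) * log_one_minus_sig)

# A wrong, confident logit costs far more than a right one for a human (label 0) text.
print(smoothed_bce_with_logits(-3.0, 0) < smoothed_bce_with_logits(3.0, 0))  # True
```

A `pos_weight` below 1 reduces the contribution of `ai` examples, which is a common way to compensate when the positive class is the majority in the training split.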
## Evaluation
Results on the MAGE test set:
| Metric | Value |
|---|---|
| Accuracy | 0.9515 |
| Macro F1 | 0.9515 |
| ROC AUC | 0.9836 |
| AI: Precision | 0.9710 |
| AI: Recall | 0.9311 |
| AI: F1 | 0.9506 |
| Human: Precision | 0.9334 |
| Human: Recall | 0.9720 |
| Human: F1 | 0.9523 |
| Test loss | 0.2456 |

**Runtime (test split):** 1619.5 s, 37.5 samples/s, 1.173 steps/s.

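> AI precision (0.9710) is higher than human precision (0.9334), while human recall (0.9720) is
> higher than AI recall (0.9311). In other words, the model is conservative about labeling text as
> AI-generated: it raises few false AI flags at the cost of missing some AI text.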
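
As a quick consistency check, the per-class and macro F1 scores can be recomputed from the precision and recall values reported in the table. The snippet uses only those rounded numbers, so agreement is expected to four decimals at best.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

ai_f1 = f1(0.9710, 0.9311)
human_f1 = f1(0.9334, 0.9720)
macro_f1 = (ai_f1 + human_f1) / 2

print(f"{ai_f1:.4f} {human_f1:.4f} {macro_f1:.4f}")  # 0.9506 0.9523 0.9515
```

Read the other way, a human recall of 0.9720 means that about 2.8% of human-written test texts were labeled `ai`.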
## How to Use
Inference is provided through `inference.py`, which loads the frozen backbones plus the trained heads
from a checkpoint and a training YAML config:
```bash
uv run PAWN++/inference.py \
--config PAWN++/experiments/MAGE/configs/pawn/two_models/mage_llama_instruct_llama_base_metrics_xppl_hs_uniform_agg_metrics_full.yaml \
--checkpoint PAWN++/checkpoint-39884/pytorch_model.bin \
--text "Your text to classify here."
```
```python
from inference import load_model, predict

# Load the frozen backbones and the trained heads from the config and checkpoint.
model, device = load_model(
    config_path="PAWN++/experiments/MAGE/configs/pawn/two_models/"
    "mage_llama_instruct_llama_base_metrics_xppl_hs_uniform_agg_metrics_full.yaml",
    checkpoint_path="PAWN++/checkpoint-39884/pytorch_model.bin",
)

results = predict(model, ["Your text to classify here."], device)
# each result: {"label": "human"|"ai", "prediction": 0|1, "prob_human": float, "logit": float}
```
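
Since predictions are meant to be a signal rather than proof, downstream code may want a stricter rule than the default label. The helper below is a hypothetical post-processing sketch, not part of `inference.py`; it relies only on the `label` and `prob_human` keys of the result dictionaries described above, and the 0.2 cutoff is purely illustrative.

```python
def flag_confident_ai(results, max_prob_human=0.2):
    """Return indices of texts labeled `ai` whose `prob_human` is at or below the cutoff."""
    return [
        i
        for i, r in enumerate(results)
        if r["label"] == "ai" and r["prob_human"] <= max_prob_human
    ]

# Hand-written stand-ins for the dictionaries returned by predict().
example_results = [
    {"label": "ai", "prob_human": 0.05},
    {"label": "ai", "prob_human": 0.45},
    {"label": "human", "prob_human": 0.90},
]
print(flag_confident_ai(example_results))  # [0]
```

Because the heads are trained with label smoothing, the probabilities may not get very close to 0 or 1, so any cutoff is best chosen on held-out data rather than fixed in advance.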
> **Note:** The Llama-3.2 backbones are gated on the Hugging Face Hub. Set `HF_TOKEN` in a `.env` file
> to download them. A GPU is recommended; the code falls back to MPS or CPU automatically.
## Limitations and Bias
- English-only; performance on other languages has not been evaluated and is expected to degrade.
- Detection quality depends on the generators and domains seen during training (MAGE); novel models,
prompting styles, paraphrasing or adversarial edits can reduce accuracy.
- Depends on two frozen Llama-3.2-1B backbones, which carry their own data biases.
- Reported metrics reflect the MAGE test distribution and may not transfer out of distribution; see
the OOD evaluation utilities in the repository.
## Citation
PAWN++ builds on the PAWN detector:
> PAWN (Perplexity Attention Weighted Network): "Not all tokens are created equal: Perplexity
> Attention Weighted Networks for AI generated text detection."
> https://www.sciencedirect.com/science/article/pii/S156625352500538X