voight-kampff-pan2024-classifier

A calibrated Linear SVM classifier for AI-generated text detection, built for the Voight-Kampff Generative AI Shared Task (PAN @ CLEF 2024).

This classifier operates on embeddings produced by Alejandro-Pardo/voight-kampff-pan2024-gte-en-v1.5. Given the embedding of a text chunk (~64 tokens), it predicts whether the chunk is human-written (0) or AI-generated (1) and returns calibrated probability scores.

Model Details

| Parameter   | Value                                              |
|-------------|----------------------------------------------------|
| Algorithm   | LinearSVC (scikit-learn)                           |
| Calibration | CalibratedClassifierCV with cv='prefit'            |
| Kernel      | Linear                                             |
| Input       | Embeddings from voight-kampff-pan2024-gte-en-v1.5  |
| Output      | Calibrated probabilities for human (0) and AI (1)  |
| Format      | Python pickle (.pkl)                               |

Training Data

The SVM was trained on embeddings generated from:

  • PAN 2024 competition dataset + Ollama-augmented texts (Llama 3.2 1B, Qwen 2.5 1B, Gemma 2 2B)
  • Combined train + validation sets for training, test set for calibration
  • 18 different LLM sources (GPT-3.5, GPT-4, LLaMA, Mistral, Alpaca, Qwen, Gemma, etc.)
  • Text chunks of ~64 tokens with 15% leetspeak noise injection

Note: The pre-computed embeddings used for training are not included. They can be regenerated using the embedding model and the training data available in the GitHub repository.
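For illustration, here is a minimal sketch of the kind of leetspeak noise injection described above. The character map and sampling strategy are assumptions; the actual augmentation used during training lives in the GitHub repository.

```python
import random

# Assumed substitution map -- the real training code may use a different one
LEET_MAP = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}

def add_leet_noise(text, rate=0.15, seed=None):
    """Replace mappable characters with leetspeak equivalents at the given rate."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        low = ch.lower()
        if low in LEET_MAP and rng.random() < rate:
            out.append(LEET_MAP[low])
        else:
            out.append(ch)
    return "".join(out)
```

At `rate=0.15`, roughly 15% of mappable characters are perturbed, matching the proportion stated for the training data.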

Usage

Quick Start

```python
from sentence_transformers import SentenceTransformer
import pickle

# 1. Load the embedding model
embeddings_model = SentenceTransformer(
    'Alejandro-Pardo/voight-kampff-pan2024-gte-en-v1.5',
    trust_remote_code=True
)

# 2. Load the calibrated SVM classifier
with open('nlp_classifier_20.pkl', 'rb') as f:
    classifier = pickle.load(f)

# 3. Classify a text chunk
text = "Some text chunk of approximately 64 tokens..."
embedding = embeddings_model.encode([text])

prediction = classifier.predict(embedding)           # 0 = human, 1 = AI
probabilities = classifier.predict_proba(embedding)  # [[p_human, p_ai]]
```

Full Text-Pair Classification

For classifying full text pairs (as in the competition format), the pipeline works as follows:

  1. Chunks both texts into ~64 token segments
  2. Generates embeddings for all chunks using the embedding model
  3. Classifies each chunk with the SVM
  4. Averages chunk probabilities per text
  5. Combines scores using:

$$\mathrm{is\_human} = \frac{\bigl(1 - P(\mathrm{human} \mid t_1)\bigr) + \bigl(1 - P(\mathrm{LLM} \mid t_2)\bigr)}{2}$$

See the full implementation in the GitHub repository.
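The steps above can be sketched as follows. This is an approximation: whitespace splitting stands in for the real ~64-token chunker, and `embeddings_model` and `classifier` are assumed to be loaded as in the Quick Start.

```python
def chunk_text(text, chunk_size=64):
    """Split a text into ~64-token chunks (whitespace tokens as an approximation)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]

def p_ai(text, embeddings_model, classifier):
    """Average the per-chunk P(AI) over all chunks of a text."""
    chunks = chunk_text(text)
    embeddings = embeddings_model.encode(chunks)
    probs = classifier.predict_proba(embeddings)  # shape (n_chunks, 2): [p_human, p_ai]
    return probs[:, 1].mean()

def is_human_score(t1, t2, embeddings_model, classifier):
    """Combine the two per-text scores using the formula above."""
    p_human_t1 = 1.0 - p_ai(t1, embeddings_model, classifier)
    p_llm_t2 = p_ai(t2, embeddings_model, classifier)
    return ((1.0 - p_human_t1) + (1.0 - p_llm_t2)) / 2.0
```

The actual chunking and scoring code is in the GitHub repository; this sketch only mirrors the five steps listed above.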

Evaluation Results

Chunk-Level Classification (~64 token chunks)

| Metric   | Score |
|----------|-------|
| F1       | ~0.80 |
| Accuracy | ~0.80 |

Full Text-Pair Classification (with chunk averaging)

On PAN 2024 test split (with noise):

| Metric  | Score |
|---------|-------|
| ROC-AUC | 0.993 |
| Brier   | 0.924 |
| C@1     | 0.951 |
| F1      | 0.951 |
| F0.5u   | 0.953 |
| Mean    | 0.955 |

On external Kaggle AI vs Human Text dataset:

| Metric  | Score |
|---------|-------|
| ROC-AUC | 0.948 |
| Brier   | 0.872 |
| C@1     | 0.878 |
| F1      | 0.878 |
| F0.5u   | 0.877 |
| Mean    | 0.891 |

Comparison with PAN 2024 Competition Leaderboard

| # | Team             | ROC-AUC | Brier | C@1   | F1    | F0.5u | Mean  |
|---|------------------|---------|-------|-------|-------|-------|-------|
| 1 | marsan           | 0.961   | 0.928 | 0.912 | 0.884 | 0.932 | 0.924 |
| 2 | you-shun-you-de  | 0.931   | 0.926 | 0.928 | 0.905 | 0.913 | 0.921 |
| 3 | baselineavengers | 0.925   | 0.869 | 0.882 | 0.875 | 0.869 | 0.886 |
| - | Baseline         | 0.751   | 0.780 | 0.734 | 0.720 | 0.720 | 0.741 |

Note: Our results are evaluated on our own test split and are not directly comparable to the official competition leaderboard.

Retraining the Classifier

If you want to retrain the SVM (e.g., with additional data), you can regenerate the embeddings from the raw data:

```python
from sentence_transformers import SentenceTransformer
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# Load the embedding model
model = SentenceTransformer(
    'Alejandro-Pardo/voight-kampff-pan2024-gte-en-v1.5',
    trust_remote_code=True
)

# Generate embeddings from your data
train_embeddings = model.encode(train_texts, show_progress_bar=True, batch_size=32)
test_embeddings = model.encode(test_texts, show_progress_bar=True, batch_size=32)

# Train on the train split, then calibrate the fitted SVM on the held-out split
svm = LinearSVC(random_state=42, max_iter=10000)
svm.fit(train_embeddings, train_labels)

calibrated_svm = CalibratedClassifierCV(estimator=svm, cv='prefit')
calibrated_svm.fit(test_embeddings, test_labels)
```

The training data is available in the GitHub repository.

Limitations

  • Requires the embedding model: This classifier only works with embeddings from voight-kampff-pan2024-gte-en-v1.5.
  • English only: Trained on English text data only.
  • Short texts: Performance degrades on very short texts (1-2 chunks), where chunk-level accuracy (~80%) is the bottleneck.
  • Pickle format: Requires compatible scikit-learn version to load.
  • Evaluation caveat: Results are on our own test split, not the official PAN 2024 evaluation set.

Authors

  • Alejandro Pardo Bascuñana - Universidad Politécnica de Madrid
  • Pedro Amaya Moreno - Universidad Politécnica de Madrid

Developed as part of the NLP course in the Master's program Aprendizaje Automático y Datos Masivos at UPM (2024-2025).

Citation

@misc{pardo2025voightkampff,
  title={Voight-Kampff: Contrastive Embedding Learning for AI-Generated Text Detection},
  author={Pardo-Bascu{\~n}ana, Alejandro and Amaya-Moreno, Pedro},
  year={2025},
  url={https://github.com/Alejandro-Pardo/voight-kampff-pan2024/}
}
