voight-kampff-pan2024-classifier

A calibrated Linear SVM classifier for AI-generated text detection, built for the Voight-Kampff Generative AI Shared Task (PAN @ CLEF 2024).

This classifier operates on embeddings produced by Alejandro-Pardo/voight-kampff-pan2024-gte-en-v1.5. Given the embedding of a text chunk (~64 tokens), it predicts whether the chunk is human-written (0) or AI-generated (1) and returns calibrated probability scores.

Model Details

| Parameter   | Value                                              |
|-------------|----------------------------------------------------|
| Algorithm   | LinearSVC (scikit-learn)                           |
| Calibration | CalibratedClassifierCV with cv='prefit'            |
| Kernel      | Linear                                             |
| Input       | Embeddings from voight-kampff-pan2024-gte-en-v1.5  |
| Output      | Calibrated probabilities for human (0) and AI (1)  |
| Format      | Python pickle (.pkl)                               |

Training Data

The SVM was trained on embeddings generated from:

  • PAN 2024 competition dataset + Ollama-augmented texts (Llama 3.2 1B, Qwen 2.5 1B, Gemma 2 2B)
  • Combined train + validation sets for training, test set for calibration
  • 18 different LLM sources (GPT-3.5, GPT-4, LLaMA, Mistral, Alpaca, Qwen, Gemma, etc.)
  • Text chunks of ~64 tokens with 15% leetspeak noise injection

Note: The pre-computed embeddings used for training are not included. They can be regenerated using the embedding model and the training data available in the GitHub repository.
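For illustration, here is a minimal sketch of the kind of leetspeak noise injection described above. The character map and sampling strategy are assumptions; the actual augmentation used during training lives in the GitHub repository.

```python
import random

# Assumed substitution map -- the real training code may use a different one
LEET_MAP = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}

def add_leet_noise(text, rate=0.15, seed=None):
    """Replace mappable characters with leetspeak equivalents at the given rate."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        low = ch.lower()
        if low in LEET_MAP and rng.random() < rate:
            out.append(LEET_MAP[low])
        else:
            out.append(ch)
    return "".join(out)
```

At `rate=0.15`, roughly 15% of mappable characters are perturbed, matching the proportion stated for the training data.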

Usage

Quick Start

```python
from sentence_transformers import SentenceTransformer
import pickle

# 1. Load the embedding model
embeddings_model = SentenceTransformer(
    'Alejandro-Pardo/voight-kampff-pan2024-gte-en-v1.5',
    trust_remote_code=True
)

# 2. Load the calibrated SVM classifier
with open('nlp_classifier_20.pkl', 'rb') as f:
    classifier = pickle.load(f)

# 3. Classify a text chunk
text = "Some text chunk of approximately 64 tokens..."
embedding = embeddings_model.encode([text])

prediction = classifier.predict(embedding)           # 0 = human, 1 = AI
probabilities = classifier.predict_proba(embedding)  # [[p_human, p_ai]]
```

Full Text-Pair Classification

For classifying full text pairs (as in the competition format), the pipeline works as follows:

  1. Chunks both texts into ~64 token segments
  2. Generates embeddings for all chunks using the embedding model
  3. Classifies each chunk with the SVM
  4. Averages chunk probabilities per text
  5. Combines scores using:

$$\mathrm{is\_human} = \frac{\bigl(1 - P(\mathrm{human} \mid t_1)\bigr) + \bigl(1 - P(\mathrm{LLM} \mid t_2)\bigr)}{2}$$

See the full implementation in the GitHub repository.
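The steps above can be sketched as follows. This is an approximation: whitespace splitting stands in for the real ~64-token chunker, and `embeddings_model` and `classifier` are assumed to be loaded as in the Quick Start.

```python
def chunk_text(text, chunk_size=64):
    """Split a text into ~64-token chunks (whitespace tokens as an approximation)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]

def p_ai(text, embeddings_model, classifier):
    """Average the per-chunk P(AI) over all chunks of a text."""
    chunks = chunk_text(text)
    embeddings = embeddings_model.encode(chunks)
    probs = classifier.predict_proba(embeddings)  # shape (n_chunks, 2): [p_human, p_ai]
    return probs[:, 1].mean()

def is_human_score(t1, t2, embeddings_model, classifier):
    """Combine the two per-text scores using the formula above."""
    p_human_t1 = 1.0 - p_ai(t1, embeddings_model, classifier)
    p_llm_t2 = p_ai(t2, embeddings_model, classifier)
    return ((1.0 - p_human_t1) + (1.0 - p_llm_t2)) / 2.0
```

The actual chunking and scoring code is in the GitHub repository; this sketch only mirrors the five steps listed above.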

Evaluation Results

Chunk-Level Classification (~64 token chunks)

| Metric   | Score |
|----------|-------|
| F1       | ~0.80 |
| Accuracy | ~0.80 |

Full Text-Pair Classification (with chunk averaging)

On PAN 2024 test split (with noise):

| Metric  | Score |
|---------|-------|
| ROC-AUC | 0.993 |
| Brier   | 0.924 |
| C@1     | 0.951 |
| F1      | 0.951 |
| F0.5u   | 0.953 |
| Mean    | 0.955 |

On external Kaggle AI vs Human Text dataset:

| Metric  | Score |
|---------|-------|
| ROC-AUC | 0.948 |
| Brier   | 0.872 |
| C@1     | 0.878 |
| F1      | 0.878 |
| F0.5u   | 0.877 |
| Mean    | 0.891 |

Comparison with PAN 2024 Competition Leaderboard

| # | Team             | ROC-AUC | Brier | C@1   | F1    | F0.5u | Mean  |
|---|------------------|---------|-------|-------|-------|-------|-------|
| 1 | marsan           | 0.961   | 0.928 | 0.912 | 0.884 | 0.932 | 0.924 |
| 2 | you-shun-you-de  | 0.931   | 0.926 | 0.928 | 0.905 | 0.913 | 0.921 |
| 3 | baselineavengers | 0.925   | 0.869 | 0.882 | 0.875 | 0.869 | 0.886 |
| - | Baseline         | 0.751   | 0.780 | 0.734 | 0.720 | 0.720 | 0.741 |

Note: Our results are evaluated on our own test split and are not directly comparable to the official competition leaderboard.

Retraining the Classifier

If you want to retrain the SVM (e.g., with additional data), you can regenerate the embeddings from the raw data:

```python
from sentence_transformers import SentenceTransformer
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# Load the embedding model
model = SentenceTransformer(
    'Alejandro-Pardo/voight-kampff-pan2024-gte-en-v1.5',
    trust_remote_code=True
)

# Generate embeddings from your data
train_embeddings = model.encode(train_texts, show_progress_bar=True, batch_size=32)
test_embeddings = model.encode(test_texts, show_progress_bar=True, batch_size=32)

# Train on the train split, then calibrate the fitted SVM on the held-out split
svm = LinearSVC(random_state=42, max_iter=10000)
svm.fit(train_embeddings, train_labels)

calibrated_svm = CalibratedClassifierCV(estimator=svm, cv='prefit')
calibrated_svm.fit(test_embeddings, test_labels)
```

The training data is available in the GitHub repository.

Limitations

  • Requires the embedding model: This classifier only works with embeddings from voight-kampff-pan2024-gte-en-v1.5.
  • English only: Trained on English text data only.
  • Short texts: Performance degrades on very short texts (1-2 chunks), where chunk-level accuracy (~80%) is the bottleneck.
  • Pickle format: Requires compatible scikit-learn version to load.
  • Evaluation caveat: Results are on our own test split, not the official PAN 2024 evaluation set.

Authors

  • Alejandro Pardo Bascuñana - Universidad Politécnica de Madrid
  • Pedro Amaya Moreno - Universidad Politécnica de Madrid

Developed as part of the NLP course in the Master's program Aprendizaje Automático y Datos Masivos at UPM (2024-2025).

Citation

@misc{pardo2025voightkampff,
  title={Voight-Kampff: Contrastive Embedding Learning for AI-Generated Text Detection},
  author={Pardo-Bascu{\~n}ana, Alejandro and Amaya-Moreno, Pedro},
  year={2025},
  url={https://github.com/Alejandro-Pardo/voight-kampff-pan2024/}
}
