DivEye XGBoost AI Detector
An XGBoost classifier for AI text detection based on the DivEye method. It uses statistical features extracted from LLM embeddings to classify text as human-written or AI-generated.
Model Description
This model implements the DivEye detection approach, which analyzes distributional properties of text using features derived from a base language model. The XGBoost classifier is trained on these statistical features to distinguish human-written from AI-generated academic text.
- Model type: XGBoost Classifier
- Language: English
- License: Apache 2.0
- Feature extractor: GPT-OSS-20B embeddings
Intended Use
This model is intended for:
- Detecting AI-generated content in academic submissions
- Research on statistical AI text detection methods
- Ensemble combination with neural detectors
Important: Use this model as one component of a larger detection ensemble, not as a standalone detector. It provides a signal that complements neural classifiers.
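As a rough illustration of ensemble combination, the sketch below blends this model's probability with a neural detector's probability using a weighted average. The helper name `combine_scores` and the default weight are assumptions made for the example; the actual ensemble may combine scores differently.

```python
import numpy as np

def combine_scores(p_diveye, p_neural, w_diveye=0.3):
    """Weighted average of two AI-probability scores.

    Hypothetical helper: the weight 0.3 is an illustrative default,
    not a value taken from the detection pipeline.
    """
    if not 0.0 <= w_diveye <= 1.0:
        raise ValueError("w_diveye must be in [0, 1]")
    p_diveye = np.asarray(p_diveye, dtype=float)
    p_neural = np.asarray(p_neural, dtype=float)
    # Convex combination, so the result stays in [0, 1] when both inputs do
    return w_diveye * p_diveye + (1.0 - w_diveye) * p_neural
```

In practice the weight would be tuned on a held-out validation set rather than fixed by hand.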
Performance
When used as part of the full detection ensemble:
- Provides statistical features that complement neural detectors
- Helps reduce false positives on edge cases
- Particularly effective on longer texts
Usage
import pickle
import numpy as np

# Load the trained model
with open("diveye_xgboost.pkl", "rb") as f:
    model = pickle.load(f)

# Features should be extracted using the DivEye feature extractor.
# extract_diveye_features is not defined here; see the full detection
# pipeline for the feature extraction code.
# text is the input string to classify.
features = extract_diveye_features(text)  # Returns numpy array

# Predict: column 1 of predict_proba is the probability of the AI class
probability = model.predict_proba(features.reshape(1, -1))[0][1]
print(f"AI Probability: {probability:.2%}")
Features
The model expects statistical features including:
- Perplexity-based metrics
- Token probability distributions
- Entropy measures
- Distributional statistics
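To make these feature families concrete, here is a minimal sketch that computes a few such statistics from per-token log-probabilities and next-token distributions. It is not the DivEye feature extractor: the function name `sketch_features`, the chosen statistics, and their names are assumptions for illustration, and the model's real feature vector may differ in content and order.

```python
import numpy as np

def sketch_features(token_logprobs, next_token_dists):
    """Illustrative statistics; NOT the actual DivEye feature set.

    token_logprobs: natural-log probability the base model assigned to
        each observed token, shape (n_tokens,).
    next_token_dists: predicted next-token distribution at each position,
        shape (n_tokens, vocab_size), rows summing to 1.
    """
    lp = np.asarray(token_logprobs, dtype=float)
    dists = np.asarray(next_token_dists, dtype=float)
    if lp.size == 0 or dists.ndim != 2 or dists.shape[0] != lp.size:
        raise ValueError("need one log-prob and one distribution per token")

    surprisal = -lp  # per-token surprisal in nats
    # Shannon entropy of each predicted distribution; clipping avoids log(0)
    entropy = -(dists * np.log(np.clip(dists, 1e-12, None))).sum(axis=1)

    return {
        "perplexity": float(np.exp(surprisal.mean())),
        "mean_surprisal": float(surprisal.mean()),
        "std_surprisal": float(surprisal.std()),
        "max_surprisal": float(surprisal.max()),
        "mean_entropy": float(entropy.mean()),
    }
```

Perplexity here is the exponential of the mean surprisal, so a text whose tokens the base model finds uniformly predictable yields low perplexity and low surprisal variance.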
Limitations
- Requires a compatible feature extractor (GPT-OSS-20B based)
- Best used in combination with neural detectors
- May have reduced accuracy on very short texts
- Optimized for academic/formal writing style
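One way to handle the short-text limitation is to abstain instead of returning a low-confidence score. The sketch below assumes a hypothetical wrapper `score_or_abstain` and an arbitrary 50-word cutoff; neither comes from the detection pipeline, and a suitable threshold would need to be chosen empirically.

```python
def score_or_abstain(text, score_fn, min_words=50):
    """Return score_fn(text), or None when the text is too short to score.

    Hypothetical helper: min_words=50 is an illustrative cutoff, and
    whitespace splitting is only a crude proxy for token count.
    """
    n_words = len(text.split())
    if n_words < min_words:
        return None  # abstain instead of reporting an unreliable probability
    return score_fn(text)
```

Here `score_fn` stands for any callable that maps a string to an AI probability, for example a function that wraps feature extraction and `model.predict_proba`.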
Citation
@misc{diveye_xgboost_detector,
author = {COAI},
title = {DivEye XGBoost AI Text Detector},
year = {2024},
publisher = {Hugging Face},
url = {https://huggingface.co/coai/diveye-xgboost-detector}
}
Contact
For questions or issues, please open an issue on the model repository.