DivEye XGBoost AI Detector

XGBoost classifier for AI text detection based on the DivEye method. It classifies text as human-written or AI-generated using statistical features (perplexity, token-probability and entropy measures) computed from a base language model.

Model Description

This model implements the DivEye detection approach, which analyzes distributional properties of text using features derived from a base language model. An XGBoost classifier trained on these statistical features distinguishes human-written from AI-generated academic text.
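The perplexity, token-probability and entropy features listed under Features are conventionally built from per-token surprisal under the base language model p_θ. The standard definitions are given below for reference. This card does not specify which aggregates DivEye computes from them or how they are normalized.

```latex
\begin{aligned}
s_t &= -\log p_\theta(x_t \mid x_{<t}), \quad t = 1, \dots, T \\
\mathrm{PPL}(x) &= \exp\Big(\tfrac{1}{T}\textstyle\sum_{t=1}^{T} s_t\Big) \\
H_t &= -\sum_{v \in \mathcal{V}} p_\theta(v \mid x_{<t})\,\log p_\theta(v \mid x_{<t})
\end{aligned}
```

Here T is the number of tokens and V is the vocabulary. The surprisal s_t needs only the probability of the observed token, whereas the entropy H_t needs the full next-token distribution.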

  • Model type: XGBoost Classifier
  • Language: English
  • License: Apache 2.0
  • Feature extractor: GPT-OSS-20B (base language model)

Intended Use

This model is intended for:

  • Detecting AI-generated content in academic submissions
  • Research on statistical AI text detection methods
  • Ensemble combination with neural detectors

Important: Use this model as one component of a larger detection ensemble. It provides a signal that complements neural classifiers.
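A simple way to combine this model's output with neural detectors is a weighted average of their AI probabilities (soft voting). This is a minimal sketch that assumes each detector returns P(AI) in [0, 1]. The function name, detector names and weights are hypothetical. This card does not describe the combiner used in the full detection pipeline.

```python
def ensemble_probability(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted soft vote over detector AI probabilities (hypothetical combiner).

    scores:  detector name -> P(AI) in [0, 1]
    weights: detector name -> non-negative weight
    Detectors without a score are skipped and the remaining weights renormalized.
    """
    for name, p in scores.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"score for {name!r} is not a probability: {p}")
    used = {name: w for name, w in weights.items() if name in scores and w > 0}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no detector with positive weight has a score")
    return sum(w * scores[name] for name, w in used.items()) / total


# Hypothetical detector names and weights
p = ensemble_probability(
    scores={"diveye_xgboost": 0.82, "neural": 0.64},
    weights={"diveye_xgboost": 0.3, "neural": 0.7},
)
print(f"Ensemble AI probability: {p:.2%}")
```

Renormalizing over the detectors that actually returned a score keeps the output a valid probability when one component is unavailable.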

Performance

Standalone metrics are not reported for this model. When used as part of the full detection ensemble, it:

  • Contributes a statistical signal that complements neural detectors
  • Helps reduce false positives on edge cases
  • Is particularly effective on longer texts

Usage

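import pickle

import numpy as np

# Load the model (unpickling requires xgboost to be installed; pickle can
# execute arbitrary code, so only load files from a trusted source)
with open("diveye_xgboost.pkl", "rb") as f:
    model = pickle.load(f)

# Features must be extracted with the DivEye feature extractor.
# extract_diveye_features comes from the full detection pipeline
# and is not defined in this snippet.
text = "The text to classify."
features = np.asarray(extract_diveye_features(text))  # 1-D feature vector

# predict_proba expects shape (n_samples, n_features); column 1 is
# the probability of class 1 (AI-generated)
probability = model.predict_proba(features.reshape(1, -1))[0][1]
print(f"AI Probability: {probability:.2%}")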

Features

The model expects a fixed-length vector of statistical features, including:

  • Perplexity-based metrics
  • Token probability distributions
  • Entropy measures
  • Distributional statistics
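The sketch below illustrates how DivEye-style summary statistics can be computed from per-token log-probabilities. It rests on several assumptions and is not the extractor this model was trained with. The function name surprisal_features, the choice and order of the seven statistics, and the use of natural-log probabilities are all hypothetical. Its output will therefore not match the input layout that diveye_xgboost.pkl expects. Entropy measures are omitted because they need the full next-token distribution, not only the probability of the observed token.

```python
import math
from statistics import fmean, pstdev


def surprisal_features(token_logprobs: list[float]) -> list[float]:
    """Hypothetical DivEye-style features from per-token log-probabilities.

    token_logprobs[i] is assumed to be ln p(x_i | x_<i) from the base language
    model. Returns [mean, std, skewness, excess kurtosis, perplexity,
    mean and std of first differences] of the surprisal sequence.
    """
    if len(token_logprobs) < 2:
        raise ValueError("need at least 2 tokens")
    s = [-lp for lp in token_logprobs]  # surprisal s_t = -ln p(x_t | x_<t)
    mean = fmean(s)
    std = pstdev(s, mu=mean)

    def central_moment(k: int) -> float:
        return fmean([(x - mean) ** k for x in s])

    skew = central_moment(3) / std**3 if std > 0 else 0.0
    kurt = central_moment(4) / std**4 - 3.0 if std > 0 else 0.0
    perplexity = math.exp(mean)  # exp of mean surprisal
    diffs = [b - a for a, b in zip(s, s[1:])]  # token-to-token change in surprisal
    return [mean, std, skew, kurt, perplexity, fmean(diffs), pstdev(diffs)]


# Example with made-up log-probabilities for a 6-token text
feats = surprisal_features([-2.1, -0.4, -5.3, -1.2, -0.9, -3.0])
print([round(f, 3) for f in feats])
```

Higher-order moments and first differences capture how much surprisal varies across a text, not just its average level.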

Limitations

  • Requires a compatible feature extractor (GPT-OSS-20B based)
  • Best used in combination with neural detectors
  • May have reduced accuracy on very short texts
  • Optimized for academic/formal writing style
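Because accuracy may drop on very short texts, a pipeline might reduce the weight of the DivEye score for short inputs. The sketch below ramps that weight linearly between two token counts. The function combine_with_length_ramp and its parameters min_tokens, full_tokens and max_weight are hypothetical, and their default values are illustrative, not settings taken from this model.

```python
def combine_with_length_ramp(
    p_diveye: float,
    p_neural: float,
    n_tokens: int,
    min_tokens: int = 64,
    full_tokens: int = 512,
    max_weight: float = 0.3,
) -> float:
    """Blend two P(AI) scores, trusting DivEye less on short texts (hypothetical).

    The DivEye weight is 0 at or below min_tokens, rises linearly, and reaches
    max_weight at full_tokens and beyond. Thresholds are illustrative only.
    """
    if not 0 < min_tokens < full_tokens:
        raise ValueError("require 0 < min_tokens < full_tokens")
    if not 0.0 <= max_weight <= 1.0:
        raise ValueError("max_weight must be in [0, 1]")
    ramp = (n_tokens - min_tokens) / (full_tokens - min_tokens)
    w = max_weight * min(1.0, max(0.0, ramp))
    return w * p_diveye + (1.0 - w) * p_neural


# Short input: the DivEye score is ignored entirely
print(combine_with_length_ramp(0.9, 0.2, n_tokens=40))  # 0.2
```

A linear ramp avoids a hard cutoff, where a single extra token could flip the combined decision.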

Citation

@misc{diveye_xgboost_detector,
  author = {COAI},
  title = {DivEye XGBoost AI Text Detector},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/coai/diveye-xgboost-detector}
}

Contact

For questions or issues, please open an issue on the model repository.
