---
language:
  - en
  - es
  - pt
  - ca
  - de
  - fr
  - it
license: apache-2.0
tags:
  - text-classification
  - ai-detection
  - multilingual
  - deberta-v3
  - knowledge-distillation
pipeline_tag: text-classification
library_name: transformers
base_model: desklib/ai-text-detector-v1.01
---

# Oculus 2.0 Multilingual AI Text Detector

A fine-tuned DeBERTa-v3-large model for detecting AI-generated text across 8 languages.

## Supported Languages

| Language   | Code | Accuracy | Recall | FPR  |
|------------|------|----------|--------|------|
| English    | en   | 91.0%    | 89.8%  | 0.2% |
| Spanish    | es   | 97.2%    | 96.2%  | 1.0% |
| Portuguese | pt   | 99.0%    | 100.0% | 3.0% |
| Catalan    | ca   | 99.9%    | 99.8%  | 0.0% |
| Valencian  | va   | 100.0%   | 100.0% | 0.0% |
| German     | de   | 100.0%   | 100.0% | 0.0% |
| French     | fr   | 100.0%   | 100.0% | 0.0% |
| Italian    | it   | 100.0%   | 100.0% | 0.0% |

## Stress Test (Unseen Models)

100% recall across all 8 languages on fresh text from GPT-5.4, Claude Sonnet 4, and Gemini 2.5 Pro.

## Model Architecture

- **Base:** DeBERTa-v3-large (434M parameters)
- **Head:** Mean pooling + linear classifier (`hidden_size` → 1)
- **Output:** Sigmoid probability in [0, 1], where 1 = AI-generated

## Training

- **Method:** Knowledge distillation from GPTZero soft labels
- **Loss:** Composite (0.5 KL + 0.3 BCE + 0.2 MSE)
- **Data:** 12,886 texts across 8 languages
- **AI models represented:** GPT-5.2, GPT-5.4, GPT-4o-mini, Gemini-3-flash, DeepSeek-v3.2, Grok, Claude Opus, o4-mini
- **Epochs:** 5 (best checkpoint at epoch 5)
- **Learning rate:** 2e-6 with cosine warmup
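The composite loss above (0.5 KL + 0.3 BCE + 0.2 MSE) can be sketched as follows. This is an illustrative reconstruction, not code from this repo: the `distillation_loss` helper name, the clamping epsilon, and the use of mean reduction are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_probs, hard_labels,
                      w_kl=0.5, w_bce=0.3, w_mse=0.2):
    """Weighted KL + BCE + MSE distillation loss (illustrative sketch).

    student_logits: raw scores from the student model, shape (batch,)
    teacher_probs:  soft labels from the teacher (e.g. GPTZero), in [0, 1]
    hard_labels:    0/1 ground-truth labels
    """
    eps = 1e-7
    sp = torch.sigmoid(student_logits).clamp(eps, 1 - eps)
    tp = teacher_probs.clamp(eps, 1 - eps)

    # Binary KL divergence KL(teacher || student), averaged over the batch
    kl = (tp * (tp / sp).log() + (1 - tp) * ((1 - tp) / (1 - sp)).log()).mean()
    # BCE against the hard 0/1 labels
    bce = F.binary_cross_entropy_with_logits(student_logits, hard_labels)
    # MSE between student and teacher probabilities
    mse = F.mse_loss(sp, tp)

    return w_kl * kl + w_bce * bce + w_mse * mse
```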

## Usage

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoConfig, AutoModel, PreTrainedModel

class DesklibAIDetectionModel(PreTrainedModel):
    config_class = AutoConfig

    def __init__(self, config):
        super().__init__(config)
        self.model = AutoModel.from_config(config)
        self.classifier = nn.Linear(config.hidden_size, 1)
        self.init_weights()

    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
        last_hidden = outputs[0]
        # Mean pooling over non-padding tokens only
        mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden.size()).float()
        sum_embeddings = torch.sum(last_hidden * mask_expanded, dim=1)
        sum_mask = torch.clamp(mask_expanded.sum(dim=1), min=1e-9)
        pooled = sum_embeddings / sum_mask
        pooled = pooled.to(self.classifier.weight.dtype)
        logits = self.classifier(pooled)
        return {"logits": logits}

# Load
tokenizer = AutoTokenizer.from_pretrained("danibor/oculus-2.0-multilingual")
model = DesklibAIDetectionModel.from_pretrained("danibor/oculus-2.0-multilingual")
model.eval()

# Predict
text = "Your text here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding="max_length")
with torch.no_grad():
    logits = model(**inputs)["logits"].squeeze(-1)
    prob = torch.sigmoid(logits).item()

print(f"AI probability: {prob:.4f}")
print(f"Classification: {'AI' if prob >= 0.5 else 'Human'}")
```
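For scoring many texts at once, the single-text flow above can be batched. The `classify_batch` helper below is an illustrative sketch, not part of the released code; it reuses whatever `tokenizer` and `model` you loaded and keeps the same 0.5 decision threshold as the example above.

```python
import torch

def classify_batch(texts, tokenizer, model, threshold=0.5, max_length=512):
    """Score a list of texts and return per-text probabilities and labels.

    Illustrative helper: pads dynamically to the longest text in the batch
    rather than always to max_length.
    """
    inputs = tokenizer(texts, return_tensors="pt", truncation=True,
                       max_length=max_length, padding=True)
    with torch.no_grad():
        logits = model(**inputs)["logits"].squeeze(-1)
        probs = torch.sigmoid(logits)
    return [
        {"text": t, "prob": p.item(),
         "label": "AI" if p.item() >= threshold else "Human"}
        for t, p in zip(texts, probs)
    ]
```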

## Built by

Hastewire - AI Detection Research