# Oculus 2.0 Multilingual AI Text Detector
A fine-tuned DeBERTa-v3-large model for detecting AI-generated text across 8 languages.
## Supported Languages
| Language | Code | Accuracy | Recall | False Positive Rate |
|---|---|---|---|---|
| English | en | 91.0% | 89.8% | 0.2% |
| Spanish | es | 97.2% | 96.2% | 1.0% |
| Portuguese | pt | 99.0% | 100.0% | 3.0% |
| Catalan | ca | 99.9% | 99.8% | 0.0% |
| Valencian | va | 100.0% | 100.0% | 0.0% |
| German | de | 100.0% | 100.0% | 0.0% |
| French | fr | 100.0% | 100.0% | 0.0% |
| Italian | it | 100.0% | 100.0% | 0.0% |
## Stress Test (Unseen Models)
The detector achieved 100% recall across all 8 languages on previously unseen text generated by GPT-5.4, Claude Sonnet 4, and Gemini 2.5 Pro.
## Model Architecture
- Base: DeBERTa-v3-large (434M parameters)
- Head: Mean pooling + linear classifier (`hidden_size` → 1)
- Output: Sigmoid probability in [0, 1], where 1 = AI-generated
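The mean-pooling head averages only the non-padded token embeddings before the linear layer. A standalone sketch of that pooling step (the same computation the usage code below performs inside `forward`):

```python
import torch

def mean_pool(last_hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over the sequence, ignoring padding."""
    mask = attention_mask.unsqueeze(-1).expand(last_hidden.size()).float()
    summed = (last_hidden * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero
    return summed / counts
```

Because padded positions contribute nothing to the sum or the count, the pooled vector (and hence the score) does not depend on how much padding the tokenizer adds.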
## Training
- Method: Knowledge distillation from GPTZero soft labels
- Loss: Composite (0.5 · KL + 0.3 · BCE + 0.2 · MSE)
- Data: 12,886 texts across 8 languages
- AI Models Represented: GPT-5.2, GPT-5.4, GPT-4o-mini, Gemini-3-flash, DeepSeek-v3.2, Grok, Claude Opus, o4-mini
- Epochs: 5 (best checkpoint at epoch 5)
- Learning Rate: 2e-6 with warmup and a cosine schedule
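The composite loss can be sketched as follows. Only the weights (0.5/0.3/0.2) come from this card; the exact form of each term, in particular the direction of the Bernoulli KL between teacher and student, is an assumption:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_probs, hard_labels,
                      w_kl=0.5, w_bce=0.3, w_mse=0.2):
    """Composite distillation loss: 0.5*KL + 0.3*BCE + 0.2*MSE (weights from the card)."""
    eps = 1e-7
    p = torch.sigmoid(student_logits).clamp(eps, 1 - eps)  # student probability
    t = teacher_probs.clamp(eps, 1 - eps)                  # teacher (GPTZero) soft label
    # KL divergence between Bernoulli(t) and Bernoulli(p), assumed direction KL(teacher || student)
    kl = t * (t / p).log() + (1 - t) * ((1 - t) / (1 - p)).log()
    bce = F.binary_cross_entropy_with_logits(student_logits, hard_labels)
    mse = F.mse_loss(p, t)
    return w_kl * kl.mean() + w_bce * bce + w_mse * mse
```

The KL term pulls the student toward the teacher's soft scores, BCE anchors it to the hard human/AI labels, and MSE smooths the probability match; all three terms are non-negative.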
## Usage

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoConfig, AutoModel, PreTrainedModel


class DesklibAIDetectionModel(PreTrainedModel):
    config_class = AutoConfig

    def __init__(self, config):
        super().__init__(config)
        self.model = AutoModel.from_config(config)
        self.classifier = nn.Linear(config.hidden_size, 1)
        self.init_weights()

    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
        last_hidden = outputs[0]
        # Attention-masked mean pooling over the sequence dimension
        mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden.size()).float()
        sum_embeddings = torch.sum(last_hidden * mask_expanded, dim=1)
        sum_mask = torch.clamp(mask_expanded.sum(dim=1), min=1e-9)
        pooled = sum_embeddings / sum_mask
        pooled = pooled.to(self.classifier.weight.dtype)
        logits = self.classifier(pooled)
        return {"logits": logits}


# Load
tokenizer = AutoTokenizer.from_pretrained("danibor/oculus-2.0-multilingual")
model = DesklibAIDetectionModel.from_pretrained("danibor/oculus-2.0-multilingual")

# Predict
text = "Your text here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True,
                   max_length=512, padding="max_length")
with torch.no_grad():
    logits = model(**inputs)["logits"].squeeze(-1)
    prob = torch.sigmoid(logits).item()

print(f"AI probability: {prob:.4f}")
print(f"Classification: {'AI' if prob >= 0.5 else 'Human'}")
```
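The usage snippet truncates input at 512 tokens. For longer documents, one workaround (an assumption on common practice, not part of the released pipeline) is to score overlapping token windows and average the resulting probabilities. A hypothetical chunking helper:

```python
def sliding_windows(token_ids, window=512, stride=256):
    """Split a token-id sequence into overlapping fixed-size windows
    so that no token is dropped by truncation."""
    if len(token_ids) <= window:
        return [token_ids]
    last_start = len(token_ids) - window
    starts = list(range(0, last_start, stride)) + [last_start]
    return [token_ids[s:s + window] for s in starts]
```

Each window is then decoded (or fed directly as `input_ids`), scored independently, and the per-window sigmoid probabilities averaged into a document-level score.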
## Built by
Hastewire - AI Detection Research
## Model Tree
- Base model: microsoft/deberta-v3-large
- Fine-tuned from: desklib/ai-text-detector-v1.01