---
language:
- en
- es
- pt
- ca
- de
- fr
- it
license: apache-2.0
tags:
- text-classification
- ai-detection
- multilingual
- deberta-v3
- knowledge-distillation
pipeline_tag: text-classification
library_name: transformers
base_model: desklib/ai-text-detector-v1.01
---

# Oculus 2.0 Multilingual AI Text Detector

A fine-tuned DeBERTa-v3-large model for detecting AI-generated text across 8 languages.

## Supported Languages

| Language | Code | Accuracy | Recall | FPR |
|------------|------|----------|--------|------|
| English | en | 91.0% | 89.8% | 0.2% |
| Spanish | es | 97.2% | 96.2% | 1.0% |
| Portuguese | pt | 99.0% | 100.0% | 3.0% |
| Catalan | ca | 99.9% | 99.8% | 0.0% |
| Valencian | va | 100.0% | 100.0% | 0.0% |
| German | de | 100.0% | 100.0% | 0.0% |
| French | fr | 100.0% | 100.0% | 0.0% |
| Italian | it | 100.0% | 100.0% | 0.0% |

## Stress Test (Unseen Models)

100% recall across all 8 languages on fresh text from GPT-5.4, Claude Sonnet 4, and Gemini 2.5 Pro.

## Model Architecture

- **Base**: DeBERTa-v3-large (434M parameters)
- **Head**: Mean pooling + linear classifier (hidden_size → 1)
- **Output**: Sigmoid probability in [0, 1], where 1 = AI-generated

## Training

- **Method**: Knowledge distillation from GPTZero soft labels
- **Loss**: Composite (0.5 KL + 0.3 BCE + 0.2 MSE)
- **Data**: 12,886 texts across 8 languages
- **AI Models Represented**: GPT-5.2, GPT-5.4, GPT-4o-mini, Gemini-3-flash, DeepSeek-v3.2, Grok, Claude Opus, o4-mini
- **Epochs**: 5 (best at epoch 5)
- **Learning Rate**: 2e-6 with cosine warmup

## Usage

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoConfig, AutoModel, PreTrainedModel


class DesklibAIDetectionModel(PreTrainedModel):
    config_class = AutoConfig

    def __init__(self, config):
        super().__init__(config)
        self.model = AutoModel.from_config(config)
        self.classifier = nn.Linear(config.hidden_size, 1)
        self.init_weights()

    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
        last_hidden = outputs[0]
        # Mean pooling over non-padding tokens
        mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden.size()).float()
        sum_embeddings = torch.sum(last_hidden * mask_expanded, dim=1)
        sum_mask = torch.clamp(mask_expanded.sum(dim=1), min=1e-9)
        pooled = sum_embeddings / sum_mask
        pooled = pooled.to(self.classifier.weight.dtype)
        logits = self.classifier(pooled)
        return {"logits": logits}


# Load
tokenizer = AutoTokenizer.from_pretrained("danibor/oculus-2.0-multilingual")
model = DesklibAIDetectionModel.from_pretrained("danibor/oculus-2.0-multilingual")

# Predict
text = "Your text here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True,
                   max_length=512, padding="max_length")
with torch.no_grad():
    logits = model(**inputs)["logits"].squeeze(-1)
    prob = torch.sigmoid(logits).item()

print(f"AI probability: {prob:.4f}")
print(f"Classification: {'AI' if prob >= 0.5 else 'Human'}")
```

## Built by Hastewire - AI Detection Research
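
## Appendix: Composite Loss Sketch

For reference, the composite distillation loss described under Training (0.5 KL + 0.3 BCE + 0.2 MSE) can be sketched as below. This is a minimal illustration, not the released training code: the function name `distillation_loss`, the Bernoulli form of the KL term, and the epsilon clamping are assumptions.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_probs, hard_labels,
                      w_kl=0.5, w_bce=0.3, w_mse=0.2):
    """Composite loss: w_kl*KL + w_bce*BCE + w_mse*MSE (illustrative sketch).

    student_logits: raw logits from the classifier head, shape (batch,)
    teacher_probs:  GPTZero soft labels in [0, 1], shape (batch,)
    hard_labels:    binary ground truth (1 = AI-generated), shape (batch,)
    """
    student_probs = torch.sigmoid(student_logits)

    # KL divergence between the teacher and student Bernoulli
    # distributions, KL(teacher || student), averaged over the batch.
    eps = 1e-7
    p = teacher_probs.clamp(eps, 1 - eps)
    q = student_probs.clamp(eps, 1 - eps)
    kl = (p * (p / q).log() + (1 - p) * ((1 - p) / (1 - q)).log()).mean()

    # BCE against the hard labels; MSE against the soft teacher scores.
    bce = F.binary_cross_entropy_with_logits(student_logits, hard_labels.float())
    mse = F.mse_loss(student_probs, teacher_probs)

    return w_kl * kl + w_bce * bce + w_mse * mse
```

The MSE term pulls the student's sigmoid output toward the teacher's soft scores directly, while the BCE term keeps it anchored to the hard labels; the weights above follow the 0.5/0.3/0.2 split stated in the Training section.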