---
language:
- en
- es
- pt
- ca
- de
- fr
- it
license: apache-2.0
tags:
- text-classification
- ai-detection
- multilingual
- deberta-v3
- knowledge-distillation
pipeline_tag: text-classification
library_name: transformers
base_model: desklib/ai-text-detector-v1.01
---

# Oculus 2.0 Multilingual AI Text Detector

A fine-tuned DeBERTa-v3-large model for detecting AI-generated text across 8 languages.
## Supported Languages

| Language | Code | Accuracy | Recall | FPR |
|----------|------|----------|--------|-----|
| English | en | 91.0% | 89.8% | 0.2% |
| Spanish | es | 97.2% | 96.2% | 1.0% |
| Portuguese | pt | 99.0% | 100.0% | 3.0% |
| Catalan | ca | 99.9% | 99.8% | 0.0% |
| Valencian | va | 100.0% | 100.0% | 0.0% |
| German | de | 100.0% | 100.0% | 0.0% |
| French | fr | 100.0% | 100.0% | 0.0% |
| Italian | it | 100.0% | 100.0% | 0.0% |

## Stress Test (Unseen Models)

The detector maintained 100% recall across all 8 languages on fresh text from GPT-5.4, Claude Sonnet 4, and Gemini 2.5 Pro.
## Model Architecture

- **Base**: DeBERTa-v3-large (434M parameters)
- **Head**: Mean pooling over token embeddings, followed by a linear classifier (hidden_size → 1)
- **Output**: Sigmoid probability in [0, 1], where 1 = AI-generated
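The masked mean pooling used by the head can be illustrated in isolation. This is a toy sketch with made-up tensors (shapes and values are illustrative only), not the model's actual code:

```python
import torch

# Toy batch: 2 sequences, 3 tokens each, hidden size 4
last_hidden = torch.tensor([
    [[1.0, 2.0, 3.0, 4.0],
     [3.0, 4.0, 5.0, 6.0],
     [9.0, 9.0, 9.0, 9.0]],   # padding token, masked out below
    [[2.0, 2.0, 2.0, 2.0],
     [4.0, 4.0, 4.0, 4.0],
     [6.0, 6.0, 6.0, 6.0]],
])
attention_mask = torch.tensor([[1, 1, 0], [1, 1, 1]])

# Expand the mask over the hidden dimension, zero out padding,
# then average over the real (non-padding) tokens only.
mask = attention_mask.unsqueeze(-1).expand(last_hidden.size()).float()
pooled = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

print(pooled)
# First row averages only the two unmasked tokens: [2.0, 3.0, 4.0, 5.0]
```

The clamp guards against division by zero for an all-padding row; the padding token's embedding never influences the pooled vector.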
## Training

- **Method**: Knowledge distillation from GPTZero soft labels
- **Loss**: Composite (0.5 KL + 0.3 BCE + 0.2 MSE)
- **Data**: 12,886 texts across 8 languages
- **AI Models Represented**: GPT-5.2, GPT-5.4, GPT-4o-mini, Gemini-3-flash, DeepSeek-v3.2, Grok, Claude Opus, o4-mini
- **Epochs**: 5 (best checkpoint at epoch 5)
- **Learning Rate**: 2e-6 with cosine warmup
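The composite loss above can be sketched as follows. This is a hypothetical reconstruction from the stated 0.5/0.3/0.2 weights; the function name and the exact binary-KL formulation are assumptions, since the training code is not shown here. `teacher_probs` stands in for the GPTZero soft labels and `labels` for the hard binary labels:

```python
import torch
import torch.nn.functional as F

def composite_distillation_loss(student_logits, teacher_probs, labels,
                                w_kl=0.5, w_bce=0.3, w_mse=0.2):
    """Sketch of 0.5*KL + 0.3*BCE + 0.2*MSE for a single-logit binary head."""
    eps = 1e-7
    student_probs = torch.sigmoid(student_logits).clamp(eps, 1 - eps)
    tp = teacher_probs.clamp(eps, 1 - eps)

    # KL divergence between the teacher's and student's Bernoulli distributions
    kl = (tp * (tp / student_probs).log()
          + (1 - tp) * ((1 - tp) / (1 - student_probs)).log()).mean()
    # Hard-label cross-entropy and soft-label regression terms
    bce = F.binary_cross_entropy_with_logits(student_logits, labels)
    mse = F.mse_loss(student_probs, teacher_probs)
    return w_kl * kl + w_bce * bce + w_mse * mse

loss = composite_distillation_loss(
    torch.tensor([2.0, -1.0]),   # student logits
    torch.tensor([0.9, 0.2]),    # teacher soft labels
    torch.tensor([1.0, 0.0]),    # hard labels
)
```

The KL term pulls the student toward the teacher's soft distribution, while the BCE term anchors it to the hard labels and the MSE term penalizes probability-scale deviation from the teacher.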
## Usage

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoConfig, AutoModel, PreTrainedModel

class DesklibAIDetectionModel(PreTrainedModel):
    config_class = AutoConfig

    def __init__(self, config):
        super().__init__(config)
        self.model = AutoModel.from_config(config)
        self.classifier = nn.Linear(config.hidden_size, 1)
        self.init_weights()

    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
        last_hidden = outputs[0]
        # Masked mean pooling: average only over non-padding tokens
        mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden.size()).float()
        sum_embeddings = torch.sum(last_hidden * mask_expanded, dim=1)
        sum_mask = torch.clamp(mask_expanded.sum(dim=1), min=1e-9)
        pooled = sum_embeddings / sum_mask
        pooled = pooled.to(self.classifier.weight.dtype)
        logits = self.classifier(pooled)
        return {"logits": logits}

# Load
tokenizer = AutoTokenizer.from_pretrained("danibor/oculus-2.0-multilingual")
model = DesklibAIDetectionModel.from_pretrained("danibor/oculus-2.0-multilingual")
model.eval()

# Predict
text = "Your text here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding="max_length")
with torch.no_grad():
    logits = model(**inputs)["logits"].squeeze(-1)
    prob = torch.sigmoid(logits).item()

print(f"AI probability: {prob:.4f}")
print(f"Classification: {'AI' if prob >= 0.5 else 'Human'}")
```

## Built by

Hastewire - AI Detection Research