---
language:
- en
- es
- pt
- ca
- de
- fr
- it
license: apache-2.0
tags:
- text-classification
- ai-detection
- multilingual
- deberta-v3
- knowledge-distillation
pipeline_tag: text-classification
library_name: transformers
base_model: desklib/ai-text-detector-v1.01
---

# Oculus 2.0 Multilingual AI Text Detector

A fine-tuned DeBERTa-v3-large model for detecting AI-generated text across 8 languages.
## Supported Languages

| Language | Code | Accuracy | Recall | FPR |
|----------|------|----------|--------|-----|
| English | en | 91.0% | 89.8% | 0.2% |
| Spanish | es | 97.2% | 96.2% | 1.0% |
| Portuguese | pt | 99.0% | 100.0% | 3.0% |
| Catalan | ca | 99.9% | 99.8% | 0.0% |
| Valencian | va | 100.0% | 100.0% | 0.0% |
| German | de | 100.0% | 100.0% | 0.0% |
| French | fr | 100.0% | 100.0% | 0.0% |
| Italian | it | 100.0% | 100.0% | 0.0% |

## Stress Test (Unseen Models)

The detector maintained 100% recall across all 8 languages on fresh text from GPT-5.4, Claude Sonnet 4, and Gemini 2.5 Pro.
## Model Architecture

- **Base**: DeBERTa-v3-large (434M parameters)
- **Head**: Mean pooling over token embeddings, followed by a linear classifier (hidden_size → 1)
- **Output**: Sigmoid probability in [0, 1], where 1 = AI-generated
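The masked mean pooling used by the head can be illustrated in isolation. This is a toy sketch with made-up tensors (shapes and values are illustrative only), not the model's actual code:

```python
import torch

# Toy batch: 2 sequences, 3 tokens each, hidden size 4
last_hidden = torch.tensor([
    [[1.0, 2.0, 3.0, 4.0],
     [3.0, 4.0, 5.0, 6.0],
     [9.0, 9.0, 9.0, 9.0]],   # padding token, masked out below
    [[2.0, 2.0, 2.0, 2.0],
     [4.0, 4.0, 4.0, 4.0],
     [6.0, 6.0, 6.0, 6.0]],
])
attention_mask = torch.tensor([[1, 1, 0], [1, 1, 1]])

# Expand the mask over the hidden dimension, zero out padding,
# then average over the real (non-padding) tokens only.
mask = attention_mask.unsqueeze(-1).expand(last_hidden.size()).float()
pooled = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

print(pooled)
# First row averages only the two unmasked tokens: [2.0, 3.0, 4.0, 5.0]
```

The clamp guards against division by zero for an all-padding row; the padding token's embedding never influences the pooled vector.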
## Training

- **Method**: Knowledge distillation from GPTZero soft labels
- **Loss**: Composite (0.5 KL + 0.3 BCE + 0.2 MSE)
- **Data**: 12,886 texts across 8 languages
- **AI Models Represented**: GPT-5.2, GPT-5.4, GPT-4o-mini, Gemini-3-flash, DeepSeek-v3.2, Grok, Claude Opus, o4-mini
- **Epochs**: 5 (best checkpoint at epoch 5)
- **Learning Rate**: 2e-6 with cosine warmup
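The composite loss above can be sketched as follows. This is a hypothetical reconstruction from the stated 0.5/0.3/0.2 weights; the function name and the exact binary-KL formulation are assumptions, since the training code is not shown here. `teacher_probs` stands in for the GPTZero soft labels and `labels` for the hard binary labels:

```python
import torch
import torch.nn.functional as F

def composite_distillation_loss(student_logits, teacher_probs, labels,
                                w_kl=0.5, w_bce=0.3, w_mse=0.2):
    """Sketch of 0.5*KL + 0.3*BCE + 0.2*MSE for a single-logit binary head."""
    eps = 1e-7
    student_probs = torch.sigmoid(student_logits).clamp(eps, 1 - eps)
    tp = teacher_probs.clamp(eps, 1 - eps)

    # KL divergence between the teacher's and student's Bernoulli distributions
    kl = (tp * (tp / student_probs).log()
          + (1 - tp) * ((1 - tp) / (1 - student_probs)).log()).mean()
    # Hard-label cross-entropy and soft-label regression terms
    bce = F.binary_cross_entropy_with_logits(student_logits, labels)
    mse = F.mse_loss(student_probs, teacher_probs)
    return w_kl * kl + w_bce * bce + w_mse * mse

loss = composite_distillation_loss(
    torch.tensor([2.0, -1.0]),   # student logits
    torch.tensor([0.9, 0.2]),    # teacher soft labels
    torch.tensor([1.0, 0.0]),    # hard labels
)
```

The KL term pulls the student toward the teacher's soft distribution, while the BCE term anchors it to the hard labels and the MSE term penalizes probability-scale deviation from the teacher.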
## Usage

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoConfig, AutoModel, PreTrainedModel

class DesklibAIDetectionModel(PreTrainedModel):
    config_class = AutoConfig

    def __init__(self, config):
        super().__init__(config)
        self.model = AutoModel.from_config(config)
        self.classifier = nn.Linear(config.hidden_size, 1)
        self.init_weights()

    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
        last_hidden = outputs[0]
        # Masked mean pooling: average only over non-padding tokens
        mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden.size()).float()
        sum_embeddings = torch.sum(last_hidden * mask_expanded, dim=1)
        sum_mask = torch.clamp(mask_expanded.sum(dim=1), min=1e-9)
        pooled = sum_embeddings / sum_mask
        pooled = pooled.to(self.classifier.weight.dtype)
        logits = self.classifier(pooled)
        return {"logits": logits}

# Load
tokenizer = AutoTokenizer.from_pretrained("danibor/oculus-2.0-multilingual")
model = DesklibAIDetectionModel.from_pretrained("danibor/oculus-2.0-multilingual")
model.eval()

# Predict
text = "Your text here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding="max_length")
with torch.no_grad():
    logits = model(**inputs)["logits"].squeeze(-1)
    prob = torch.sigmoid(logits).item()

print(f"AI probability: {prob:.4f}")
print(f"Classification: {'AI' if prob >= 0.5 else 'Human'}")
```

## Built by

Hastewire - AI Detection Research