+---
+language:
+- ar
+library_name: transformers
+pipeline_tag: text-classification
+tags:
+- arabic
+- arabert
+- bert
+- text-classification
+- safetensors
+---
+# OBSIDIAN
+## Model Overview
+**OBSIDIAN** is a fine-tuned AraBERT-based model for Arabic text classification.
+It classifies Arabic tweets and other short texts into five categories:
+- Threat
+- Violence
+- Distress
+- Complaint
+- Neutral
+
+This model is part of the **OBSIDIAN** project, a real-time social media intelligence and threat detection system.
+## Labels
+The model predicts one of the following classes:
+- **Threat**: text containing direct or indirect threats or intimidation
+- **Violence**: text describing physical aggression, assault, or violent incidents
+- **Distress**: text expressing fear, panic, emotional suffering, or need for help
+- **Complaint**: text expressing dissatisfaction or criticism, or reporting a problem with a service
+- **Neutral**: text without strong threat, violence, distress, or complaint signals
+## Intended Use
+This model is intended for:
+- Arabic tweet classification
+- short Arabic text classification
+- research/demo use in social media monitoring workflows
+## Limitations
+- The model is intended for Arabic text only
+- Performance may degrade on long texts (the usage example truncates input at 128 tokens), mixed-language text, or text that differs substantially from the training distribution
+- Some classes overlap semantically, especially Distress and Threat, or Complaint and Neutral, so borderline texts may be misclassified
+- This model should support human review, not replace it in high-stakes situations
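One way to keep a human in the loop, as the last point suggests, is to act automatically only on confident predictions and queue everything else for review. The sketch below is a minimal illustration in plain Python. The `route_prediction` helper, the 0.70 threshold, and the rule that Threat and Violence are always reviewed are assumptions made for this example, not part of the OBSIDIAN application.

```python
# Hypothetical helper: decide whether a prediction needs human review.
# The threshold and the set of always-reviewed labels are illustrative
# assumptions, not values taken from the OBSIDIAN project.
ALWAYS_REVIEW = {"Threat", "Violence"}


def route_prediction(probs, id2label, threshold=0.70):
    """Return (label, confidence, needs_review) for one text.

    probs    -- list of class probabilities (softmax output as a list)
    id2label -- mapping from class index to label name
    """
    pred_id = max(range(len(probs)), key=lambda i: probs[i])
    label = id2label[pred_id]
    confidence = probs[pred_id]
    # Review when the model is unsure or the class is high-stakes.
    needs_review = confidence < threshold or label in ALWAYS_REVIEW
    return label, confidence, needs_review


# Placeholder mapping for the demo; the real one is model.config.id2label.
demo_id2label = {0: "Threat", 1: "Violence", 2: "Distress", 3: "Complaint", 4: "Neutral"}

print(route_prediction([0.02, 0.03, 0.05, 0.85, 0.05], demo_id2label))
print(route_prediction([0.05, 0.05, 0.40, 0.10, 0.40], demo_id2label))
```

A suitable threshold depends on the data and on the cost of a missed case, so it would need tuning on held-out examples.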
+## Training / Fine-Tuning Context
+This model was fine-tuned as part of the OBSIDIAN project and later integrated into a Streamlit application for:
+- single-text prediction
+- batch CSV/XLSX prediction
+- result visualization and export
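The batch CSV workflow can be approximated outside the Streamlit app. The following is a minimal sketch, not the application's actual code: `classify_csv`, the `text` column name, and the `classify` callable are hypothetical names chosen for illustration. Any function that maps a string to a label can be passed in, for example a wrapper around the Transformers snippet in the Usage section.

```python
import csv
import io


def classify_csv(in_file, out_file, classify, text_column="text"):
    """Read rows from in_file, add a 'label' column, write them to out_file.

    classify    -- any callable mapping a string to a label (assumed)
    text_column -- name of the column holding the text (assumed)
    Returns the number of rows processed.
    """
    reader = csv.DictReader(in_file)
    fieldnames = list(reader.fieldnames or []) + ["label"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        row["label"] = classify(row[text_column])
        writer.writerow(row)
        count += 1
    return count


# Demo with an in-memory CSV and a stand-in classifier; swap in a
# function that calls the real model for actual use.
def stub_classify(text):
    return "Neutral"


source = io.StringIO("id,text\n1,مرحبا\n2,صباح الخير\n")
result = io.StringIO()
n_rows = classify_csv(source, result, stub_classify)
print(n_rows)
print(result.getvalue())
```

Taking file objects rather than paths keeps the helper usable with uploaded files held in memory as well as files on disk.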
+## Usage
+Example with Transformers:
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+
+model_id = "SoftALL/OBSIDIAN"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+
+# Example input: "The service is very bad and the app crashes every time"
+text = "الخدمة سيئة جدًا والتطبيق يتعطل كل مرة"
+# Inputs longer than 128 tokens are truncated
+inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
+
+# Inference only, so skip gradient tracking
+with torch.no_grad():
+    outputs = model(**inputs)
+
+# Convert logits to probabilities and pick the most likely class
+probs = torch.softmax(outputs.logits, dim=-1)[0]
+pred_id = int(torch.argmax(probs).item())
+label = model.config.id2label[pred_id]
+print(label, f"{probs[pred_id].item():.3f}")
+```
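The example above reports only the top label. To see how probability is spread across all five classes, which can help with the overlapping cases noted under Limitations, the logits can be ranked. A minimal sketch in plain Python follows. `rank_labels`, the placeholder mapping, and the logit values are hypothetical; the real index order should be read from `model.config.id2label`.

```python
import math


def rank_labels(logits, id2label):
    """Softmax the logits and return (label, probability) pairs, highest first."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    pairs = [(id2label[i], e / total) for i, e in enumerate(exps)]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Placeholder mapping and made-up logits, for the demo only.
demo_id2label = {0: "Threat", 1: "Violence", 2: "Distress", 3: "Complaint", 4: "Neutral"}
for label, prob in rank_labels([0.1, -1.2, 0.3, 2.5, 1.0], demo_id2label):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as in the Usage example, `rank_labels(outputs.logits[0].tolist(), model.config.id2label)` would produce the same kind of ranking for a real input.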
+## Files in This Repository
+This model repository includes:
+- `config.json`
+- `model.safetensors`
+- `tokenizer.json`
+- `tokenizer_config.json`
+## Project Context
+The full application code for the OBSIDIAN project is hosted separately in the SoftALL GitHub organization, while this Hugging Face repository hosts the model files used for inference.