3zizo3 committed ef02e03 (verified), 1 parent: 60b4d08

Creating model card/readme

Files changed (1): README.md, added (+97 -0)

---
language:
- ar
library_name: transformers
pipeline_tag: text-classification
tags:
- arabic
- arabert
- bert
- text-classification
- safetensors
---

# OBSIDIAN

## Model Overview

**OBSIDIAN** is a fine-tuned AraBERT-based model for Arabic text classification.
It is designed to classify Arabic tweets and short texts into five categories:

- Threat
- Violence
- Distress
- Complaint
- Neutral

This model is part of the **OBSIDIAN** project, a real-time social media intelligence and threat detection system.

## Labels

The model predicts exactly one of the following classes for each input:

- **Threat**: text containing direct or indirect threats or intimidation
- **Violence**: text describing physical aggression, assault, or violent incidents
- **Distress**: text expressing fear, panic, emotional suffering, or a need for help
- **Complaint**: text expressing dissatisfaction or criticism, or reporting a problem with a service
- **Neutral**: text without strong threat, violence, distress, or complaint signals

## Intended Use

This model is intended for:

- Arabic tweet classification
- Short Arabic text classification
- Research and demo use in social media monitoring workflows
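
Because the card scopes the model to Arabic text, a deployment may want to screen inputs before classification. The sketch below is a hypothetical pre-filter and is not part of OBSIDIAN: it computes the share of alphabetic characters that fall in the basic Arabic Unicode block (U+0600 to U+06FF) and treats inputs under an assumed ratio of 0.5 as out of scope. The block range, the threshold, and the function names `arabic_letter_ratio` and `is_in_scope` are all assumptions; characters from Arabic Supplement or Presentation Forms blocks are not counted.

```python
# Hypothetical pre-filter, not part of OBSIDIAN: estimate how much of an input
# is written in Arabic script before sending it to the classifier.

ARABIC_START, ARABIC_END = 0x0600, 0x06FF  # basic Arabic Unicode block only
MIN_ARABIC_RATIO = 0.5  # assumed threshold; tune on your own data


def arabic_letter_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the basic Arabic block."""
    letters = [ch for ch in text if ch.isalpha()]  # skips digits, spaces, diacritics
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if ARABIC_START <= ord(ch) <= ARABIC_END)
    return arabic / len(letters)


def is_in_scope(text: str, min_ratio: float = MIN_ARABIC_RATIO) -> bool:
    """True if the text looks predominantly Arabic under the assumed threshold."""
    return arabic_letter_ratio(text) >= min_ratio


if __name__ == "__main__":
    for sample in ["الخدمة سيئة جدًا", "the app crashes every time", "التطبيق app"]:
        print(f"{arabic_letter_ratio(sample):.2f}", is_in_scope(sample), sample)
```

Inputs that fail the check could be skipped or sent to manual review rather than classified, since mixed-language text is listed as a known weakness.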

## Limitations

- The model is intended for Arabic text only.
- Performance may degrade on long texts, mixed-language text, or text that differs substantially from the training distribution. The usage example below truncates inputs to 128 tokens, so content past that limit is not seen by the model.
- Semantically close categories can be confused on difficult examples, especially Distress vs. Threat and Complaint vs. Neutral.
- In high-stakes situations, the model should support human review, not replace it.
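
One way to keep a human in the loop is to auto-accept only confident predictions and queue the rest for review. The sketch below is hypothetical and not part of OBSIDIAN: `route_prediction` takes one text's class probabilities keyed by label name and flags it for review when the top probability is low, when the top two classes are close (the Distress/Threat style of overlap noted above), or when the top label is one treated as high-stakes. The thresholds (0.6 and 0.15), the `ALWAYS_REVIEW` set, and the assumption that label strings match the names in this card are all illustrative; check `model.config.id2label` for the real names.

```python
# Hypothetical human-in-the-loop routing, not part of OBSIDIAN: decide which
# predictions can be accepted automatically and which need a human reviewer.
from typing import Dict, Tuple

MIN_CONFIDENCE = 0.6  # assumed: top-class probability required to auto-accept
MIN_MARGIN = 0.15     # assumed: required gap between the top two classes
ALWAYS_REVIEW = {"Threat", "Violence"}  # assumed: labels treated as high-stakes


def route_prediction(probs: Dict[str, float]) -> Tuple[str, bool]:
    """Return (top_label, needs_review) for one text's class probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top_p), (_, second_p) = ranked[0], ranked[1]
    needs_review = (
        top_label in ALWAYS_REVIEW
        or top_p < MIN_CONFIDENCE
        or (top_p - second_p) < MIN_MARGIN  # e.g. Distress vs. Threat ambiguity
    )
    return top_label, needs_review


if __name__ == "__main__":
    example = {"Threat": 0.05, "Violence": 0.03, "Distress": 0.07,
               "Complaint": 0.80, "Neutral": 0.05}
    print(route_prediction(example))  # ('Complaint', False)
```

With the objects from the usage example, the input dictionary could be built as `{model.config.id2label[i]: float(p) for i, p in enumerate(probs)}`.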

## Training / Fine-Tuning Context

This model was fine-tuned as part of the OBSIDIAN project and later integrated into a Streamlit application that supports:

- Single-text prediction
- Batch prediction over CSV/XLSX files
- Result visualization and export
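
The Streamlit application's batch mode is not included in this repository. As a rough, hypothetical sketch of the same idea using only the standard library, the helper below reads CSV text, sends the text column to a caller-supplied `predict_batch` function in fixed-size batches, and returns CSV text with an appended `label` column. The column name `text`, the batch size of 32, and the names `batched` and `classify_csv` are assumptions; XLSX input would need an extra library such as openpyxl or pandas and is not shown.

```python
# Hypothetical batch-prediction helper; the OBSIDIAN app's own code is not in
# this repository. Reads texts from a CSV column, classifies them in batches,
# and writes CSV text with an added "label" column.
import csv
import io
from typing import Callable, Iterator, List

BATCH_SIZE = 32  # assumed; tune for your hardware


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_csv(
    csv_text: str,
    predict_batch: Callable[[List[str]], List[str]],
    text_column: str = "text",  # assumed column name
) -> str:
    """Return CSV text with a 'label' column appended to each input row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    texts = [row[text_column] for row in rows]

    labels: List[str] = []
    for batch in batched(texts, BATCH_SIZE):
        labels.extend(predict_batch(batch))
    if len(labels) != len(rows):
        raise ValueError("predict_batch must return one label per input text")

    fieldnames = (list(rows[0].keys()) + ["label"]) if rows else [text_column, "label"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row, label in zip(rows, labels):
        writer.writerow({**row, "label": label})
    return out.getvalue()


if __name__ == "__main__":
    # Stand-in predictor for illustration only; not the real model.
    def fake_predict(batch: List[str]) -> List[str]:
        return ["Complaint" if "سيئة" in t else "Neutral" for t in batch]

    sample = "id,text\n1,الخدمة سيئة جدًا\n2,صباح الخير\n"
    print(classify_csv(sample, fake_predict))
```

In practice, `predict_batch` could wrap the tokenizer and model calls from the Usage section, passing the whole list of texts to the tokenizer with `padding=True` so each batch runs as a single forward pass.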

## Usage

Example with Transformers:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "SoftALL/OBSIDIAN"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# "The service is very bad and the app crashes every time"
text = "الخدمة سيئة جدًا والتطبيق يتعطل كل مرة"

# Tokenize; inputs longer than 128 tokens are truncated
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities for the first (only) input
probs = torch.softmax(outputs.logits, dim=-1)[0]
pred_id = int(torch.argmax(probs).item())

# Map the predicted class index to its label name
label = model.config.id2label[pred_id]
print(label)
```
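
The example above keeps only the arg-max label. When the runner-up classes matter, for instance to see whether Distress and Threat scored closely, the full distribution can be ranked. The sketch below does this without a framework dependency, which may help if logits come from another runtime. The names `softmax`, `top_k_labels`, and the `ID2LABEL` mapping are illustrative assumptions; the real label order must be read from `model.config.id2label`.

```python
# Framework-free post-processing sketch: turn one row of logits into a ranked
# list of (label, probability) pairs. ID2LABEL below is a hypothetical
# placeholder; read the real mapping from the model's config.
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(
    logits: Sequence[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most probable labels with their probabilities, highest first."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in order[:k]]


# Hypothetical mapping and logits, for illustration only.
ID2LABEL = {0: "Threat", 1: "Violence", 2: "Distress", 3: "Complaint", 4: "Neutral"}

if __name__ == "__main__":
    print(top_k_labels([0.1, -1.2, 0.4, 2.3, 0.9], ID2LABEL, k=2))
```

With the objects from the usage example, a call would look like `top_k_labels(outputs.logits[0].tolist(), model.config.id2label)`.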

## Files in This Repository

This model repository includes:

- `config.json`
- `model.safetensors`
- `tokenizer.json`
- `tokenizer_config.json`

## Project Context

The full application code for the OBSIDIAN project is hosted separately in the SoftALL GitHub organization, while this Hugging Face repository hosts the model files used for inference.