NAMAA-Space/AraModernBert-Topic-Classifier

 - arabic
 ---
+# ModernBERT Arabic Model Card
+## Overview
+This is an Arabic adaptation of ModernBERT, a modernized bidirectional encoder-only Transformer model (BERT-style). The base ModernBERT model was pre-trained on 2 trillion tokens of English and code data and has a native context length of up to 8,192 tokens. For more on the base model, see [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base).
+For this proof of concept, the tokenizer was trained on Arabic Wikipedia:
+- **Dataset:** Arabic Wikipedia
+- **Size:** 1.8 GB
+- **Tokens:** 228,788,529
+This model demonstrates how ModernBERT can be adapted to Arabic for tasks like topic classification.
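The card does not say which library or algorithm was used to train the Arabic tokenizer. As a rough illustration of what learning a subword vocabulary from a corpus involves, the sketch below implements a toy byte-pair-encoding (BPE) merge learner in plain Python. The function name `learn_bpe_merges` and the tiny in-memory corpus are hypothetical. A real run would stream the Arabic Wikipedia text and would most likely rely on a dedicated tokenizer library.

```python
from collections import Counter


def learn_bpe_merges(corpus, num_merges):
    """Learn up to `num_merges` BPE merge rules from an iterable of strings.

    Toy illustration only; not the tokenizer training used for this model.
    """
    # Represent each whitespace-separated word as a tuple of characters.
    word_freqs = Counter()
    for line in corpus:
        for word in line.split():
            word_freqs[tuple(word)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in word_freqs.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol

        # Pick the most frequent pair; ties are broken by the pair itself
        # so the result is deterministic.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)

        # Rewrite every word with the chosen pair fused into one symbol.
        new_freqs = Counter()
        for symbols, freq in word_freqs.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_freqs[tuple(merged)] += freq
        word_freqs = new_freqs
    return merges


# Hypothetical two-line corpus standing in for Arabic Wikipedia.
corpus = ["كتاب كتاب كاتب", "كتب الكاتب الكتاب"]
merges = learn_bpe_merges(corpus, num_merges=10)
print(merges[:3])
```

Each learned merge becomes one vocabulary entry, so the number of merges roughly controls the vocabulary size.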
+## Model Details
+- **Epochs:** 3
+- **Evaluation Metrics:**
+  - **F1 Score:** 0.9587811491105839
+  - **Loss:** 0.19986020028591156
+  - **Runtime:** 46.4942 seconds
+  - **Samples per second:** 305.006
+  - **Steps per second:** 38.134
+- **Training Step:** 47,862
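The throughput figures above also imply the size of the evaluation run. The sketch below works through that arithmetic. It assumes the reported rates are averages over the full evaluation runtime, that every evaluation batch except possibly the last is full, and that step 47,862 is the final step after 3 epochs. None of these assumptions is stated in the card, so treat the results as estimates.

```python
import math

# Figures copied from the evaluation report above.
runtime_s = 46.4942
samples_per_s = 305.006
steps_per_s = 38.134
final_step = 47_862
epochs = 3

# Rate multiplied by time gives the approximate totals for the evaluation run.
eval_samples = round(samples_per_s * runtime_s)
eval_steps = round(steps_per_s * runtime_s)

# Assumption: all evaluation batches are full except possibly the last one.
eval_batch_size = math.ceil(eval_samples / eval_steps)

# Assumption: the reported step is the last step of the third epoch.
steps_per_epoch = final_step / epochs

print(eval_samples, eval_steps, eval_batch_size, steps_per_epoch)
# 14181 1773 8 15954.0
```

Under these assumptions the evaluation set holds about 14,181 samples processed in batches of 8. The training batch size cannot be recovered the same way, because the card does not report the training set size.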
+## How to Use
+The model can be used for text classification using the `transformers` library. Below is an example:
+```python
+from transformers import pipeline
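+# Load the fine-tuned classifier. The path below appears to be a local checkpoint
+# directory; to load from huggingface.co/models, pass the repository ID instead.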
+classifier = pipeline(
+    task="text-classification",
+    model="ModernBERT-domain-classifier/checkpoint-47862",
+)
+# Arabic input, roughly: "A number of newcomers to the Kingdom of Saudi Arabia embrace Islam"
+sample = '''
+اسلام عددا من الوافدين الى الممكلة العربية السعوديه
+'''
+classifier(sample)
+# [{'label': 'health', 'score': 0.6779336333274841}]