akhooli/setfit_ar_hs

@@ -10,12 +10,15 @@ tags:
 - text-classification
 - generated_from_setfit_trainer
 widget:
-- text: 'دغري بدكم تفوتو بخصوصيات الناس طيب ما اموال كتار معروفة و مش معروفة منوين
-    جابتهن بتفتح... '
-- text: ايها السادة العرب الوزير جبران باسيل يتكلم باسمه الشخصي
-- text: 'وكل مين بدو يشد على مشدو '
-- text: لازم جائزة نوبل للكيميا ياخدها دكتاتور البعث الفاشي
-- text: 'زرع شعراته ولوووووو فيهن    '
 inference: true
 model-index:
 - name: SetFit with akhooli/sbert_ar_nli_500k_norm
@@ -29,39 +32,14 @@ model-index:
       split: test
     metrics:
     - type: accuracy
-      value: 0.8506944444444444
       name: Accuracy
 ---
 # SetFit with akhooli/sbert_ar_nli_500k_norm
-This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification.
-This SetFit model uses [akhooli/sbert_ar_nli_500k_norm](https://huggingface.co/akhooli/sbert_ar_nli_500k_norm) as the Sentence Transformer embedding model.
-A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
-This model is trained with few shots using the [akhooli/ar_hs](https://huggingface.co/datasets/akhooli/ar_hs) dataset. The dataset uses LLM to generate labels.
- Usage:
- ```python
-pip install setfit
-from setfit import SetFitModel
-from unicodedata import normalize
-# Download model from Hub
-model = SetFitModel.from_pretrained("akhooli/setfit_ar_hs")
-# Run inference
-queries = [
-        "سكت دهراً و نطق كفراً",
-        "الخلاف ﻻ يفسد للود قضية.",
-        "أنت شخص منبوذ. احترم أسيادك.",
-        "دع المكارم ﻻ ترحل لبغيتها واقعد فإنك أنت الطاعم الكاسي",
-    ]
-queries_n = [normalize('NFKC', query) for query in queries]
-preds = model.predict(queries_n)
-print(preds)
-# if you want to see the probabilities for each label
-probas = model.predict_proba(queries_n)
-print(probas)
-```
-The rest of this card is auto generated.
 The model has been trained using an efficient few-shot learning technique that involves:
 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
@@ -74,7 +52,7 @@ The model has been trained using an efficient few-shot learning technique that i
 - **Sentence Transformer body:** [akhooli/sbert_ar_nli_500k_norm](https://huggingface.co/akhooli/sbert_ar_nli_500k_norm)
 - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
 - **Maximum Sequence Length:** 512 tokens
-- **Number of Classes:** 3 classes
 <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
 <!-- - **Language:** Unknown -->
 <!-- - **License:** Unknown -->
@@ -85,12 +63,18 @@ The model has been trained using an efficient few-shot learning technique that i
 - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
 - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
 ## Evaluation
 ### Metrics
 | Label   | Accuracy |
 |:--------|:---------|
-| **all** | 0.8507   |
 ## Uses
@@ -110,7 +94,7 @@ from setfit import SetFitModel
 # Download from the 🤗 Hub
 model = SetFitModel.from_pretrained("akhooli/setfit_ar_hs")
 # Run inference
-preds = model("وكل مين بدو يشد على مشدو ")
 ```
 <!--
@@ -140,9 +124,9 @@ preds = model("وكل مين بدو يشد على مشدو ")
 ## Training Details
 ### Training Set Metrics
-| Training set | Min | Median  | Max |
-|:-------------|:----|:--------|:----|
-| Word count   | 1   | 12.7668 | 52  |
 | Label    | Training Sample Count |
 |:---------|:----------------------|
@@ -164,64 +148,64 @@ preds = model("وكل مين بدو يشد على مشدو ")
 - warmup_proportion: 0.1
 - l2_weight: 0.01
 - seed: 42
-- run_name: setfit_hate_2k
 - eval_max_steps: -1
 - load_best_model_at_end: False
 ### Training Results
 | Epoch  | Step | Training Loss | Validation Loss |
 |:------:|:----:|:-------------:|:---------------:|
-| 0.0004 | 1    | 0.3158        | -               |
-| 0.04   | 100  | 0.2783        | -               |
-| 0.08   | 200  | 0.2427        | -               |
-| 0.12   | 300  | 0.1803        | -               |
-| 0.16   | 400  | 0.1334        | -               |
-| 0.2    | 500  | 0.0846        | -               |
-| 0.24   | 600  | 0.0638        | -               |
-| 0.28   | 700  | 0.05          | -               |
-| 0.32   | 800  | 0.0412        | -               |
-| 0.36   | 900  | 0.0345        | -               |
-| 0.4    | 1000 | 0.0291        | -               |
-| 0.44   | 1100 | 0.0232        | -               |
-| 0.48   | 1200 | 0.0207        | -               |
-| 0.52   | 1300 | 0.0177        | -               |
-| 0.56   | 1400 | 0.018         | -               |
-| 0.6    | 1500 | 0.0141        | -               |
-| 0.64   | 1600 | 0.017         | -               |
-| 0.68   | 1700 | 0.0133        | -               |
-| 0.72   | 1800 | 0.014         | -               |
-| 0.76   | 1900 | 0.0128        | -               |
-| 0.8    | 2000 | 0.013         | -               |
-| 0.84   | 2100 | 0.0139        | -               |
-| 0.88   | 2200 | 0.0132        | -               |
-| 0.92   | 2300 | 0.0105        | -               |
-| 0.96   | 2400 | 0.008         | -               |
-| 1.0    | 2500 | 0.0068        | -               |
-| 1.04   | 2600 | 0.0056        | -               |
-| 1.08   | 2700 | 0.0072        | -               |
-| 1.12   | 2800 | 0.0038        | -               |
-| 1.16   | 2900 | 0.005         | -               |
 | 1.2    | 3000 | 0.0039        | -               |
-| 1.24   | 3100 | 0.0034        | -               |
-| 1.28   | 3200 | 0.0035        | -               |
-| 1.32   | 3300 | 0.0038        | -               |
-| 1.3600 | 3400 | 0.0038        | -               |
-| 1.4    | 3500 | 0.0025        | -               |
-| 1.44   | 3600 | 0.0045        | -               |
-| 1.48   | 3700 | 0.003         | -               |
-| 1.52   | 3800 | 0.0025        | -               |
-| 1.56   | 3900 | 0.003         | -               |
-| 1.6    | 4000 | 0.0026        | -               |
-| 1.6400 | 4100 | 0.0029        | -               |
-| 1.6800 | 4200 | 0.0021        | -               |
-| 1.72   | 4300 | 0.003         | -               |
-| 1.76   | 4400 | 0.0025        | -               |
-| 1.8    | 4500 | 0.0032        | -               |
-| 1.8400 | 4600 | 0.002         | -               |
-| 1.88   | 4700 | 0.0024        | -               |
-| 1.92   | 4800 | 0.0022        | -               |
-| 1.96   | 4900 | 0.0024        | -               |
-| 2.0    | 5000 | 0.0027        | -               |
 ### Framework Versions
 - Python: 3.10.14

 - text-classification
 - generated_from_setfit_trainer
 widget:
+- text: عزيزي جبران باسيل بدك تعرف كتييير منيح انو مش شغلتنا نحفظ امن اسرائيل يلي
+    ما منعترف ولن نعترف ب وجودها ابدا
+- text: 'يجب على هؤلاك المجرمون الارهابيون وكل من دس فتنة انا يتحاسبو حساب مؤلم لكن
+    سؤال من سيحاسبهن '
+- text: شيل عينك عن لبنان انت و كل كلب متلك حكايتك و غير هيك انشالله بتنباع بالعزى
+- text: لسه بصرعوا طيزنا بدكن نصير متل العراق وليبيا يا حمير تجاوزناهن بأشواط، هلق
+    لو نصير متل العراق وليبيا تحسن كبير جدا
+- text: كول هوا خسرتو بأرضك وبين جمهورك بعد ما منعت القطريين من تشجيع جمهورهم انتو
+    فاشلين في كل شئ وهم متفوقين عليكم في...
 inference: true
 model-index:
 - name: SetFit with akhooli/sbert_ar_nli_500k_norm
       split: test
     metrics:
     - type: accuracy
+      value: 0.8452520515826495
       name: Accuracy
 ---
 # SetFit with akhooli/sbert_ar_nli_500k_norm
+This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [akhooli/sbert_ar_nli_500k_norm](https://huggingface.co/akhooli/sbert_ar_nli_500k_norm) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
 The model has been trained using an efficient few-shot learning technique that involves:
 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
 - **Sentence Transformer body:** [akhooli/sbert_ar_nli_500k_norm](https://huggingface.co/akhooli/sbert_ar_nli_500k_norm)
 - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
 - **Maximum Sequence Length:** 512 tokens
+- **Number of Classes:** 2 classes
 <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
 <!-- - **Language:** Unknown -->
 <!-- - **License:** Unknown -->
 - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
 - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+### Model Labels
+| Label    | Examples                                                                                                                                                                                                                                                                   |
+|:---------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| negative | <ul><li>'يا ريت بيمنعوا الأرغيلة بلبنان، لأن غير هيك ما منعمل ثورة '</li><li>'أصلا جبران عندو طيارة وعندو قصر بأوروبا ومحيط الهادىء الى اسهم فيه وتم اكتشاف كوكب جديد مثل زحل وجوبيتير تم شرائه ك...'</li><li>'اكره البرازيل بس لا تقوليلي خلاص كلشي انتهى بليز'</li></ul> |
+| positive | <ul><li>'السيد والرئيس وليش عم تشددددد دخلك كل حجمك أرنب عند معلمك بالقرداحة'</li><li>'العوني اذا تمدن متل الجحش اذا تكدن بعمرك شفت عوني بيفهم'</li><li>'لا بس الوطن بدو تكنيس من ل متلك '</li></ul>                                                                       |
 ## Evaluation
 ### Metrics
 | Label   | Accuracy |
 |:--------|:---------|
+| **all** | 0.8453   |
 ## Uses
 # Download from the 🤗 Hub
 model = SetFitModel.from_pretrained("akhooli/setfit_ar_hs")
 # Run inference
+preds = model("شيل عينك عن لبنان انت و كل كلب متلك حكايتك و غير هيك انشالله بتنباع بالعزى")
 ```
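The earlier revision of this card applied `normalize('NFKC', ...)` from `unicodedata` to each query before calling `model.predict`. The sketch below isolates that preprocessing step using only the standard library. The helper name `normalize_queries` is hypothetical, and whether the retrained model still benefits from this step is an assumption not confirmed by this diff.

```python
from unicodedata import normalize


def normalize_queries(queries):
    """Return NFKC-normalized copies of the input strings (hypothetical helper)."""
    return [normalize("NFKC", query) for query in queries]


# U+FEFB is the isolated lam-alef ligature, which appears in some Arabic text.
# NFKC expands it into the two base letters lam (U+0644) and alef (U+0627).
raw_queries = ["\ufefb"]
cleaned = normalize_queries(raw_queries)

# Show the code points after normalization.
print([hex(ord(ch)) for ch in cleaned[0]])
```

The normalized strings can then be passed to the model in place of the raw ones, as the removed usage snippet did with `model.predict(queries_n)`.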
 <!--
 ## Training Details
 ### Training Set Metrics
+| Training set | Min | Median | Max |
+|:-------------|:----|:-------|:----|
+| Word count   | 1   | 12.809 | 52  |
 | Label    | Training Sample Count |
 |:---------|:----------------------|
 - warmup_proportion: 0.1
 - l2_weight: 0.01
 - seed: 42
+- run_name: setfit_hate_2kv
 - eval_max_steps: -1
 - load_best_model_at_end: False
 ### Training Results
 | Epoch  | Step | Training Loss | Validation Loss |
 |:------:|:----:|:-------------:|:---------------:|
+| 0.0004 | 1    | 0.3239        | -               |
+| 0.04   | 100  | 0.277         | -               |
+| 0.08   | 200  | 0.2406        | -               |
+| 0.12   | 300  | 0.1737        | -               |
+| 0.16   | 400  | 0.1259        | -               |
+| 0.2    | 500  | 0.0701        | -               |
+| 0.24   | 600  | 0.0473        | -               |
+| 0.28   | 700  | 0.0298        | -               |
+| 0.32   | 800  | 0.0239        | -               |
+| 0.36   | 900  | 0.02          | -               |
+| 0.4    | 1000 | 0.0151        | -               |
+| 0.44   | 1100 | 0.0143        | -               |
+| 0.48   | 1200 | 0.0126        | -               |
+| 0.52   | 1300 | 0.0121        | -               |
+| 0.56   | 1400 | 0.0078        | -               |
+| 0.6    | 1500 | 0.0111        | -               |
+| 0.64   | 1600 | 0.0099        | -               |
+| 0.68   | 1700 | 0.0091        | -               |
+| 0.72   | 1800 | 0.0064        | -               |
+| 0.76   | 1900 | 0.0101        | -               |
+| 0.8    | 2000 | 0.0073        | -               |
+| 0.84   | 2100 | 0.0042        | -               |
+| 0.88   | 2200 | 0.0038        | -               |
+| 0.92   | 2300 | 0.0058        | -               |
+| 0.96   | 2400 | 0.0041        | -               |
+| 1.0    | 2500 | 0.0026        | -               |
+| 1.04   | 2600 | 0.0037        | -               |
+| 1.08   | 2700 | 0.0035        | -               |
+| 1.12   | 2800 | 0.0045        | -               |
+| 1.16   | 2900 | 0.0038        | -               |
 | 1.2    | 3000 | 0.0039        | -               |
+| 1.24   | 3100 | 0.0018        | -               |
+| 1.28   | 3200 | 0.003         | -               |
+| 1.32   | 3300 | 0.0028        | -               |
+| 1.3600 | 3400 | 0.0023        | -               |
+| 1.4    | 3500 | 0.0022        | -               |
+| 1.44   | 3600 | 0.0032        | -               |
+| 1.48   | 3700 | 0.0028        | -               |
+| 1.52   | 3800 | 0.0022        | -               |
+| 1.56   | 3900 | 0.0024        | -               |
+| 1.6    | 4000 | 0.0021        | -               |
+| 1.6400 | 4100 | 0.0032        | -               |
+| 1.6800 | 4200 | 0.0026        | -               |
+| 1.72   | 4300 | 0.0025        | -               |
+| 1.76   | 4400 | 0.003         | -               |
+| 1.8    | 4500 | 0.0028        | -               |
+| 1.8400 | 4600 | 0.003         | -               |
+| 1.88   | 4700 | 0.0028        | -               |
+| 1.92   | 4800 | 0.0033        | -               |
+| 1.96   | 4900 | 0.0019        | -               |
+| 2.0    | 5000 | 0.0023        | -               |
 ### Framework Versions
 - Python: 3.10.14

config_setfit.json CHANGED

@@ -1,7 +1,7 @@
 {
-  "normalize_embeddings": false,
   "labels": [
     "negative",
     "positive"
-  ]
 }

 {
   "labels": [
     "negative",
     "positive"
+  ],
+  "normalize_embeddings": false
 }
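The `config_setfit.json` change above only moves `normalize_embeddings` after `labels`. JSON objects are unordered for most consumers, so the two revisions should parse to the same data. A minimal check, with the two revisions inlined as strings reconstructed from the diff (the variable names are illustrative):

```python
import json

# Old and new revisions of config_setfit.json, reconstructed from the diff above.
old_config_text = '{"normalize_embeddings": false, "labels": ["negative", "positive"]}'
new_config_text = '{"labels": ["negative", "positive"], "normalize_embeddings": false}'

old_config = json.loads(old_config_text)
new_config = json.loads(new_config_text)

# Python dict equality ignores key order, so this compares content only.
configs_equivalent = old_config == new_config
print(configs_equivalent)
```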

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0d2ebdcd4940d5fd3e47d78fc0ab371baa15d3c351cb253ce4aa9ac613e917da
 size 540795752

 version https://git-lfs.github.com/spec/v1
+oid sha256:aa207876d4a89ac428c7260c57c75272051dfb17bbf88ee51b56bc87c54f9a67
 size 540795752

model_head.pkl CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:39692e6033811b7b9a9fd4c86cdf8015f4ce4af1b7b9f4c901c285fd8465a904
-size 19327

 version https://git-lfs.github.com/spec/v1
+oid sha256:49f3e09533da336510f66c9419d4d76468ed0ad3e8378107f08645838e801645
+size 7007
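The `model.safetensors` and `model_head.pkl` entries are Git LFS pointer files: three `key value` lines giving the spec version, the object hash, and the size in bytes. The sketch below parses the two `model_head.pkl` pointers shown above and compares their sizes. The function name `parse_lfs_pointer` is hypothetical, and it handles only the three keys that appear in this diff.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


old_head = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:39692e6033811b7b9a9fd4c86cdf8015f4ce4af1b7b9f4c901c285fd8465a904\n"
    "size 19327\n"
)
new_head = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:49f3e09533da336510f66c9419d4d76468ed0ad3e8378107f08645838e801645\n"
    "size 7007\n"
)

# Difference in bytes between the new and old classification head files.
size_delta = new_head["size"] - old_head["size"]
print(size_delta)
```

By contrast, the `model.safetensors` pointer changes only its `oid`; its size stays at 540795752 bytes in both revisions.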