baobabtech/water-conflict-classifier

@@ -1,65 +1,201 @@
 ---
-license: mit
-library_name: setfit
 tags:
 - setfit
 - sentence-transformers
 - text-classification
-- multi-label
-- water-conflict
 metrics:
-- f1
 - accuracy
-language:
-- en
 ---
-# Water Conflict Multi-Label Classifier
-This model classifies news headlines about water-related conflicts into three categories:
-- **Trigger**: Water resource as a conflict trigger
-- **Casualty**: Water infrastructure as a casualty/target
-- **Weapon**: Water used as a weapon/tool
 ## Model Details
-- **Base Model**: BAAI/bge-small-en-v1.5
-- **Architecture**: SetFit with One-vs-Rest multi-label strategy
-- **Training Approach**: Few-shot learning optimized (SetFit reaches peak performance with small samples)
-- **Training Data**: 510 examples (sampled from ~5,000 labeled headlines)
-- **Performance**: F1 (micro) = 0.8319, Accuracy = 0.8333
-## Usage
 ```python
 from setfit import SetFitModel
 model = SetFitModel.from_pretrained("baobabtech/water-conflict-classifier")
-headlines = [
-    "Taliban attack workers at the Kajaki Dam in Afghanistan",
-    "New water treatment plant opens in California"
-]
-predictions = model.predict(headlines)
-print(predictions)
-```
-## Training Metrics
-- Accuracy (exact match): 0.8333
-- F1 (micro): 0.8319
-- F1 (macro): 0.6755
-- Hamming Loss: 0.0704
-## Label Distribution
-| Label | F1 Score | Support |
-|-------|----------|---------|
-| Trigger | 0.8837 | 21 |
-| Casualty | 0.8571 | 30 |
-| Weapon | 0.2857 | 5 |
 ## Citation
-Based on ACLED (Armed Conflict Location & Event Data Project) data.
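
The removed metrics can be cross-checked against the removed per-label table, since macro F1 is the unweighted mean of the per-label F1 scores. A minimal sketch, using only the three values listed in the old README (the variable names are illustrative):

```python
# Per-label F1 scores copied from the removed "Label Distribution" table.
per_label_f1 = {"Trigger": 0.8837, "Casualty": 0.8571, "Weapon": 0.2857}

# Macro F1 weights every label equally, regardless of support.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)

print(round(macro_f1, 4))  # matches the removed "F1 (macro)" line
```

Because every label counts equally, the weak Weapon score (support 5) pulls the macro average well below the micro-averaged 0.8319.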

 ---
 tags:
 - setfit
 - sentence-transformers
 - text-classification
+- generated_from_setfit_trainer
+widget:
+- text: Gaddafi cuts of water to Libya's capital
+- text: Grenade blast in water tank leaves 40 families without water in Potrerito,
+    Valle del Cauca, Colombia
+- text: Silvan Dam construction site attacked
+- text: in the afternoon, US forces destroy (likely through airstrikes) 2 suspected
+    Houthi patrol boats in an unidentified area in the South Red Sea while Houthi
+    media reported 3 air raids on As Salif coastal district (coded to As Salif Port)
+    (Al Hudaydah). Casaulties unknown.
+- text: a group of Fulani men clashed with and killed a suspected Fulani bull thief
+    in the Goure Kele district of Sakabansi (Nikki, Borgou). He was found dead in
+    his house after being struck with a machete during the clash by one of the members
+    of the group, who then fled.
 metrics:
 - accuracy
+pipeline_tag: text-classification
+library_name: setfit
+inference: false
+base_model: BAAI/bge-small-en-v1.5
 ---
+# SetFit with BAAI/bge-small-en-v1.5
+This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) as the Sentence Transformer embedding model. A OneVsRestClassifier instance is used for classification.
+The model has been trained using an efficient few-shot learning technique that involves:
+1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+2. Training a classification head with features from the fine-tuned Sentence Transformer.
 ## Model Details
+### Model Description
+- **Model Type:** SetFit
+- **Sentence Transformer body:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
+- **Classification head:** a OneVsRestClassifier instance
+- **Maximum Sequence Length:** 512 tokens
+- **Number of Classes:** 3 classes
+<!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+### Model Sources
+- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+## Uses
+### Direct Use for Inference
+First install the SetFit library:
+```bash
+pip install setfit
+```
+Then you can load this model and run inference.
 ```python
 from setfit import SetFitModel
+# Download from the 🤗 Hub
 model = SetFitModel.from_pretrained("baobabtech/water-conflict-classifier")
+# Run inference
+preds = model("Silvan Dam construction site attacked")
+```
+<!--
+### Downstream Use
+*List how someone could finetune this model on their own dataset.*
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Set Metrics
+| Training set | Min | Median  | Max |
+|:-------------|:----|:--------|:----|
+| Word count   | 4   | 25.9533 | 236 |
+### Training Hyperparameters
+- batch_size: (32, 32)
+- num_epochs: (1, 1)
+- max_steps: -1
+- sampling_strategy: undersampling
+- body_learning_rate: (2e-05, 1e-05)
+- head_learning_rate: 0.01
+- loss: CosineSimilarityLoss
+- distance_metric: cosine_distance
+- margin: 0.25
+- end_to_end: False
+- use_amp: False
+- warmup_proportion: 0.1
+- l2_weight: 0.01
+- seed: 42
+- eval_max_steps: -1
+- load_best_model_at_end: True
+### Training Results
+| Epoch  | Step | Training Loss | Validation Loss |
+|:------:|:----:|:-------------:|:---------------:|
+| 0.0007 | 1    | 0.2168        | -               |
+| 0.0339 | 50   | 0.2108        | -               |
+| 0.0679 | 100  | 0.1126        | -               |
+| 0.1018 | 150  | 0.0719        | -               |
+| 0.1358 | 200  | 0.0616        | -               |
+| 0.1697 | 250  | 0.0518        | -               |
+| 0.2037 | 300  | 0.0454        | -               |
+| 0.2376 | 350  | 0.0393        | -               |
+| 0.2716 | 400  | 0.0324        | -               |
+| 0.3055 | 450  | 0.0265        | -               |
+| 0.3394 | 500  | 0.0279        | -               |
+| 0.3734 | 550  | 0.0231        | -               |
+| 0.4073 | 600  | 0.0231        | -               |
+| 0.4413 | 650  | 0.0228        | -               |
+| 0.4752 | 700  | 0.0272        | -               |
+| 0.5092 | 750  | 0.0216        | -               |
+| 0.5431 | 800  | 0.0186        | -               |
+| 0.5771 | 850  | 0.0195        | -               |
+| 0.6110 | 900  | 0.0174        | -               |
+| 0.6449 | 950  | 0.0163        | -               |
+| 0.6789 | 1000 | 0.0174        | -               |
+| 0.7128 | 1050 | 0.0148        | -               |
+| 0.7468 | 1100 | 0.0167        | -               |
+| 0.7807 | 1150 | 0.0158        | -               |
+| 0.8147 | 1200 | 0.0146        | -               |
+| 0.8486 | 1250 | 0.0146        | -               |
+| 0.8826 | 1300 | 0.0145        | -               |
+| 0.9165 | 1350 | 0.0138        | -               |
+| 0.9504 | 1400 | 0.0142        | -               |
+| 0.9844 | 1450 | 0.013         | -               |
+| 1.0    | 1473 | -             | 0.0577          |
+### Framework Versions
+- Python: 3.12.12
+- SetFit: 1.1.3
+- Sentence Transformers: 5.1.2
+- Transformers: 4.57.3
+- PyTorch: 2.9.1+cu128
+- Datasets: 4.4.1
+- Tokenizers: 0.22.1
 ## Citation
+### BibTeX
+```bibtex
+@article{https://doi.org/10.48550/arxiv.2209.11055,
+    doi = {10.48550/ARXIV.2209.11055},
+    url = {https://arxiv.org/abs/2209.11055},
+    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+    title = {Efficient Few-Shot Learning Without Prompts},
+    publisher = {arXiv},
+    year = {2022},
+    copyright = {Creative Commons Attribution 4.0 International}
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
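
The Epoch column in the added Training Results table appears to be the logged step divided by the total step count (1473, the step reported at epoch 1.0). A small sketch of that arithmetic follows. The pair count at the end is only an upper bound and assumes every step processes one full batch of 32, which the diff does not confirm.

```python
total_steps = 1473  # step at which the table reports epoch 1.0
logged_steps = [1, 50, 100, 1450]  # a few rows from the table

# Fraction of the epoch completed at each logged step, rounded as in the table.
epochs = {step: round(step / total_steps, 4) for step in logged_steps}
print(epochs)

# Assumption: each step consumes one full batch of 32 (batch_size: (32, 32)).
upper_bound_pairs = total_steps * 32
print(upper_bound_pairs)
```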

config_setfit.json CHANGED

@@ -1,8 +1,8 @@
 {
-  "normalize_embeddings": false,
   "labels": [
     "Trigger",
     "Casualty",
     "Weapon"
-  ]
 }

 {
   "labels": [
     "Trigger",
     "Casualty",
     "Weapon"
+  ],
+  "normalize_embeddings": false
 }
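
This hunk only moves `normalize_embeddings` after `labels`. JSON object keys are unordered, so both versions parse to the same configuration. The sketch below checks that, and shows one way the `labels` list could be used to name the positions of a multi-label 0/1 prediction row. The `decode` helper and the example row are hypothetical, not SetFit API or real model output.

```python
import json

old_config = json.loads(
    '{"normalize_embeddings": false, "labels": ["Trigger", "Casualty", "Weapon"]}'
)
new_config = json.loads(
    '{"labels": ["Trigger", "Casualty", "Weapon"], "normalize_embeddings": false}'
)

# Key order differs, but the parsed dictionaries compare equal.
same_config = old_config == new_config


def decode(multi_hot, labels):
    """Return the label names whose position is set in a 0/1 vector."""
    return [name for name, flag in zip(labels, multi_hot) if int(flag) == 1]


# Hypothetical prediction row for illustration only.
example_row = [0, 1, 0]
print(same_config, decode(example_row, new_config["labels"]))
```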

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:95c79c45ab3f36cf71006e48a80c98a9711d197508e41f5ee17dfb35fe8c5757
 size 133462128

 version https://git-lfs.github.com/spec/v1
+oid sha256:810e83a30f42979ba8a25d2e797843dd802456fc79565ebc7fec264d993a23b7
 size 133462128
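
The file shown here is a Git LFS pointer, not the weights themselves: each line is a key, a space, and a value. A minimal sketch of reading the two pointer versions above and comparing them (`parse_lfs_pointer` is an illustrative helper, not part of any Git LFS tooling):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


old = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:95c79c45ab3f36cf71006e48a80c98a9711d197508e41f5ee17dfb35fe8c5757\n"
    "size 133462128\n"
)
new = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:810e83a30f42979ba8a25d2e797843dd802456fc79565ebc7fec264d993a23b7\n"
    "size 133462128\n"
)

# Same byte size, different content hash.
size_unchanged = old["size"] == new["size"]
content_changed = old["oid"] != new["oid"]
print(size_unchanged, content_changed)
```

An unchanged size with a new `oid` is consistent with retrained weights for the same architecture, though the pointer alone cannot confirm that.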

model_head.pkl CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:732b2b4465f09f482ba088de92615478585bd94873cbdeb65e2cc00e65cef30f
 size 11236

 version https://git-lfs.github.com/spec/v1
+oid sha256:bbf961500280819b966a72f6006acf90bb4de6ba9ea63df1282ebacab1309ae0
 size 11236