oliviermills committed on
Commit bb68079 · verified · 1 Parent(s): cdef5cf

Upload README.md with huggingface_hub

  1. README.md +232 -254
README.md CHANGED
@@ -1,290 +1,268 @@
1
  ---
2
- language: en
3
  license: cc-by-nc-4.0
 
4
  tags:
5
  - setfit
6
  - sentence-transformers
7
  - text-classification
8
- - generated_from_setfit_trainer
9
- widget:
10
- - text: Israeli forces destroy water pump in Nablus, West Bank, cutting water supply
11
- to over 20,000 Palestinians in multiple villages
12
- - text: Chinese man killed for speaking out against displacement of communities by
13
- the Three Gorges Dam
14
- - text: Protests over water cuts turn violent in Tunisia
15
- - text: National leader Dilma Ferreira Silva, working for policy reform to support
16
- people affected by dams, is murdered in Brazil
17
- - text: Water reservoir sustains minor damages from bombing in Colombia
18
  metrics:
 
19
  - accuracy
20
- pipeline_tag: text-classification
21
- library_name: setfit
22
- inference: false
23
- base_model: BAAI/bge-small-en-v1.5
 
 
 
 
24
  ---
25
 
26
- # SetFit with BAAI/bge-small-en-v1.5
27
 
28
- This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) as the Sentence Transformer embedding model. A OneVsRestClassifier instance is used for classification.
29
 
30
- The model has been trained using an efficient few-shot learning technique that involves:
31
 
32
- 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
33
- 2. Training a classification head with features from the fine-tuned Sentence Transformer.
34
 
35
- ## Model Details
36
 
37
- ### Model Description
38
- - **Model Type:** SetFit
39
- - **Sentence Transformer body:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
40
- - **Classification head:** a OneVsRestClassifier instance
41
- - **Maximum Sequence Length:** 512 tokens
42
- - **Number of Classes:** 3 classes
43
- <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
44
- - **Language:** en
45
- - **License:** cc-by-nc-4.0
46
 
47
- ### Model Sources
48
 
49
- - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
50
- - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
51
- - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
52
 
53
- ## Uses
54
 
55
- ### Direct Use for Inference
56
 
57
- First install the SetFit library:
 
 
 
 
 
 
 
58
 
59
- ```bash
60
- pip install setfit
61
- ```
62
 
63
- Then you can load this model and run inference.
64
 
65
  ```python
66
  from setfit import SetFitModel
67
 
68
- # Download from the 🤗 Hub
69
  model = SetFitModel.from_pretrained("baobabtech/water-conflict-classifier")
70
- # Run inference
71
- preds = model("Protests over water cuts turn violent in Tunisia")
 
 
 
 
 
 
 
 
 
72
  ```
73
 
74
- <!--
75
- ### Downstream Use
76
-
77
- *List how someone could finetune this model on their own dataset.*
78
- -->
79
-
80
- <!--
81
- ### Out-of-Scope Use
82
-
83
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
84
- -->
85
-
86
- <!--
87
- ## Bias, Risks and Limitations
88
-
89
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
90
- -->
91
-
92
- <!--
93
- ### Recommendations
94
-
95
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
96
- -->
97
-
98
- ## Training Details
99
-
100
- ### Training Set Metrics
101
- | Training set | Min | Median | Max |
102
- |:-------------|:----|:--------|:----|
103
- | Word count | 3 | 16.3692 | 154 |
104
-
105
- ### Training Hyperparameters
106
- - batch_size: (32, 32)
107
- - num_epochs: (4, 4)
108
- - max_steps: -1
109
- - sampling_strategy: oversampling
110
- - num_iterations: 20
111
- - body_learning_rate: (2e-05, 2e-05)
112
- - head_learning_rate: 0.01
113
- - loss: CosineSimilarityLoss
114
- - distance_metric: cosine_distance
115
- - margin: 0.25
116
- - end_to_end: False
117
- - use_amp: False
118
- - warmup_proportion: 0.1
119
- - l2_weight: 0.01
120
- - seed: 42
121
- - eval_max_steps: -1
122
- - load_best_model_at_end: True
123
-
124
- ### Training Results
125
- | Epoch | Step | Training Loss | Validation Loss |
126
- |:------:|:----:|:-------------:|:---------------:|
127
- | 0.0007 | 1 | 0.2228 | - |
128
- | 0.0333 | 50 | 0.236 | - |
129
- | 0.0667 | 100 | 0.2308 | - |
130
- | 0.1 | 150 | 0.2186 | - |
131
- | 0.1333 | 200 | 0.203 | - |
132
- | 0.1667 | 250 | 0.1836 | - |
133
- | 0.2 | 300 | 0.159 | - |
134
- | 0.2333 | 350 | 0.1373 | - |
135
- | 0.2667 | 400 | 0.1265 | - |
136
- | 0.3 | 450 | 0.111 | - |
137
- | 0.3333 | 500 | 0.1045 | - |
138
- | 0.3667 | 550 | 0.0906 | - |
139
- | 0.4 | 600 | 0.0848 | - |
140
- | 0.4333 | 650 | 0.0829 | - |
141
- | 0.4667 | 700 | 0.0706 | - |
142
- | 0.5 | 750 | 0.0631 | - |
143
- | 0.5333 | 800 | 0.0625 | - |
144
- | 0.5667 | 850 | 0.0572 | - |
145
- | 0.6 | 900 | 0.0553 | - |
146
- | 0.6333 | 950 | 0.0499 | - |
147
- | 0.6667 | 1000 | 0.0533 | - |
148
- | 0.7 | 1050 | 0.044 | - |
149
- | 0.7333 | 1100 | 0.0486 | - |
150
- | 0.7667 | 1150 | 0.045 | - |
151
- | 0.8 | 1200 | 0.0411 | - |
152
- | 0.8333 | 1250 | 0.0464 | - |
153
- | 0.8667 | 1300 | 0.0414 | - |
154
- | 0.9 | 1350 | 0.0378 | - |
155
- | 0.9333 | 1400 | 0.0379 | - |
156
- | 0.9667 | 1450 | 0.0408 | - |
157
- | 1.0 | 1500 | 0.0356 | 0.1011 |
158
- | 1.0333 | 1550 | 0.0338 | - |
159
- | 1.0667 | 1600 | 0.0304 | - |
160
- | 1.1 | 1650 | 0.0339 | - |
161
- | 1.1333 | 1700 | 0.0319 | - |
162
- | 1.1667 | 1750 | 0.0331 | - |
163
- | 1.2 | 1800 | 0.0307 | - |
164
- | 1.2333 | 1850 | 0.0349 | - |
165
- | 1.2667 | 1900 | 0.0341 | - |
166
- | 1.3 | 1950 | 0.032 | - |
167
- | 1.3333 | 2000 | 0.0353 | - |
168
- | 1.3667 | 2050 | 0.0312 | - |
169
- | 1.4 | 2100 | 0.0313 | - |
170
- | 1.4333 | 2150 | 0.0288 | - |
171
- | 1.4667 | 2200 | 0.0308 | - |
172
- | 1.5 | 2250 | 0.0269 | - |
173
- | 1.5333 | 2300 | 0.0292 | - |
174
- | 1.5667 | 2350 | 0.0299 | - |
175
- | 1.6 | 2400 | 0.0291 | - |
176
- | 1.6333 | 2450 | 0.0286 | - |
177
- | 1.6667 | 2500 | 0.0283 | - |
178
- | 1.7 | 2550 | 0.0299 | - |
179
- | 1.7333 | 2600 | 0.0283 | - |
180
- | 1.7667 | 2650 | 0.027 | - |
181
- | 1.8 | 2700 | 0.0303 | - |
182
- | 1.8333 | 2750 | 0.0293 | - |
183
- | 1.8667 | 2800 | 0.0281 | - |
184
- | 1.9 | 2850 | 0.0288 | - |
185
- | 1.9333 | 2900 | 0.0285 | - |
186
- | 1.9667 | 2950 | 0.0266 | - |
187
- | 2.0 | 3000 | 0.0276 | 0.0950 |
188
- | 2.0333 | 3050 | 0.0283 | - |
189
- | 2.0667 | 3100 | 0.0282 | - |
190
- | 2.1 | 3150 | 0.0275 | - |
191
- | 2.1333 | 3200 | 0.0263 | - |
192
- | 2.1667 | 3250 | 0.025 | - |
193
- | 2.2 | 3300 | 0.0256 | - |
194
- | 2.2333 | 3350 | 0.0259 | - |
195
- | 2.2667 | 3400 | 0.0255 | - |
196
- | 2.3 | 3450 | 0.0253 | - |
197
- | 2.3333 | 3500 | 0.0261 | - |
198
- | 2.3667 | 3550 | 0.0272 | - |
199
- | 2.4 | 3600 | 0.0253 | - |
200
- | 2.4333 | 3650 | 0.0235 | - |
201
- | 2.4667 | 3700 | 0.0264 | - |
202
- | 2.5 | 3750 | 0.0267 | - |
203
- | 2.5333 | 3800 | 0.0248 | - |
204
- | 2.5667 | 3850 | 0.026 | - |
205
- | 2.6 | 3900 | 0.0239 | - |
206
- | 2.6333 | 3950 | 0.0264 | - |
207
- | 2.6667 | 4000 | 0.0243 | - |
208
- | 2.7 | 4050 | 0.0224 | - |
209
- | 2.7333 | 4100 | 0.0244 | - |
210
- | 2.7667 | 4150 | 0.026 | - |
211
- | 2.8 | 4200 | 0.0242 | - |
212
- | 2.8333 | 4250 | 0.0244 | - |
213
- | 2.8667 | 4300 | 0.0238 | - |
214
- | 2.9 | 4350 | 0.0263 | - |
215
- | 2.9333 | 4400 | 0.0249 | - |
216
- | 2.9667 | 4450 | 0.0246 | - |
217
- | 3.0 | 4500 | 0.0273 | 0.0951 |
218
- | 3.0333 | 4550 | 0.0245 | - |
219
- | 3.0667 | 4600 | 0.0255 | - |
220
- | 3.1 | 4650 | 0.0262 | - |
221
- | 3.1333 | 4700 | 0.0236 | - |
222
- | 3.1667 | 4750 | 0.022 | - |
223
- | 3.2 | 4800 | 0.0224 | - |
224
- | 3.2333 | 4850 | 0.0246 | - |
225
- | 3.2667 | 4900 | 0.0231 | - |
226
- | 3.3 | 4950 | 0.0247 | - |
227
- | 3.3333 | 5000 | 0.0251 | - |
228
- | 3.3667 | 5050 | 0.0245 | - |
229
- | 3.4 | 5100 | 0.0248 | - |
230
- | 3.4333 | 5150 | 0.0245 | - |
231
- | 3.4667 | 5200 | 0.0232 | - |
232
- | 3.5 | 5250 | 0.0245 | - |
233
- | 3.5333 | 5300 | 0.022 | - |
234
- | 3.5667 | 5350 | 0.0244 | - |
235
- | 3.6 | 5400 | 0.0258 | - |
236
- | 3.6333 | 5450 | 0.023 | - |
237
- | 3.6667 | 5500 | 0.0232 | - |
238
- | 3.7 | 5550 | 0.0241 | - |
239
- | 3.7333 | 5600 | 0.0229 | - |
240
- | 3.7667 | 5650 | 0.0241 | - |
241
- | 3.8 | 5700 | 0.0229 | - |
242
- | 3.8333 | 5750 | 0.0239 | - |
243
- | 3.8667 | 5800 | 0.023 | - |
244
- | 3.9 | 5850 | 0.0241 | - |
245
- | 3.9333 | 5900 | 0.0232 | - |
246
- | 3.9667 | 5950 | 0.0253 | - |
247
- | 4.0 | 6000 | 0.0241 | 0.0939 |
248
-
249
- ### Framework Versions
250
- - Python: 3.12.12
251
- - SetFit: 1.1.3
252
- - Sentence Transformers: 5.1.2
253
- - Transformers: 4.57.3
254
- - PyTorch: 2.9.1+cu128
255
- - Datasets: 4.4.1
256
- - Tokenizers: 0.22.1
257
-
258
- ## Citation
259
-
260
- ### BibTeX
261
- ```bibtex
262
- @article{https://doi.org/10.48550/arxiv.2209.11055,
263
- doi = {10.48550/ARXIV.2209.11055},
264
- url = {https://arxiv.org/abs/2209.11055},
265
- author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
266
- keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
267
- title = {Efficient Few-Shot Learning Without Prompts},
268
- publisher = {arXiv},
269
- year = {2022},
270
- copyright = {Creative Commons Attribution 4.0 International}
271
- }
272
  ```
273
 
274
- <!--
275
- ## Glossary
 
 
 
 
 
276
 
277
- *Clearly define terms in order to be accessible across audiences.*
278
- -->
 
279
 
280
- <!--
281
- ## Model Card Authors
 
282
 
283
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
284
- -->
285
 
286
- <!--
287
- ## Model Card Contact
288
 
289
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
290
- -->
 
1
  ---
 
2
  license: cc-by-nc-4.0
3
+ library_name: setfit
4
  tags:
5
  - setfit
6
  - sentence-transformers
7
  - text-classification
8
+ - multi-label
9
+ - water-conflict
 
 
 
 
 
 
 
 
10
  metrics:
11
+ - f1
12
  - accuracy
13
+ language:
14
+ - en
15
+ widget:
16
+ - text: "Military attack workers at the Kajaki Dam in Afghanistan"
17
+ - text: "Violent protests erupt over dam construction in Sudan"
18
+ - text: "New water treatment plant opens in California"
19
+ - text: "Armed groups cut off water supply to villages in Syria"
20
+ - text: "Government announces new irrigation subsidies"
21
  ---
22
 
23
+ # Water Conflict Multi-Label Classifier
24
 
25
+ ## 🔬 Experimental Research
26
 
27
+ > This experimental research draws on Pacific Institute's [Water Conflict Chronology](https://www.worldwater.org/water-conflict/), which tracks water-related conflicts spanning over 4,500 years of human history. The work is conducted independently and is not affiliated with Pacific Institute.
28
 
29
+ This model is designed to help researchers classify water-related conflict events at scale, using a small model that can classify hundreds of headlines per second.
 
30
 
31
+ The Pacific Institute maintains the world's most comprehensive open-source record of water-related conflicts, documenting over 2,700 events across 4,500 years of history. This is not a commercial product and is not intended for commercial use.
32
 
33
+ ## 📋 Model Description
 
 
 
 
 
 
 
 
34
 
35
+ This SetFit-based model classifies news headlines about water-related conflicts into three categories:
36
 
37
+ - **Trigger**: Water resource as a conflict trigger
38
+ - **Casualty**: Water infrastructure as a casualty/target
39
+ - **Weapon**: Water used as a weapon/tool
40
 
41
+ These categories align with the Pacific Institute's Water Conflict Chronology framework for understanding how water intersects with security and conflict.
42
 
43
+ ## 🏗️ Model Details
44
 
45
+ - **Base Model**: [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
46
+ - **Architecture**: SetFit with One-vs-Rest multi-label strategy
47
+ - **Training Approach**: Few-shot learning with SetFit, which is designed to perform well with small labeled samples
48
+ - **Training samples**: 1200 examples
49
+ - **Test samples**: 519 (held-out, never seen during training)
50
+ - **Training time**: ~2-5 minutes on A10G GPU
51
+ - **Model size**: 33M parameters, ~133 MB
52
+ - **Inference speed**: ~5-10ms per headline on CPU
53
 
54
+ ## 💻 Usage
 
 
55
 
56
+ ### Quick Start
57
 
58
  ```python
59
  from setfit import SetFitModel
60
 
61
+ # Load the trained model from HF Hub
62
  model = SetFitModel.from_pretrained("baobabtech/water-conflict-classifier")
63
+
64
+ # Predict on headlines
65
+ headlines = [
66
+ "Military attack workers at the Kajaki Dam in Afghanistan",
67
+ "New water treatment plant opens in California"
68
+ ]
69
+
70
+ predictions = model.predict(headlines)
71
+ print(predictions)
72
+ # Example output (may be returned as a tensor or array): [[1, 1, 0], [0, 0, 0]]
73
+ # Format: [Trigger, Casualty, Weapon]
74
  ```
75
 
76
+ ### Interpreting Results
77
+
78
+ For each headline, the model returns one binary value per label, in the order `[Trigger, Casualty, Weapon]`:
79
+
80
+ ```python
81
+ label_names = ['Trigger', 'Casualty', 'Weapon']
82
+
83
+ for headline, pred in zip(headlines, predictions):
84
+ labels = [label_names[i] for i, val in enumerate(pred) if val == 1]
85
+ print(f"Headline: {headline}")
86
+ print(f"Labels: {', '.join(labels) if labels else 'None'}")
87
+ print()
88
+ ```
89
+
90
+ ### Batch Processing
91
+
92
+ ```python
93
+ import pandas as pd
94
+
95
+ # Load your data
96
+ df = pd.read_csv("your_headlines.csv")
97
+
98
+ # Predict on all headlines in one call (SetFit encodes them in batches internally)
99
+ predictions = model.predict(df['headline'].tolist())
100
+
101
+ # Add predictions to dataframe
102
+ df['trigger'] = [p[0] for p in predictions]
103
+ df['casualty'] = [p[1] for p in predictions]
104
+ df['weapon'] = [p[2] for p in predictions]
105
+ ```
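
For very large files it can help to send headlines to the model in fixed-size chunks so that results can be written out incrementally. The following is a minimal sketch, not part of the published package: `chunked`, `predict_in_chunks`, and the stand-in `fake_predict` are hypothetical names introduced here, and the chunk size of 1000 is an arbitrary assumption.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_in_chunks(
    predict: Callable[[Sequence[str]], Sequence[Sequence[int]]],
    headlines: Sequence[str],
    chunk_size: int = 1000,
) -> List[List[int]]:
    """Call `predict` once per chunk and collect the rows as plain lists."""
    results: List[List[int]] = []
    for chunk in chunked(headlines, chunk_size):
        # int() also converts NumPy or tensor scalars to plain Python ints.
        results.extend([int(v) for v in row] for row in predict(list(chunk)))
    return results


# Stand-in predictor so the sketch runs without downloading the model.
# With the real model, pass `model.predict` instead (assumption: it accepts
# a list of strings and returns one row of three values per headline).
def fake_predict(batch: Sequence[str]) -> List[List[int]]:
    return [[1, 0, 0] if "dam" in text.lower() else [0, 0, 0] for text in batch]


demo = ["Protest over dam", "Plant opens", "Dam attacked"]
print(predict_in_chunks(fake_predict, demo, chunk_size=2))
```

Converting every row to plain integers keeps the result independent of whether the predictor returns lists, arrays, or tensors.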
106
+
107
+ ### Example Outputs
108
+
109
+ | Headline | Trigger | Casualty | Weapon |
110
+ |----------|---------|----------|--------|
111
+ | "Armed groups blow up water pipeline in Iraq" | ✓ | ✓ | ✓ |
112
+ | "New water treatment plant opens in California" | ✗ | ✗ | ✗ |
113
+ | "Protests erupt over dam construction in Ethiopia" | ✓ | ✗ | ✗ |
114
+
115
+ ## 📈 Evaluation Results
116
+
117
+ Evaluated on a held-out test set of 519 samples (30% of total data, stratified by label combinations).
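
A split that is stratified by label combination keeps the proportion of each `[Trigger, Casualty, Weapon]` pattern roughly equal in the training and test sets. The sketch below illustrates the idea in plain Python. It is not the project's actual splitting code: the function name `stratified_split`, the seed, and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_split(examples, test_fraction=0.3, seed=42):
    """Split (headline, label_tuple) pairs so each label combination
    contributes about `test_fraction` of its members to the test set."""
    groups = defaultdict(list)
    for example in examples:
        groups[example[1]].append(example)  # group by the full label tuple

    rng = random.Random(seed)
    train, test = [], []
    for combo in sorted(groups):  # sorted for a reproducible order
        members = groups[combo]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy data: two label combinations with ten headlines each (hypothetical).
examples = [(f"conflict headline {i}", (1, 1, 0)) for i in range(10)]
examples += [(f"peaceful headline {i}", (0, 0, 0)) for i in range(10)]

train, test = stratified_split(examples)
print(len(train), len(test))
print(Counter(labels for _, labels in test))
```

Rare combinations with only one or two members may end up entirely in the training set under this rounding rule, which is one reason a real pipeline might handle them separately.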
118
+
119
+ ### Overall Performance
120
+
121
+ | Metric | Score |
122
+ |--------|-------|
123
+ | Exact Match Accuracy | 0.8227 |
124
+ | Hamming Loss | 0.0796 |
125
+ | F1 (micro) | 0.8700 |
126
+ | F1 (macro) | 0.8221 |
127
+ | F1 (samples) | 0.7090 |
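
For readers unfamiliar with the first two metrics: exact match accuracy counts a headline as correct only when all three labels are right, while Hamming loss is the fraction of individual label slots that are wrong. A minimal sketch using the standard definitions on made-up predictions (the toy arrays are not from the evaluation set):

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted correctly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    wrong = sum(tv != pv for t, p in zip(y_true, y_pred) for tv, pv in zip(t, p))
    total = sum(len(t) for t in y_true)
    return wrong / total


# Toy data in [Trigger, Casualty, Weapon] order (illustrative only).
y_true = [[1, 1, 0], [0, 0, 0], [1, 0, 0], [0, 1, 1]]
y_pred = [[1, 1, 0], [0, 0, 0], [1, 0, 1], [0, 1, 0]]

print(exact_match_accuracy(y_true, y_pred))  # 2 of 4 rows match exactly
print(hamming_loss(y_true, y_pred))          # 2 of 12 label slots are wrong
```

scikit-learn offers the same metrics as `accuracy_score` (subset accuracy for multi-label input) and `hamming_loss`.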
128
+
129
+ ### Per-Label Performance
130
+
131
+ | Label | Precision | Recall | F1 | Support |
132
+ |-------|-----------|--------|-----|---------|
133
+ | Trigger | 0.8750 | 0.8851 | 0.8800 | 174 |
134
+ | Casualty | 0.8902 | 0.9399 | 0.9144 | 233 |
135
+ | Weapon | 0.5753 | 0.8077 | 0.6720 | 52 |
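
The F1 column and the macro F1 reported above follow directly from the precision and recall values. As a quick consistency check, this sketch recomputes them from the table using the standard harmonic-mean formula; small differences in the last digit are possible because the inputs are already rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per label, copied from the table above.
per_label = {
    "Trigger": (0.8750, 0.8851),
    "Casualty": (0.8902, 0.9399),
    "Weapon": (0.5753, 0.8077),
}

scores = {label: f1_score(p, r) for label, (p, r) in per_label.items()}
macro_f1 = sum(scores.values()) / len(scores)  # unweighted mean over labels

for label, score in scores.items():
    print(f"{label}: {score:.4f}")
print(f"Macro F1: {macro_f1:.4f}")
```

The macro average weights all three labels equally, so the weaker Weapon label (52 test examples) pulls it below the micro F1.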
136
+
137
+ ### Training Details
138
+
139
+ - **Training samples**: 1200 examples
140
+ - **Test samples**: 519 examples (held-out before sampling)
141
+ - **Base model**: BAAI/bge-small-en-v1.5 (33M params)
142
+ - **Batch size**: 32
143
+ - **Epochs**: 4
144
+ - **Iterations**: 20 (contrastive pair generation)
145
+ - **Sampling strategy**: oversampling (balances positive/negative pairs)
146
+ - **Training Dataset**: [baobabtech/water-conflict-training-data](https://huggingface.co/datasets/baobabtech/water-conflict-training-data) (version: d2.0)
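
These settings determine how many contrastive training steps are run. The arithmetic below is a sketch under one assumption that is not stated in this card: each iteration draws one positive and one negative pair per training example. Under that assumption the result agrees with the 6,000-step training log in the previous version of this README.

```python
import math

train_samples = 1200   # from "Training samples" above
num_iterations = 20    # from "Iterations" above
batch_size = 32
num_epochs = 4

# Assumption: one positive and one negative pair per example per iteration.
pairs_per_epoch = 2 * num_iterations * train_samples
steps_per_epoch = math.ceil(pairs_per_epoch / batch_size)
total_steps = steps_per_epoch * num_epochs

print(pairs_per_epoch, steps_per_epoch, total_steps)
```

Halving `num_iterations` or the number of training examples would halve the step count, which is the main lever on training time for this setup.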
147
+
148
+
149
+ ### 📈 Experiment Tracking
150
+
151
+ All training runs are automatically tracked in a public dataset for experiment comparison:
152
+
153
+ - **Evals Dataset**: [baobabtech/water-conflict-classifier-evals](https://huggingface.co/datasets/baobabtech/water-conflict-classifier-evals)
154
+ - **Tracked Metrics**: F1 scores, accuracy, per-label performance, and all hyperparameters
155
+ - **Compare Experiments**: View how different configurations (sample size, epochs, batch size) affect performance
156
+ - **Reproducibility**: Full training configs logged for each version
157
+
158
+ You can explore past experiments and compare model performance across versions using the evals dataset.
159
+
160
+
161
+ ## 📊 Data Sources
162
+
163
+ ### Positive Examples (Water Conflict Headlines)
164
+ Pacific Institute (2025). *Water Conflict Chronology*. Pacific Institute, Oakland, CA.
165
+ https://www.worldwater.org/water-conflict/
166
+
167
+ ### Negative Examples (Non-Water Conflict Headlines)
168
+ Armed Conflict Location & Event Data Project (ACLED).
169
+ https://acleddata.com/
170
+
171
+ **Note:** Training negatives include synthetic "hard negatives" - peaceful water-related news (e.g., "New desalination plant opens", "Water conservation conference") to prevent false positives on non-conflict water topics.
172
+
173
+ ## 🌍 About This Project
174
+
175
+ This model is part of independent experimental research drawing on the Pacific Institute's Water Conflict Chronology. The Pacific Institute maintains the world's most comprehensive open-source record of water-related conflicts, documenting over 2,700 events across 4,500 years of history.
176
+
177
+ **Project Links:**
178
+ - Pacific Institute Water Conflict Chronology: https://www.worldwater.org/water-conflict/
179
+ - Python Package (PyPI): https://pypi.org/project/water-conflict-classifier/
180
+ - Source Code: https://github.com/baobabtech/waterconflict
181
+ - Model Hub: https://huggingface.co/baobabtech/water-conflict-classifier
182
+
183
+
184
+ ## 🌱 Frugal AI: Training with Limited Data
185
+
186
+ This classifier demonstrates an intentional approach to building AI systems with **limited data** using [SetFit](https://huggingface.co/docs/setfit/en/index) - a framework for few-shot learning with sentence transformers. Rather than defaulting to massive language models (GPT, Claude, or 100B+ parameter models) for simple classification tasks, we fine-tune small, efficient models (e.g., BAAI/bge-small-en-v1.5 with ~33M parameters) on a focused dataset.
187
+
188
+ **Why this matters:** The industry has normalized using trillion-parameter models to classify headlines, answer simple questions, or categorize text - tasks that don't require world knowledge, reasoning, or generative capabilities. This is computationally wasteful and environmentally costly. A properly fine-tuned small model can achieve comparable or better accuracy while using a fraction of the compute resources.
189
+
190
+ **Our approach:**
191
+ - Train on a small labeled set (~1,200 examples for this model) using few-shot learning with SetFit
192
+ - Deploy small parameter models (e.g., ~33M params) vs. 100B-1T parameter alternatives
193
+ - Achieve specialized task performance without the overhead of general-purpose LLMs
194
+ - Reduce inference costs and latency by orders of magnitude
195
+
196
+ This is not about avoiding large models altogether - they're invaluable for complex reasoning tasks. But for targeted classification problems with labeled data, fine-tuning remains the professional, responsible choice.
197
+
198
+
199
+ ### 🏋🏽‍♀️ Training Your Own Model
200
+
201
+ You can train your own version using the [published package](https://pypi.org/project/water-conflict-classifier/).
202
+
203
+ **Package includes:**
204
+ - Data preprocessing utilities
205
+ - Training logic (SetFit multi-label)
206
+ - Evaluation metrics
207
+ - Model card generation
208
+
209
+ **Source code:** https://github.com/baobabtech/waterconflict/tree/main/classifier
210
+ **PyPI:** https://pypi.org/project/water-conflict-classifier/
211
+
212
+ ```bash
213
+ # Install package
214
+ pip install water-conflict-classifier
215
+
216
+ # Or install from source for development
217
+ git clone https://github.com/baobabtech/waterconflict.git
218
+ cd waterconflict/classifier
219
+ pip install -e .
220
+
221
+ # Train locally
222
+ python train_setfit_headline_classifier.py
223
  ```
224
 
225
+ For cloud training on HuggingFace Jobs infrastructure, see the scripts folder in the repository.
226
+
227
+ ## 📜 License
228
+
229
+ Copyright © 2025 Baobab Tech
230
+
231
+ This work is licensed under the [Creative Commons Attribution-NonCommercial 4.0 International License](http://creativecommons.org/licenses/by-nc/4.0/).
232
 
233
+ **You are free to:**
234
+ - **Share** - copy and redistribute the material in any medium or format
235
+ - **Adapt** - remix, transform, and build upon the material
236
 
237
+ **Under the following terms:**
238
+ - **Attribution** - You must give appropriate credit to Baobab Tech, provide a link to the license, and indicate if changes were made
239
+ - **NonCommercial** - You may not use the material for commercial purposes
240
 
 
 
241
 
242
+ ## 📝 Citation
243
+
244
+ If you use this model in your work, please cite:
245
+
246
+ ```bibtex
247
+ @misc{waterconflict2025,
248
+ title={Water Conflict Multi-Label Classifier},
249
+ author={{Independent Experimental Research Drawing on Pacific Institute Water Conflict Chronology}},
250
+ year={2025},
251
+ howpublished={\url{https://huggingface.co/baobabtech/water-conflict-classifier}},
252
+ note={Training data from Pacific Institute Water Conflict Chronology and ACLED}
253
+ }
254
+ ```
255
+
256
+ Please also cite the Pacific Institute's Water Conflict Chronology:
257
+
258
+ ```bibtex
259
+ @misc{pacificinstitute2025,
260
+ title={Water Conflict Chronology},
261
+ author={{Pacific Institute}},
262
+ year={2025},
263
+ address={Oakland, CA},
264
+ url={https://www.worldwater.org/water-conflict/},
265
+ note={Accessed: [access date]}
266
+ }
267
+ ```
268