---
title: ConfliBERT GUI v3
emoji: '🔥'
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: "5.20.0"
app_file: app.py
pinned: false
---

# ConfliBERT GUI v3

A browser-based NLP toolkit for conflict and political violence text analysis. Fine-tune classifiers with LoRA/QLoRA support and active learning.

## Features

### Named Entity Recognition
Identifies persons, organizations, locations, weapons, and other entity types. Results are color-coded. Supports single text and CSV batch processing.

<!-- Take a screenshot of the NER tab with sample output and save as screenshots/ner.png -->
![NER tab](screenshots/ner.png)

### Binary Classification

Classifies text as conflict-related or not. Uses the pretrained ConfliBERT classifier by default, or load your own fine-tuned model.

<!-- Take a screenshot of the Classification tab and save as screenshots/classification.png -->
![Classification tab](screenshots/classification.png)

### Multilabel Classification

Scores text against four event categories (Armed Assault, Bombing/Explosion, Kidnapping, Other). Each category is scored independently.

<!-- Take a screenshot of the Multilabel tab and save as screenshots/multilabel.png -->
![Multilabel tab](screenshots/multilabel.png)

### Question Answering

Provide a context passage and a question. The model extracts the most relevant answer span.

<!-- Take a screenshot of the QA tab and save as screenshots/qa.png -->
![QA tab](screenshots/qa.png)

### Fine-tuning

Train your own binary or multiclass classifier directly in the browser. Upload data (or load a built-in example), pick a base model, configure training, and go. Supports **LoRA** and **QLoRA** for parameter-efficient training with lower VRAM usage. After training, results and a "Try Your Model" panel appear side by side. You can also save the model and run batch predictions.
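For intuition on why LoRA cuts VRAM usage: instead of updating a full `d x k` weight matrix, it freezes the base weights and trains two small low-rank factors `B` (`d x r`) and `A` (`r x k`). The function below is purely illustrative (not part of the app's code) and just counts the trainable parameters each approach requires:

```python
# Illustrative sketch: parameter counts for a full weight update vs. a
# rank-r LoRA update of the same d x k matrix.

def lora_param_counts(d, k, r):
    """Return (full-update params, LoRA-update params)."""
    full = d * k            # every entry of the weight matrix
    lora = d * r + r * k    # the two low-rank factors B and A
    return full, lora

# A BERT-base-sized 768 x 768 projection with the default rank r=8:
full, lora = lora_param_counts(768, 768, 8)
print(full, lora)                    # 589824 12288
print(round(100 * lora / full, 2))   # 2.08 (about 2% of the full update)
```

At rank 8 the update trains roughly 2% of the parameters a full fine-tune would touch for that matrix, which is why the defaults work well on modest GPUs.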

### Model Comparison

Compare multiple base model architectures on the same dataset. The comparison produces a metrics table, a grouped bar chart, and ROC-AUC curves.

<!-- Take a screenshot of the Fine-tune tab and save as screenshots/finetune.png -->
![Fine-tune tab](screenshots/finetune.png)

### Active Learning

Iteratively build a strong classifier with fewer labels. Start with a small labeled seed set and a pool of unlabeled text. The model identifies the most uncertain samples for you to label, retrains, and repeats. Supports entropy, margin, and least-confidence query strategies.

## Supported Models

### Pretrained (Inference)

| Task | HuggingFace Model |
|------|-------------------|
| NER | `eventdata-utd/conflibert-named-entity-recognition` |
| Binary Classification | `eventdata-utd/conflibert-binary-classification` |
| Multilabel Classification | `eventdata-utd/conflibert-satp-relevant-multilabel` |
| Question Answering | `salsarra/ConfliBERT-QA` |

### Fine-tuning (Base Models)

| Model | HuggingFace ID | Notes |
|-------|----------------|-------|
| ConfliBERT | `snowood1/ConfliBERT-scr-uncased` | Best for conflict/political text |
| BERT Base Uncased | `bert-base-uncased` | General-purpose baseline |
| BERT Base Cased | `bert-base-cased` | Case-sensitive variant |
| RoBERTa Base | `roberta-base` | Improved BERT training |
| ModernBERT Base | `answerdotai/ModernBERT-base` | Up to 8K token context |
| DeBERTa v3 Base | `microsoft/deberta-v3-base` | Strong on benchmarks |
| DistilBERT Base | `distilbert-base-uncased` | Faster, smaller |

## Installation

### Requirements

- Python 3.8+
- Git

### Steps

1. Clone the repository:

```bash
git clone https://github.com/shreyasmeher/conflibert-gui.git
cd conflibert-gui
```

2. Create and activate a virtual environment:

```bash
python -m venv env

# Mac/Linux:
source env/bin/activate

# Windows:
env\Scripts\activate
```

On Windows, if you get a permission error, run PowerShell as Administrator and execute:

```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope LocalMachine
```

3. Install PyTorch:

```bash
# CPU only (Mac, or no NVIDIA GPU):
pip install torch

# NVIDIA GPU (Windows/Linux):
pip install torch --index-url https://download.pytorch.org/whl/cu124
```

4. Install the remaining dependencies:

```bash
pip install -r requirements.txt
```

## Usage

Start the application:

```bash
python app.py
```

The app opens at `http://localhost:7860` and generates a public shareable link. The first launch takes a minute or two while the pretrained models download.

### Tabs

| Tab | What it does |
|-----|-------------|
| Home | System info, feature overview, citation |
| Named Entity Recognition | Identify entities in text or CSV |
| Binary Classification | Conflict vs. non-conflict; supports custom models |
| Multilabel Classification | Multi-event-type scoring |
| Question Answering | Extract answers from a context passage |
| Fine-tune | Train classifiers with optional LoRA/QLoRA, compare models, ROC curves |
| Active Learning | Iterative uncertainty-based labeling and retraining |

### Fine-tuning Quick Start

1. Go to the **Fine-tune** tab
2. Click **"Load Example: Binary"** to load sample data
3. Leave the defaults and click **"Start Training"**
4. Review metrics and try your model on new text
5. Save the model and load it in the **Binary Classification** tab

### LoRA / QLoRA Fine-tuning

1. Go to the **Fine-tune** tab
2. Open **Advanced Settings** and check **Use LoRA** (optionally enable **QLoRA** for 4-bit quantization on CUDA GPUs)
3. Adjust the LoRA rank and alpha as needed (the defaults of r=8, alpha=16 work well)
4. Train as usual; LoRA weights are merged back automatically, so the saved model works like any other

### Model Comparison Quick Start

1. Upload data (or load an example) in the **Fine-tune** tab
2. Scroll down and open **"Compare Multiple Models"**
3. Check two or more models to compare
4. Click **"Compare Models"**
5. View the metrics table, bar chart, and ROC-AUC curves

### Active Learning Quick Start

1. Go to the **Active Learning** tab
2. Click **"Load Example: Binary Active Learning"** (or upload your own seed + pool)
3. Configure the query strategy and the number of samples per round
4. Click **"Initialize Active Learning"**
5. Label the uncertain samples shown in the table (fill in 0 or 1)
6. Click **"Submit Labels & Next Round"** to retrain and get the next batch
7. Repeat until satisfied, then save the model

### Data Format

Tab-separated values (TSV) with no header row. Each line: `text<TAB>label`

Binary example:
```
The bomb exploded near the market	1
It was a sunny day at the park	0
```

Multiclass example (integer labels starting from 0):
```
The president signed the peace treaty	0
Militants attacked the military base	1
Thousands marched in the capital	2
Aid workers delivered food supplies	3
```
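If you are preparing files programmatically, the format above can be read with the standard library's `csv` module by setting the delimiter to a tab (a minimal sketch; the app's own loader may differ):

```python
import csv
import io

# Two lines in the text<TAB>label format described above.
tsv = "The bomb exploded near the market\t1\nIt was a sunny day at the park\t0\n"

rows = list(csv.reader(io.StringIO(tsv), delimiter="\t"))
texts = [text for text, label in rows]
labels = [int(label) for text, label in rows]
print(texts[0], labels)  # The bomb exploded near the market [1, 0]
```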

### CSV Batch Processing

Prepare a CSV with a `text` column:

```csv
text
"The soldiers advanced toward the border."
"The festival attracted thousands of visitors."
```

Upload it in the Batch Processing section of any inference tab.
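To generate such a file from your own texts, `csv.writer` handles the quoting for you (a sketch using an in-memory buffer; write to a real file in practice):

```python
import csv
import io

sentences = [
    "The soldiers advanced toward the border.",
    "The festival attracted thousands of visitors.",
]

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)  # quote every field, as above
writer.writerow(["text"])                        # the required header column
for sentence in sentences:
    writer.writerow([sentence])

print(buf.getvalue())
```

Quoting every field keeps commas or quotation marks inside the text from breaking the CSV.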

## Project Structure

```
conflibert-gui/
  app.py                    # Main application
  requirements.txt          # Dependencies
  README.md
  screenshots/              # UI screenshots for documentation
  examples/
    binary/                 # Example binary dataset (conflict vs non-conflict)
      train.tsv
      dev.tsv
      test.tsv
    multiclass/             # Example multiclass dataset (4 event types)
      train.tsv             # 0=Diplomacy, 1=Armed Conflict,
      dev.tsv               # 2=Protest, 3=Humanitarian
      test.tsv
    active_learning/        # Example active learning dataset
      seed.tsv              # 20 labeled seed samples
      pool.txt              # 61 unlabeled pool texts
      pool_with_labels.tsv  # Ground truth for pool (cheat sheet)
```

## Training Features

- **LoRA / QLoRA** parameter-efficient fine-tuning (via [PEFT](https://github.com/huggingface/peft))
- **Active learning** with entropy, margin, and least-confidence query strategies
- Early stopping with configurable patience
- Learning rate schedulers: linear, cosine, constant, constant with warmup
- Mixed-precision training (FP16) on CUDA GPUs
- Gradient accumulation for larger effective batch sizes
- Weight decay regularization
- Automatic system detection (NVIDIA GPU, Apple Silicon MPS, CPU)
- Model comparison with grouped bar charts and ROC-AUC curves
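As an illustration of the scheduler options above, the widely used "linear with warmup" schedule ramps the learning-rate multiplier from 0 to 1 over the warmup steps, then decays it linearly back to 0. The function below is an illustrative sketch, not the app's implementation:

```python
def linear_with_warmup(step, warmup, total):
    """Learning-rate multiplier at a given training step."""
    if step < warmup:
        return step / warmup                        # ramp up: 0 -> 1
    return max(0.0, (total - step) / (total - warmup))  # decay: 1 -> 0

# 10 warmup steps out of 100 total:
print([round(linear_with_warmup(s, 10, 100), 2) for s in (0, 5, 10, 55, 100)])
# [0.0, 0.5, 1.0, 0.5, 0.0]
```

Warmup avoids large, destabilizing updates in the first steps, when the classifier head is still randomly initialized.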

## Citation

If you use ConfliBERT in your research, please cite:

Brandt, P. T., Alsarra, S., D'Orazio, V., Heintze, D., Khan, L., Meher, S., Osorio, J., and Sianan, M. 2025. Extractive versus Generative Language Models for Political Conflict Text Classification. *Political Analysis*, pp. 1-29.

```bibtex
@article{brandt2025extractive,
  title={Extractive versus Generative Language Models for Political Conflict Text Classification},
  author={Brandt, Patrick T and Alsarra, Sultan and D'Orazio, Vito and Heintze, Dagmar and Khan, Latifur and Meher, Shreyas and Osorio, Javier and Sianan, Marcus},
  journal={Political Analysis},
  pages={1--29},
  year={2025},
  publisher={Cambridge University Press}
}
```

## License

MIT License. See LICENSE for details.