shreyasmeher committed · Commit 3b9241c · verified · 1 Parent(s): 396346e

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +14 -269
README.md CHANGED
@@ -1,269 +1,14 @@
- # ConfliBERT GUI
-
- [ConfliBERT](https://github.com/eventdata/ConfliBERT) is a pretrained language model built specifically for analyzing conflict and political violence text. This application provides a browser-based interface for running inference with ConfliBERT's pretrained models, fine-tuning custom classifiers on your own data, and comparing model performance across architectures.
-
- ## Screenshots
-
- ### Home
-
- The landing page shows your system configuration (GPU/CPU, RAM, platform) and an overview of everything the app can do.
-
- <!-- Take a screenshot of the Home tab and save as screenshots/home.png -->
- ![Home](./screenshots/home.png)
-
- ### Named Entity Recognition
-
- Identifies persons, organizations, locations, weapons, and other entity types. Results are color-coded. Supports single text and CSV batch processing.
-
- <!-- Take a screenshot of the NER tab with sample output and save as screenshots/ner.png -->
- ![NER](./screenshots/ner.png)
-
- ### Binary Classification
-
- Classifies text as conflict-related or not. Uses the pretrained ConfliBERT classifier by default, or load your own fine-tuned model.
-
- <!-- Take a screenshot of the Classification tab and save as screenshots/classification.png -->
- ![Classification](./screenshots/classification.png)
-
- ### Multilabel Classification
-
- Scores text against four event categories (Armed Assault, Bombing/Explosion, Kidnapping, Other). Each category is scored independently.
-
- <!-- Take a screenshot of the Multilabel tab and save as screenshots/multilabel.png -->
- ![Multilabel](./screenshots/multilabel.png)
-
- ### Question Answering
-
- Provide a context passage and a question. The model extracts the most relevant answer span.
-
- <!-- Take a screenshot of the QA tab and save as screenshots/qa.png -->
- ![QA](./screenshots/qa.png)
-
- ### Fine-tuning
-
- Train your own binary or multiclass classifier directly in the browser. Upload data (or load a built-in example), pick a base model, configure training, and go. Supports **LoRA** and **QLoRA** for parameter-efficient training with lower VRAM usage. After training, results and a "Try Your Model" panel appear side by side. You can also save the model and run batch predictions.
-
- ### Model Comparison
-
- Compare multiple base model architectures on the same dataset. The comparison produces a metrics table, a grouped bar chart, and ROC-AUC curves.
-
- <!-- Take a screenshot of the Fine-tune tab and save as screenshots/finetune.png -->
- ![Fine-tune](./screenshots/finetune.png)
-
- ### Active Learning
-
- Iteratively build a strong classifier with fewer labels. Start with a small labeled seed set and a pool of unlabeled text. The model identifies the most uncertain samples for you to label, retrains, and repeats. Supports entropy, margin, and least-confidence query strategies.
-
- ## Supported Models
-
- ### Pretrained (Inference)
-
- | Task | HuggingFace Model |
- |------|-------------------|
- | NER | `eventdata-utd/conflibert-named-entity-recognition` |
- | Binary Classification | `eventdata-utd/conflibert-binary-classification` |
- | Multilabel Classification | `eventdata-utd/conflibert-satp-relevant-multilabel` |
- | Question Answering | `salsarra/ConfliBERT-QA` |
-
- ### Fine-tuning (Base Models)
-
- | Model | HuggingFace ID | Notes |
- |-------|----------------|-------|
- | ConfliBERT | `snowood1/ConfliBERT-scr-uncased` | Best for conflict/political text |
- | BERT Base Uncased | `bert-base-uncased` | General-purpose baseline |
- | BERT Base Cased | `bert-base-cased` | Case-sensitive variant |
- | RoBERTa Base | `roberta-base` | Improved BERT training |
- | ModernBERT Base | `answerdotai/ModernBERT-base` | Up to 8K token context |
- | DeBERTa v3 Base | `microsoft/deberta-v3-base` | Strong on benchmarks |
- | DistilBERT Base | `distilbert-base-uncased` | Faster, smaller |
-
- ## Installation
-
- ### Requirements
-
- - Python 3.8+
- - Git
-
- ### Steps
-
- 1. Clone the repository:
-
- ```bash
- git clone https://github.com/shreyasmeher/conflibert-gui.git
- cd conflibert-gui
- ```
-
- 2. Create and activate a virtual environment:
-
- ```bash
- python -m venv env
-
- # Mac/Linux:
- source env/bin/activate
-
- # Windows:
- env\Scripts\activate
- ```
-
- On Windows, if you get a permission error, run PowerShell as Administrator and execute:
-
- ```powershell
- Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope LocalMachine
- ```
-
- 3. Install PyTorch:
-
- ```bash
- # CPU only (Mac, or no NVIDIA GPU):
- pip install torch
-
- # NVIDIA GPU (Windows/Linux):
- pip install torch --index-url https://download.pytorch.org/whl/cu124
- ```
-
- 4. Install remaining dependencies:
-
- ```bash
- pip install -r requirements.txt
- ```
-
- ## Usage
-
- Start the application:
-
- ```bash
- python app.py
- ```
-
- Opens at `http://localhost:7860` and generates a public shareable link. The first launch takes a minute or two while it downloads the pretrained models.
-
- ### Tabs
-
- | Tab | What it does |
- |-----|-------------|
- | Home | System info, feature overview, citation |
- | Named Entity Recognition | Identify entities in text or CSV |
- | Binary Classification | Conflict vs. non-conflict, supports custom models |
- | Multilabel Classification | Multi-event-type scoring |
- | Question Answering | Extract answers from a context passage |
- | Fine-tune | Train classifiers with optional LoRA/QLoRA, compare models, ROC curves |
- | Active Learning | Iterative uncertainty-based labeling and retraining |
-
- ### Fine-tuning Quick Start
-
- 1. Go to the **Fine-tune** tab
- 2. Click **"Load Example: Binary"** to load sample data
- 3. Leave defaults and click **"Start Training"**
- 4. Review metrics and try your model on new text
- 5. Save the model and load it in the **Binary Classification** tab
-
- ### LoRA / QLoRA Fine-tuning
-
- 1. Go to the **Fine-tune** tab
- 2. Open **Advanced Settings** and check **Use LoRA** (optionally enable **QLoRA** for 4-bit quantization on CUDA GPUs)
- 3. Adjust LoRA rank and alpha as needed (defaults of r=8, alpha=16 work well)
- 4. Train as usual — LoRA weights are merged back automatically so the saved model works like any other
-
- ### Model Comparison Quick Start
-
- 1. Upload data (or load an example) in the **Fine-tune** tab
- 2. Scroll down and open **"Compare Multiple Models"**
- 3. Check 2 or more models to compare
- 4. Click **"Compare Models"**
- 5. View the metrics table, bar chart, and ROC-AUC curves
-
- ### Active Learning Quick Start
-
- 1. Go to the **Active Learning** tab
- 2. Click **"Load Example: Binary Active Learning"** (or upload your own seed + pool)
- 3. Configure the query strategy and samples per round
- 4. Click **"Initialize Active Learning"**
- 5. Label the uncertain samples shown in the table (fill in 0 or 1)
- 6. Click **"Submit Labels & Next Round"** to retrain and get the next batch
- 7. Repeat until satisfied, then save the model
-
- ### Data Format
-
- Tab-separated values (TSV), no header row. Each line: `text<TAB>label`
-
- Binary example:
- ```
- The bomb exploded near the market 1
- It was a sunny day at the park 0
- ```
-
- Multiclass example (integer labels starting from 0):
- ```
- The president signed the peace treaty 0
- Militants attacked the military base 1
- Thousands marched in the capital 2
- Aid workers delivered food supplies 3
- ```
-
- ### CSV Batch Processing
-
- Prepare a CSV with a `text` column:
-
- ```csv
- text
- "The soldiers advanced toward the border."
- "The festival attracted thousands of visitors."
- ```
-
- Upload it in the Batch Processing section of any inference tab.
-
- ## Project Structure
-
- ```
- conflibert-gui/
-   app.py # Main application
-   requirements.txt # Dependencies
-   README.md
-   screenshots/ # UI screenshots for documentation
-   examples/
-     binary/ # Example binary dataset (conflict vs non-conflict)
-       train.tsv
-       dev.tsv
-       test.tsv
-     multiclass/ # Example multiclass dataset (4 event types)
-       train.tsv # 0=Diplomacy, 1=Armed Conflict,
-       dev.tsv # 2=Protest, 3=Humanitarian
-       test.tsv
-     active_learning/ # Example active learning dataset
-       seed.tsv # 20 labeled seed samples
-       pool.txt # 61 unlabeled pool texts
-       pool_with_labels.tsv # Ground truth for pool (cheat sheet)
- ```
-
- ## Training Features
-
- - **LoRA / QLoRA** parameter-efficient fine-tuning (via [PEFT](https://github.com/huggingface/peft))
- - **Active learning** with entropy, margin, and least-confidence query strategies
- - Early stopping with configurable patience
- - Learning rate schedulers: linear, cosine, constant, constant with warmup
- - Mixed precision training (FP16) on CUDA GPUs
- - Gradient accumulation for larger effective batch sizes
- - Weight decay regularization
- - Automatic system detection (NVIDIA GPU, Apple Silicon MPS, CPU)
- - Model comparison with grouped bar charts and ROC-AUC curves
-
- ## Citation
-
- If you use ConfliBERT in your research, please cite:
-
- Brandt, P.T., Alsarra, S., D'Orazio, V., Heintze, D., Khan, L., Meher, S., Osorio, J. and Sianan, M., 2025. Extractive versus Generative Language Models for Political Conflict Text Classification. *Political Analysis*, pp.1-29.
-
- ```bibtex
- @article{brandt2025extractive,
- title={Extractive versus Generative Language Models for Political Conflict Text Classification},
- author={Brandt, Patrick T and Alsarra, Sultan and D'Orazio, Vito and Heintze, Dagmar and Khan, Latifur and Meher, Shreyas and Osorio, Javier and Sianan, Marcus},
- journal={Political Analysis},
- pages={1--29},
- year={2025},
- publisher={Cambridge University Press}
- }
- ```
-
- ## License
-
- MIT License. See LICENSE for details.
+ ---
+ title: ConfliBERT GUI v3
+ emoji: '🔥'
+ colorFrom: red
+ colorTo: yellow
+ sdk: gradio
+ sdk_version: "5.20.0"
+ app_file: app.py
+ pinned: false
+ ---
+
+ # ConfliBERT GUI v3
+
+ A browser-based NLP toolkit for conflict and political violence text analysis. Fine-tune classifiers with LoRA/QLoRA support and active learning.
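
The README text removed in this commit names three query strategies for active learning (entropy, margin, and least-confidence) without showing how they rank samples. The following is a minimal sketch of those scores using their standard textbook definitions, computed from a classifier's predicted class probabilities. It is not taken from `app.py`; the function names and the `select_for_labeling` helper are hypothetical, chosen only for illustration.

```python
import math


def least_confidence(probs):
    """1 minus the top class probability; higher means more uncertain."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most likely classes; smaller means more uncertain."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def entropy(probs):
    """Shannon entropy in nats; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(pool_probs, k, strategy="entropy"):
    """Return the indices of the k most uncertain pool items (hypothetical helper)."""
    if strategy == "entropy":
        scores = [entropy(p) for p in pool_probs]
    elif strategy == "least_confidence":
        scores = [least_confidence(p) for p in pool_probs]
    elif strategy == "margin":
        # Negate so that a larger score always means "more uncertain".
        scores = [-margin(p) for p in pool_probs]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Predicted class probabilities for three unlabeled pool texts (made-up values).
pool = [[0.99, 0.01], [0.5, 0.5], [0.7, 0.3]]
chosen = select_for_labeling(pool, k=2, strategy="margin")
```

For binary classification the three strategies produce the same ranking, since each score is a monotone function of the top class probability; they can diverge once there are three or more classes.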