---
title: WimBERT Synth v0
emoji: 🏛️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Dutch multi-label classifier for signal messages
---
# WimBERT Synth v0: Dutch Multi-Label Signal Classifier
Demo of a dual-head BERT classifier trained on synthetic Dutch government signals.
Predicts relevant topics (**onderwerp**, 64 labels) and sentiment/experience
(**beleving**, 33 labels) for each input message.
## 🚀 Usage
1. Enter Dutch text (e.g., a citizen feedback message about government services)
2. Click **Voorspel** to classify
3. Adjust **Drempel** (threshold) to change prediction sensitivity
4. View results in three tabs:
- **Samenvatting**: Top-K predictions per head with color-coded probabilities
- **Alle labels**: Complete list of all labels sorted by probability
- **JSON**: Raw predictions in machine-readable format
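The interaction between **Drempel** and the Top-K view can be sketched in a few lines of Python. This is only an illustration: the `summarize` helper, the probability values, and the shape of the resulting JSON are assumptions made for this example, not the Space's actual code or output format.

```python
import json

def summarize(probs, threshold=0.5, top_k=3):
    """Split one head's label probabilities into predicted labels and a top-K list.

    Hypothetical helper for illustration; not taken from app.py.
    """
    # Sort labels from most to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return {
        # Labels at or above the threshold count as "predicted".
        "predicted": [label for label, p in ranked if p >= threshold],
        # The top-K list ignores the threshold and just takes the K best labels.
        "top_k": [
            {"label": label, "probability": round(p, 3)}
            for label, p in ranked[:top_k]
        ],
    }

# Made-up probabilities for a handful of labels from each head.
onderwerp = {"Parkeren": 0.91, "Verkeersveiligheid": 0.62, "Geluidsoverlast": 0.08}
beleving = {"Wachttijd": 0.74, "Communicatie": 0.41, "Vriendelijkheid": 0.05}

result = {
    "onderwerp": summarize(onderwerp, threshold=0.5, top_k=2),
    "beleving": summarize(beleving, threshold=0.5, top_k=2),
}
print(json.dumps(result, indent=2, ensure_ascii=False))
```

Lowering the threshold to 0.4 in this sketch would add `Communicatie` to the predicted beleving labels, while the top-K list stays the same.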
## 🎯 Features
- **Dual-head classification**: Simultaneously predicts topic (onderwerp) and experience (beleving)
- **Interactive threshold**: Adjust which labels are considered "predicted"
- **Color-coded visualization**: Probability intensity shown via color (darker = higher probability)
- **Accessible**: All probabilities are shown numerically; color is only an enhancement
- **Fast**: Runs on CPU in a few seconds per message, with optional GPU acceleration
## 🤖 Model
- **Base model**: `bert-base-multilingual-cased`
- **Architecture**: Dual classification heads with 64 onderwerp + 33 beleving labels
- **Training**: Synthetic data via Argilla + distillation pipeline
- **License**: Apache-2.0
- **Full model card**: [UWV/wimbert-synth-v0](https://huggingface.co/UWV/wimbert-synth-v0)
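As a rough picture of what "dual classification heads" means, the toy sketch below feeds one shared feature vector into two independent linear heads and applies a sigmoid per label, so several labels can be active at once. Everything here is an assumption for illustration: the random weights, the tiny hidden size, and the single-linear-layer heads are stand-ins, and the real model's head design and pooling are described only in the model card.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear_head(features, weights, biases):
    """One classification head: a logit per label, then an independent sigmoid."""
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return [sigmoid(z) for z in logits]

HIDDEN = 8        # toy size for the sketch; a BERT-base encoder uses 768
N_ONDERWERP = 64  # topic labels
N_BELEVING = 33   # experience labels

rng = random.Random(0)

def random_head(n_labels):
    """Random placeholder weights; a trained model would load real ones."""
    weights = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(n_labels)]
    biases = [0.0] * n_labels
    return weights, biases

onderwerp_head = random_head(N_ONDERWERP)
beleving_head = random_head(N_BELEVING)

# Stand-in for the encoder's pooled representation of one message.
pooled = [rng.uniform(-1, 1) for _ in range(HIDDEN)]

# Both heads read the same shared representation.
onderwerp_probs = linear_head(pooled, *onderwerp_head)
beleving_probs = linear_head(pooled, *beleving_head)
print(len(onderwerp_probs), len(beleving_probs))  # 64 33
```

Because each label gets its own sigmoid rather than sharing a softmax, the probabilities within a head do not need to sum to 1, which is what makes the classifier multi-label.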
### Labels
**Onderwerp (64 topics)**:
Advies, Algemene veiligheid, Begeleiding, Bijstand, Bouwoverlast, COVID-19, Criminaliteit,
Documentaanvraag, Energiekosten, Evenementen, Financiële regelingen, Geluidsoverlast,
Gemeentelijke heffingen, Hangjongeren, Huisdierenoverlast, Hulp aan dak- en thuislozen,
Infrastructuur, Kwijtschelding, Migratie, Onderhoud omgeving, Parkeren, Schade en claims,
Verkeersmaatregelen, Verkeersveiligheid, Wijkteam, and more...
**Beleving (33 experiences)**:
Afspraakmogelijkheden, Algemene ervaring, Behulpzaamheid, Bereikbaarheid, Bezwaar & bewijs,
Communicatie, Deskundigheid, Duidelijkheid, Efficiëntie, Faciliteiten, Gebruiksgemak,
Informatievoorziening, Integriteit, Kwaliteit klantenservice, Snelheid van afhandeling,
Vriendelijkheid, Wachttijd, and more...
## 🔒 Privacy
- Input text is processed **in-memory only**
- No data is logged or stored beyond standard Gradio telemetry
- Model runs entirely within this Space (no external API calls)
## ⚙️ Hardware
- **CPU**: Works on free tier (~3-5s inference)
- **GPU (T4)**: Recommended for production (<1s inference)
This Space currently runs on **CPU** with FP32.
## 🛠️ Local Development
```bash
# Clone and setup
git clone https://huggingface.co/spaces/UWV/wimbert-synth-v0
cd wimbert-synth-v0
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Run
python app.py
```
## 📊 Example Use Cases
- **Citizen feedback routing**: Automatically categorize incoming messages
- **Sentiment analysis**: Understand citizen experience with government services
- **Analytics**: Aggregate trends across topics and experiences
- **Triage**: Prioritize urgent or negative feedback
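For the analytics use case, aggregating predictions across many messages could look roughly like the sketch below. The `predictions` list, its dictionary layout, and the helper names are hypothetical; they assume you have already collected the above-threshold labels per message in some form.

```python
from collections import Counter

# Hypothetical per-message predictions (labels above the chosen threshold).
predictions = [
    {"onderwerp": ["Parkeren"], "beleving": ["Wachttijd", "Communicatie"]},
    {"onderwerp": ["Parkeren", "Verkeersveiligheid"], "beleving": ["Wachttijd"]},
    {"onderwerp": ["Geluidsoverlast"], "beleving": ["Bereikbaarheid"]},
]

def label_counts(predictions, head):
    """Count how often each label of one head was predicted."""
    counts = Counter()
    for message in predictions:
        counts.update(message[head])
    return counts

def co_occurrence(predictions):
    """Count (onderwerp, beleving) pairs that were predicted for the same message."""
    pairs = Counter()
    for message in predictions:
        for topic in message["onderwerp"]:
            for experience in message["beleving"]:
                pairs[(topic, experience)] += 1
    return pairs

topic_counts = label_counts(predictions, "onderwerp")
pair_counts = co_occurrence(predictions)
print(topic_counts.most_common(1))               # [('Parkeren', 2)]
print(pair_counts[("Parkeren", "Wachttijd")])    # 2
```

Pair counts like these show which experiences tend to accompany which topics, which is one simple way to surface trends without acting on any single prediction.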
⚠️ **Note**: This is a research/demo tool and is not intended for automated decision-making.
---
**Built with**: Gradio • Transformers • PyTorch
**Developed by**: UWV
**License**: Apache-2.0