---
language: en
license: apache-2.0
tags:
- pytorch
- token-classification
- ner
- dei
- equibert
metrics:
- f1
---
EquiBERT: DEI Named Entity Recognition
Model ID: SallySims/equibert-ner
Token-level BIO tagger for organisational text with 13 label types: 12 DEI-specific entity types plus O (outside any entity).
Entity Types
| Entity | Description | Example |
|---|---|---|
| DEMOGRAPHIC_GROUP | Identity group reference | "BIPOC employees", "women" |
| PROTECTED_CHAR | Protected characteristic | "gender", "disability" |
| DEI_CONCEPT | DEI domain concept | "unconscious bias", "intersectionality" |
| ORG_ROLE | Organisational role | "CHRO", "hiring manager" |
| POLICY_ARTIFACT | Policy or document | "pay equity review", "DEI report" |
| METRIC | Measurable DEI outcome | "9% pay gap", "27% diverse hiring" |
| BIAS_INDICATOR | Bias signal language | "rock star", "cultural fit" |
| COMMITMENT | DEI commitment statement | "we will close this by Q2" |
| BARRIER | Identified barrier | "no lift at venue" |
| ACTION | Concrete DEI action | "blind CV screening" |
| ORGANISATION | Named organisation | company names |
| TIMEFRAME | Time reference | "Q2 2024", "by year end" |
| O | Outside (no entity) | - |
Usage (same as bert-base-NER)
from transformers import pipeline
ner = pipeline("ner", model="SallySims/equibert-ner")
ner("Our CHRO published the annual DEI report with measurable targets.")
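Because the model tags individual tokens, downstream code usually needs to merge B-/I- tags into entity spans. The sketch below shows one way to do that in plain Python. It assumes the labels follow the common `B-TYPE` / `I-TYPE` / `O` convention; the exact label strings in this model's config are not stated in this card, and the tags in the example are hand-written for illustration, not model output. The helper name `bio_to_spans` is hypothetical.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level BIO tags into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span that is currently open, if any
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, entity_type = tag.partition("-")
        # A B- tag, or an I- tag that does not continue the open span,
        # starts a new span
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_tokens.append(token)
    flush()
    return spans


# Hand-written tags for illustration only (assumed label format)
tokens = ["Our", "CHRO", "published", "the", "annual", "DEI", "report"]
tags = ["O", "B-ORG_ROLE", "O", "O", "O", "B-POLICY_ARTIFACT", "I-POLICY_ARTIFACT"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

The Transformers token-classification pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs a similar grouping on real model output.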
Model Description
EquiBERT is a multi-task DEI (Diversity, Equity and Inclusion) transformer built on a dual-encoder backbone that fuses RoBERTa-base and DeBERTa-v3-base via a learned weighted sum (α parameter). The fused representation is fed into task-specific heads covering 17 distinct DEI analysis tasks.
- Organisation: SallySims
- Framework: PyTorch + HuggingFace Transformers
- Backbone: RoBERTa-base + DeBERTa-v3-base (dual encoder, fused)
- Language: English
- Domain: Organisational DEI text (HR communications, policies, job descriptions, performance reviews, leadership statements, reports)
Architecture
Input Text
│
├──▶ RoBERTa-base encoder ─────▶ Linear projection
│                                        │
└──▶ DeBERTa-v3-base encoder ──▶ Linear projection
                                         │
                            Weighted fusion (learned α)
                                         │
                               Layer Norm + Dropout
                                         │
                          Task-specific head (see below)
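The fusion step in the diagram can be sketched in PyTorch as follows. This is an illustration under stated assumptions, not the released implementation: the card does not say how α is parameterised, so the sketch uses a single scalar passed through a sigmoid, and it assumes the two encoders' hidden states have already been aligned to the same sequence length. The class name `WeightedFusion` and the dropout rate are hypothetical; random tensors stand in for the encoder outputs.

```python
import torch
from torch import nn


class WeightedFusion(nn.Module):
    """Fuse two encoder outputs with one learned mixing weight (assumed scalar)."""

    def __init__(self, dim_a: int, dim_b: int, hidden: int, dropout: float = 0.1):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, hidden)  # linear projection, encoder A
        self.proj_b = nn.Linear(dim_b, hidden)  # linear projection, encoder B
        # Assumption: alpha = sigmoid(alpha_logit), so it starts at 0.5
        self.alpha_logit = nn.Parameter(torch.zeros(1))
        self.norm = nn.LayerNorm(hidden)
        self.dropout = nn.Dropout(dropout)

    def forward(self, h_a: torch.Tensor, h_b: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.alpha_logit)
        # Weighted sum of the two projected representations
        fused = alpha * self.proj_a(h_a) + (1.0 - alpha) * self.proj_b(h_b)
        return self.dropout(self.norm(fused))


# Random tensors stand in for RoBERTa-base and DeBERTa-v3-base hidden states
# (both base models use a hidden size of 768).
fusion = WeightedFusion(768, 768, 768).eval()
h_roberta = torch.randn(2, 16, 768)  # (batch, sequence, hidden)
h_deberta = torch.randn(2, 16, 768)
fused = fusion(h_roberta, h_deberta)
print(fused.shape)
```

A task-specific head, such as a linear layer over the BIO label set for NER, would then consume `fused`.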
Training Data
Trained on synthetic DEI organisational text generated by the EquiBERT synthetic data pipeline, covering 20 DEI categories across HR, policy, leadership, and workforce analytics domains. For production use, fine-tune on real labelled DEI data.
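When fine-tuning a token-level tagger on real labelled data, word-level labels usually have to be expanded to subword tokens first. The sketch below shows one common convention: only the first subword of each word keeps its label, and the remaining positions are masked out of the loss. It is a generic illustration, not the EquiBERT training code; the function name `align_labels` and the label ids are hypothetical, and how EquiBERT reconciles its two tokenizers is not described in this card.

```python
from typing import List, Optional

# Label value that PyTorch's CrossEntropyLoss ignores by default
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Expand word-level label ids to subword positions.

    Special tokens (word id None) and continuation subwords get IGNORE_INDEX;
    the first subword of each word keeps that word's label.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(word_labels[word_id])
        previous = word_id
    return aligned


# word_ids in the shape a fast tokenizer's word_ids() reports them:
# None marks special tokens, and word 1 is split into two subwords.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 3, 4]  # hypothetical label ids for three words
aligned = align_labels(word_ids, word_labels)
print(aligned)  # [-100, 0, 3, -100, 4, -100]
```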
Limitations
- Trained on synthetic data; predictions should be validated before use in real HR or policy decisions.
- English-only.
- Not a substitute for qualified DEI practitioners or legal advice.
- May reflect biases present in the training corpus.
Citation
If you use EquiBERT in your research, please cite:
@misc{equibert2024,
author = {SallySims},
title = {EquiBERT: A Multi-Task DEI Transformer},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/SallySims}
}
All EquiBERT Models
| Model | Task | Primary Metric |
|---|---|---|
| equibert-bias-classifier | Bias Detection | Macro F1 |
| equibert-microaggression | Microaggression Detection | Macro F1 |
| equibert-category-tagger | DEI Category Tagging | Macro F1 |
| equibert-event-exclusion | Event Exclusion Classification | Macro F1 |
| equibert-inclusive-language | Inclusive Language Scoring | Span F1 |
| equibert-review-auditor | Performance Review Auditing | Span F1 |
| equibert-washing-detector | DEI Washing Detection | MAE |
| equibert-framing-scorer | Report Framing Scoring | MAE |
| equibert-awareness-scorer | DEI Awareness Scoring | MAE |
| equibert-similarity | Semantic Similarity | Accuracy |
| equibert-ner | DEI Entity Recognition | Span F1 |
| equibert-relation-extraction | Relation Extraction | Macro F1 |
| equibert-qa | Extractive QA | Span EM |
| equibert-search | Semantic Search | MRR@10 |
| equibert-nli | NLI / Textual Entailment | Macro F1 |
| equibert-generator | DEI Text Generation | ROUGE-L |