---
language: en
license: apache-2.0
tags:
- pytorch
- text-classification
- dei
- bias-detection
- equibert
datasets:
- synthetic
metrics:
- f1
- roc_auc
---

# EquiBERT: Bias Classifier

**Model ID:** `SallySims/equibert-bias-classifier`

Multi-label classifier that detects seven types of bias in
organisational text: job descriptions, HR communications,
policies, and workplace language.

## Labels

| ID | Label | Description |
|----|-------|-------------|
| 0 | `gender_bias` | Gendered language, role assumptions, masculine-coded words |
| 1 | `racial_bias` | Racial coding, cultural fit language, tokenism |
| 2 | `age_bias` | Digital native language, overqualified framing, generational stereotypes |
| 3 | `ability_bias` | Ableist language, physical requirements, disability framing |
| 4 | `socioeconomic_bias` | Class-coded language, credential gatekeeping |
| 5 | `cultural_bias` | Cultural exclusion, religious insensitivity |
| 6 | `intersectional` | Compounding bias across multiple identity dimensions |

## Usage

```python
import torch  # needed for return_tensors="pt"
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SallySims/equibert-bias-classifier")

text = "We need a rock star developer who can dominate the roadmap."
# Tokenize to PyTorch tensors, truncating inputs longer than 128 tokens.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# This snippet only prepares the inputs. To run inference, load the model
# weights with the EquiBERT modeling code from the repository and pass
# `inputs` to the model.
```
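
Once the model returns logits, they still have to be mapped back to the labels in the table above. The following is a minimal sketch of that step, assuming the model emits seven logits per input in the ID order of the Labels table and that predictions use the 0.5 sigmoid threshold described for the task head. `decode_logits` and the example logits are hypothetical and are not part of the EquiBERT repository.

```python
import math

# Label order follows the ID column of the Labels table in this card.
LABELS = [
    "gender_bias",
    "racial_bias",
    "age_bias",
    "ability_bias",
    "socioeconomic_bias",
    "cultural_bias",
    "intersectional",
]


def decode_logits(logits, threshold=0.5):
    """Return {label: probability} for every label at or above the threshold."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    # Independent sigmoid per label (multi-label, not softmax).
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return {
        label: round(p, 3)
        for label, p in zip(LABELS, probs)
        if p >= threshold
    }


# Hypothetical logits for a single input; not real model output.
example_logits = [2.0, -1.5, -3.0, -0.2, -0.1, -2.2, 0.4]
flagged = decode_logits(example_logits)
print(flagged)
```

With a PyTorch output of shape `(1, 7)`, the same helper could be called as `decode_logits(logits[0].tolist())`. Because each label has its own sigmoid, one text can be flagged for several bias types at once, or for none.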

## Task Head Architecture

```
CLS token → Dropout(0.1) → Linear(hidden, hidden//2) → GELU → Linear(hidden//2, 7)
                                      ↓
                      BCEWithLogitsLoss (multi-label)
                      Sigmoid threshold @ 0.5
```
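
The head in the diagram can be sketched as a small PyTorch module. This is an illustration under stated assumptions, not the repository's implementation: `BiasHead` is a hypothetical name, `hidden_size=768` is an assumed width for the fused representation (the card does not state it), and random tensors stand in for the CLS embedding and the targets.

```python
import torch
from torch import nn


class BiasHead(nn.Module):
    """Hypothetical sketch of the multi-label head shown in the diagram."""

    def __init__(self, hidden_size=768, num_labels=7, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, hidden_size // 2),
            nn.GELU(),
            nn.Linear(hidden_size // 2, num_labels),
        )

    def forward(self, cls_embedding):
        # Returns raw logits; the sigmoid is applied in the loss or at inference.
        return self.net(cls_embedding)


head = BiasHead().eval()

# Placeholder batch: 4 CLS embeddings and 4 random multi-hot target rows.
cls_embedding = torch.randn(4, 768)
targets = torch.randint(0, 2, (4, 7)).float()

logits = head(cls_embedding)

# Training objective: one binary cross-entropy term per label.
loss = nn.BCEWithLogitsLoss()(logits, targets)

# Inference: sigmoid followed by the 0.5 threshold.
preds = (torch.sigmoid(logits) >= 0.5).int()
print(logits.shape, preds.shape)
```

`BCEWithLogitsLoss` takes raw logits, which is why the module itself does not end in a sigmoid.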

## Performance (synthetic data, seed=42)

| Metric | Score |
|--------|-------|
| Macro F1 | 0.72 |
| Micro F1 | 0.76 |
| ROC AUC | 0.81 |
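
Macro and micro F1 can diverge noticeably when some labels are rare. The sketch below shows how the two averages are formed for multi-label predictions. It is written in plain Python for illustration only: the function names and toy arrays are hypothetical and are not the evaluation code or data behind the scores above. A label with no true or predicted positives is scored 0.0 here, which is one of several common conventions.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for lists of multi-hot rows."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))

    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels

    # Micro: pool the counts over all labels, so frequent labels dominate.
    total_tp = sum(c[0] for c in counts)
    total_fp = sum(c[1] for c in counts)
    total_fn = sum(c[2] for c in counts)
    micro = f1_from_counts(total_tp, total_fp, total_fn)
    return macro, micro


# Toy example with 3 labels and 4 texts (hypothetical values).
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]

macro_f1, micro_f1 = multilabel_f1(y_true, y_pred)
print(f"macro={macro_f1:.3f} micro={micro_f1:.3f}")  # macro=0.556 micro=0.800
```

In the toy data the rare third label is never predicted, which pulls macro F1 well below micro F1.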

## Model Description

EquiBERT is a multi-task DEI (Diversity, Equity and Inclusion) transformer
built on a dual-encoder backbone that fuses **RoBERTa-base** and
**DeBERTa-v3-base** via a learned weighted sum (α parameter).
The fused representation is fed into task-specific heads covering
17 distinct DEI analysis tasks.

- **Organisation:** [SallySims](https://huggingface.co/SallySims)
- **Framework:** PyTorch + HuggingFace Transformers
- **Backbone:** RoBERTa-base + DeBERTa-v3-base (dual encoder, fused)
- **Language:** English
- **Domain:** Organisational DEI text: HR communications, policies,
  job descriptions, performance reviews, leadership statements, reports

## Architecture

```
Input Text
    │
    ├──▶ RoBERTa-base encoder    ──▶ Linear projection
    │                                        │
    └──▶ DeBERTa-v3-base encoder ──▶ Linear projection
                                             │
                               Weighted fusion (learned α)
                                             │
                                   Layer Norm + Dropout
                                             │
                      Task-specific head (see Task Head Architecture)
```
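
The fusion step in the diagram can be sketched as follows. This is one plausible reading, not the repository's code: `WeightedFusion` is a hypothetical name, α is assumed to be a single scalar kept in (0, 1) by a sigmoid and applied as a convex combination, and the output width and dropout rate are assumed values. Random tensors stand in for the pooled outputs of the two encoders, both of which have a hidden size of 768.

```python
import torch
from torch import nn


class WeightedFusion(nn.Module):
    """Hypothetical sketch: project two encoder outputs and mix them with a learned alpha."""

    def __init__(self, dim_a=768, dim_b=768, fused_dim=768, dropout=0.1):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, fused_dim)  # RoBERTa branch
        self.proj_b = nn.Linear(dim_b, fused_dim)  # DeBERTa branch
        # Assumption: alpha = sigmoid(alpha_logit), so it starts at 0.5.
        self.alpha_logit = nn.Parameter(torch.zeros(1))
        self.norm = nn.LayerNorm(fused_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, h_a, h_b):
        alpha = torch.sigmoid(self.alpha_logit)
        fused = alpha * self.proj_a(h_a) + (1.0 - alpha) * self.proj_b(h_b)
        return self.dropout(self.norm(fused))


fusion = WeightedFusion().eval()

# Placeholders for the two encoders' pooled outputs (batch of 2).
h_roberta = torch.randn(2, 768)
h_deberta = torch.randn(2, 768)

fused = fusion(h_roberta, h_deberta)
print(fused.shape)  # torch.Size([2, 768])
```

The linear projections bring both encoders into a shared space before mixing, which is what makes a single scalar weight meaningful across the two representations.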

## Training Data

Trained on synthetic DEI organisational text generated by the
EquiBERT synthetic data pipeline, covering 20 DEI categories
across HR, policy, leadership, and workforce analytics domains.
For production use, fine-tune on real labelled DEI data.

## Limitations

- Trained on synthetic data; predictions should be validated
  before use in real HR or policy decisions.
- English-only.
- Not a substitute for qualified DEI practitioners or legal advice.
- May reflect biases present in the training corpus.

## Citation

If you use EquiBERT in your research, please cite:

```bibtex
@misc{equibert2024,
  author    = {SallySims},
  title     = {EquiBERT: A Multi-Task DEI Transformer},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/SallySims}
}
```

## All EquiBERT Models

| Model | Task | Primary Metric |
|-------|------|---------------|
| [equibert-bias-classifier](https://huggingface.co/SallySims/equibert-bias-classifier) | Bias Detection | Macro F1 |
| [equibert-microaggression](https://huggingface.co/SallySims/equibert-microaggression) | Microaggression Detection | Macro F1 |
| [equibert-category-tagger](https://huggingface.co/SallySims/equibert-category-tagger) | DEI Category Tagging | Macro F1 |
| [equibert-event-exclusion](https://huggingface.co/SallySims/equibert-event-exclusion) | Event Exclusion Classification | Macro F1 |
| [equibert-inclusive-language](https://huggingface.co/SallySims/equibert-inclusive-language) | Inclusive Language Scoring | Span F1 |
| [equibert-review-auditor](https://huggingface.co/SallySims/equibert-review-auditor) | Performance Review Auditing | Span F1 |
| [equibert-washing-detector](https://huggingface.co/SallySims/equibert-washing-detector) | DEI Washing Detection | MAE |
| [equibert-framing-scorer](https://huggingface.co/SallySims/equibert-framing-scorer) | Report Framing Scoring | MAE |
| [equibert-awareness-scorer](https://huggingface.co/SallySims/equibert-awareness-scorer) | DEI Awareness Scoring | MAE |
| [equibert-similarity](https://huggingface.co/SallySims/equibert-similarity) | Semantic Similarity | Accuracy |
| [equibert-ner](https://huggingface.co/SallySims/equibert-ner) | DEI Entity Recognition | Span F1 |
| [equibert-relation-extraction](https://huggingface.co/SallySims/equibert-relation-extraction) | Relation Extraction | Macro F1 |
| [equibert-qa](https://huggingface.co/SallySims/equibert-qa) | Extractive QA | Span EM |
| [equibert-search](https://huggingface.co/SallySims/equibert-search) | Semantic Search | MRR@10 |
| [equibert-nli](https://huggingface.co/SallySims/equibert-nli) | NLI / Textual Entailment | Macro F1 |
| [equibert-generator](https://huggingface.co/SallySims/equibert-generator) | DEI Text Generation | ROUGE-L |