SallySims committed on
Commit 1cef2aa · verified · 1 Parent(s): 9bea35c

Add detailed model card

Files changed (1)
  1. README.md +132 -0
README.md ADDED
@@ -0,0 +1,132 @@
+ ---
+ language: en
+ license: apache-2.0
+ tags:
+ - pytorch
+ - text-classification
+ - dei
+ - microaggression
+ - equibert
+ metrics:
+ - f1
+ - accuracy
+ ---
+
+ # EquiBERT: Microaggression Detector
+
+ **Model ID:** `SallySims/equibert-microaggression`
+
+ Single-label classifier that identifies the type of microaggression
+ present in workplace communications.
+
+ ## Labels
+
+ | ID | Label | Example or description |
+ |----|-------|------------------------|
+ | 0 | `none` | No microaggression detected |
+ | 1 | `microinsult` | "You're surprisingly articulate" |
+ | 2 | `microinvalidation` | "I don't see colour" |
+ | 3 | `microassault` | Deliberate exclusionary behaviour |
+ | 4 | `environmental` | Absence of diverse representation |
+ | 5 | `behavioural` | Non-verbal exclusion |
+ | 6 | `second_generation` | Systemic/institutional microaggression |
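The label table above maps directly onto the 7-way output of the classifier. As a minimal sketch, assuming the model returns one raw logit per label in ID order, the following pure-Python helper turns a logit vector into a label name and a softmax confidence. The names `ID2LABEL` and `decode` are illustrative and are not part of the published model.

```python
import math

# Label IDs as listed in the table above.
ID2LABEL = {
    0: "none",
    1: "microinsult",
    2: "microinvalidation",
    3: "microassault",
    4: "environmental",
    5: "behavioural",
    6: "second_generation",
}


def decode(logits):
    """Return (label, probability) for the highest-scoring class.

    Assumes `logits` holds one raw score per label, ordered by ID.
    """
    if len(logits) != len(ID2LABEL):
        raise ValueError("expected one logit per label")
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration only.
label, confidence = decode([0.1, 2.3, 0.4, -1.0, 0.0, -0.5, 0.2])
```

Because the task is single-label, only the top class is reported; the softmax value is a relative score and has not been shown to be calibrated.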
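+
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer
+
+ # Load the tokenizer published with this checkpoint.
+ tokenizer = AutoTokenizer.from_pretrained("SallySims/equibert-microaggression")
+ text = "You are surprisingly articulate for someone from your background."
+ # Truncate to 128 tokens and return PyTorch tensors.
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
+
+ # `model` must be an already-loaded EquiBERT model with the microaggression
+ # head; this card does not show how to load it.
+ # predicted_class = model(**inputs).logits.argmax(-1)
+ ```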
+
+ ## Task Head Architecture
+
+ ```
+ CLS → Dropout(0.1) → Linear(hidden, hidden//2) → GELU → Linear(hidden//2, 7)
+                                                              ↓
+                                              CrossEntropyLoss (single-label)
+ ```
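The head diagram above can be sketched in PyTorch as follows. This is an illustration under stated assumptions, not the released implementation: the class name `MicroaggressionHead` is hypothetical, and `hidden=768` is assumed because both base encoders use 768-dimensional hidden states.

```python
import torch
from torch import nn


class MicroaggressionHead(nn.Module):
    """CLS -> Dropout -> Linear -> GELU -> Linear, as in the diagram."""

    def __init__(self, hidden: int = 768, num_labels: int = 7, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden, hidden // 2),
            nn.GELU(),
            nn.Linear(hidden // 2, num_labels),
        )

    def forward(self, cls_embedding: torch.Tensor) -> torch.Tensor:
        # cls_embedding: (batch, hidden) -> logits: (batch, num_labels)
        return self.net(cls_embedding)


head = MicroaggressionHead()
cls = torch.randn(4, 768)              # stand-in for fused CLS vectors
targets = torch.tensor([0, 1, 2, 6])   # one class ID per example
logits = head(cls)
# CrossEntropyLoss takes raw logits and integer class IDs (single-label).
loss = nn.CrossEntropyLoss()(logits, targets)
```

`CrossEntropyLoss` applies log-softmax internally, which is why the head ends in a plain `Linear` layer with no activation.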
+
+ ## Model Description
+
+ EquiBERT is a multi-task DEI (Diversity, Equity and Inclusion) transformer
+ built on a dual-encoder backbone that fuses **RoBERTa-base** and
+ **DeBERTa-v3-base** via a learned weighted sum (α parameter).
+ The fused representation is fed into task-specific heads covering
+ 17 distinct DEI analysis tasks.
+
+ - **Organisation:** [SallySims](https://huggingface.co/SallySims)
+ - **Framework:** PyTorch + HuggingFace Transformers
+ - **Backbone:** RoBERTa-base + DeBERTa-v3-base (dual encoder, fused)
+ - **Language:** English
+ - **Domain:** Organisational DEI text (HR communications, policies,
+   job descriptions, performance reviews, leadership statements, reports)
+
+ ## Architecture
+
+ ```
+ Input Text
+     │
+     ├──▶ RoBERTa-base encoder    ──▶ Linear projection
+     │                                        │
+     └──▶ DeBERTa-v3-base encoder ──▶ Linear projection
+                                              │
+                                 Weighted fusion (learned α)
+                                              │
+                                    Layer Norm + Dropout
+                                              │
+                       Task-specific head (see Task Head Architecture)
+ ```
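The fusion stage in the diagram could look roughly like the PyTorch sketch below. The card only states that the two encoders are combined by a learned weighted sum with parameter α, so several details here are assumptions: α is modelled as a single scalar kept in (0, 1) by a sigmoid, the two branches are weighted α and 1 − α, and the names `WeightedFusion` and `alpha_logit` are hypothetical. Random tensors stand in for the encoder outputs.

```python
import torch
from torch import nn


class WeightedFusion(nn.Module):
    """Project two encoder outputs, mix them with a learned scalar, normalise."""

    def __init__(self, dim_a: int = 768, dim_b: int = 768,
                 hidden: int = 768, dropout: float = 0.1):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, hidden)   # RoBERTa branch
        self.proj_b = nn.Linear(dim_b, hidden)   # DeBERTa branch
        # Assumption: alpha = sigmoid(alpha_logit), initialised to 0.5.
        self.alpha_logit = nn.Parameter(torch.zeros(()))
        self.norm = nn.LayerNorm(hidden)
        self.dropout = nn.Dropout(dropout)

    def forward(self, h_a: torch.Tensor, h_b: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.alpha_logit)
        fused = alpha * self.proj_a(h_a) + (1 - alpha) * self.proj_b(h_b)
        return self.dropout(self.norm(fused))


fusion = WeightedFusion().eval()       # eval() disables dropout
h_roberta = torch.randn(2, 768)        # stand-in for RoBERTa CLS vectors
h_deberta = torch.randn(2, 768)        # stand-in for DeBERTa CLS vectors
fused = fusion(h_roberta, h_deberta)   # shape: (2, 768)
```

Projecting each branch first lets the two encoders be mixed in a shared space even if their hidden sizes differ. With two base-size models the projection mainly aligns their representations.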
+
+ ## Training Data
+
+ Trained on synthetic DEI organisational text generated by the
+ EquiBERT synthetic data pipeline, covering 20 DEI categories
+ across HR, policy, leadership, and workforce analytics domains.
+ For production use, fine-tune on real labelled DEI data.
+
+ ## Limitations
+
+ - Trained on synthetic data, so predictions should be validated
+   before use in real HR or policy decisions.
+ - English-only.
+ - Not a substitute for qualified DEI practitioners or legal advice.
+ - May reflect biases present in the training corpus.
+
+ ## Citation
+
+ If you use EquiBERT in your research, please cite:
+
+ ```bibtex
+ @misc{equibert2024,
+   author = {SallySims},
+   title = {EquiBERT: A Multi-Task DEI Transformer},
+   year = {2024},
+   publisher = {HuggingFace},
+   url = {https://huggingface.co/SallySims}
+ }
+ ```
+
+ ## All EquiBERT Models
+
+ | Model | Task | Primary Metric |
+ |-------|------|---------------|
+ | [equibert-bias-classifier](https://huggingface.co/SallySims/equibert-bias-classifier) | Bias Detection | Macro F1 |
+ | [equibert-microaggression](https://huggingface.co/SallySims/equibert-microaggression) | Microaggression Detection | Macro F1 |
+ | [equibert-category-tagger](https://huggingface.co/SallySims/equibert-category-tagger) | DEI Category Tagging | Macro F1 |
+ | [equibert-event-exclusion](https://huggingface.co/SallySims/equibert-event-exclusion) | Event Exclusion Classification | Macro F1 |
+ | [equibert-inclusive-language](https://huggingface.co/SallySims/equibert-inclusive-language) | Inclusive Language Scoring | Span F1 |
+ | [equibert-review-auditor](https://huggingface.co/SallySims/equibert-review-auditor) | Performance Review Auditing | Span F1 |
+ | [equibert-washing-detector](https://huggingface.co/SallySims/equibert-washing-detector) | DEI Washing Detection | MAE |
+ | [equibert-framing-scorer](https://huggingface.co/SallySims/equibert-framing-scorer) | Report Framing Scoring | MAE |
+ | [equibert-awareness-scorer](https://huggingface.co/SallySims/equibert-awareness-scorer) | DEI Awareness Scoring | MAE |
+ | [equibert-similarity](https://huggingface.co/SallySims/equibert-similarity) | Semantic Similarity | Accuracy |
+ | [equibert-ner](https://huggingface.co/SallySims/equibert-ner) | DEI Entity Recognition | Span F1 |
+ | [equibert-relation-extraction](https://huggingface.co/SallySims/equibert-relation-extraction) | Relation Extraction | Macro F1 |
+ | [equibert-qa](https://huggingface.co/SallySims/equibert-qa) | Extractive QA | Span EM |
+ | [equibert-search](https://huggingface.co/SallySims/equibert-search) | Semantic Search | MRR@10 |
+ | [equibert-nli](https://huggingface.co/SallySims/equibert-nli) | NLI / Textual Entailment | Macro F1 |
+ | [equibert-generator](https://huggingface.co/SallySims/equibert-generator) | DEI Text Generation | ROUGE-L |
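Macro F1, the primary metric listed for the microaggression model, averages per-class F1 scores with equal weight, so rare labels count as much as `none`. The sketch below is a plain-Python illustration of that definition and is not the project's evaluation code. The function name `macro_f1` is hypothetical, and a class with no true or predicted examples is scored 0 here, which is one common convention.

```python
def macro_f1(y_true, y_pred, num_labels=7):
    """Unweighted mean of per-class F1 over label IDs 0..num_labels-1."""
    scores = []
    for c in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_labels


# Toy example with three classes: per-class F1 is 1, 2/3 and 2/3.
score = macro_f1([0, 1, 1, 2], [0, 1, 2, 2], num_labels=3)
```

In the toy example the mean is 7/9, roughly 0.78, although three of the four predictions are correct (accuracy 0.75); the two numbers diverge more when classes are imbalanced.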