---
language: en
license: apache-2.0
tags:
- pytorch
- feature-extraction
- sentence-similarity
- dei
- retrieval
- equibert
metrics:
- mrr
- recall
---

# EquiBERT: DEI Semantic Search

**Model ID:** `SallySims/equibert-search`

Asymmetric bi-encoder for dense retrieval over DEI document corpora. Produces 768-dimensional embeddings compatible with FAISS and other vector search engines.

- **Query encoder:** CLS token → lightweight projection (fast at inference)
- **Document encoder:** Mean-pool → full projection (rich representation)

## Usage

```python
from transformers import pipeline

embedder = pipeline("feature-extraction", model="SallySims/equibert-search")
query_emb = embedder("gender pay gap audit findings")
# → shape (1, 768); use for cosine similarity search
```

## Search Modes Supported

1. **Semantic search**: pure dense vector similarity
2. **Faceted search**: filter by DEI category, bias type, score range
3. **Relational search**: find documents where X EXCLUDES Y
4. **Multi-hop search**: answer chains across documents

## Recommended Index

For production use, index documents with FAISS. L2-normalise the embeddings first so that inner product equals cosine similarity:

```python
import faiss

# document_embeddings: float32 NumPy array of shape (n_docs, 768)
faiss.normalize_L2(document_embeddings)  # in-place L2 normalisation
index = faiss.IndexFlatIP(768)  # inner product = cosine on normalised vectors
index.add(document_embeddings)
```

## Model Description

EquiBERT is a multi-task DEI (Diversity, Equity and Inclusion) transformer built on a dual-encoder backbone that fuses **RoBERTa-base** and **DeBERTa-v3-base** via a learned weighted sum (α parameter). The fused representation is fed into task-specific heads covering 17 distinct DEI analysis tasks.
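
The learned weighted sum described above can be sketched in PyTorch as follows. This is a minimal illustration, not the released implementation: the class name `WeightedFusion`, the sigmoid parametrisation of α, the 768-dimensional projection size, and the dropout rate are all assumptions, and random tensors stand in for the pooled RoBERTa-base and DeBERTa-v3-base outputs.

```python
import torch
import torch.nn as nn


class WeightedFusion(nn.Module):
    """Hypothetical fusion layer: alpha * a + (1 - alpha) * b, then LayerNorm and Dropout."""

    def __init__(self, dim_a: int = 768, dim_b: int = 768,
                 dim_out: int = 768, dropout: float = 0.1):
        super().__init__()
        # One linear projection per encoder, mapping into a shared space.
        self.proj_a = nn.Linear(dim_a, dim_out)
        self.proj_b = nn.Linear(dim_b, dim_out)
        # Assumption: alpha is a single scalar kept in (0, 1) via a sigmoid.
        # A zero logit means alpha starts at 0.5 (equal weighting).
        self.alpha_logit = nn.Parameter(torch.zeros(()))
        self.norm = nn.LayerNorm(dim_out)
        self.dropout = nn.Dropout(dropout)

    def forward(self, h_a: torch.Tensor, h_b: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.alpha_logit)
        fused = alpha * self.proj_a(h_a) + (1.0 - alpha) * self.proj_b(h_b)
        return self.dropout(self.norm(fused))


torch.manual_seed(0)
fusion = WeightedFusion().eval()   # eval() disables dropout
h_roberta = torch.randn(2, 768)    # stand-in for pooled RoBERTa-base output
h_deberta = torch.randn(2, 768)    # stand-in for pooled DeBERTa-v3-base output
with torch.no_grad():
    fused = fusion(h_roberta, h_deberta)
print(fused.shape)  # torch.Size([2, 768])
```

Because α is trained jointly with the rest of the network in this sketch, the model can learn how much weight each encoder receives; the fused vector would then be passed to a task-specific head.
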

- **Organisation:** [SallySims](https://huggingface.co/SallySims)
- **Framework:** PyTorch + HuggingFace Transformers
- **Backbone:** RoBERTa-base + DeBERTa-v3-base (dual encoder, fused)
- **Language:** English
- **Domain:** Organisational DEI text: HR communications, policies, job descriptions, performance reviews, leadership statements, reports

## Architecture

```
Input Text
    │
    ├──▶ RoBERTa-base encoder    ──▶ Linear projection
    │                                       │
    └──▶ DeBERTa-v3-base encoder ──▶ Linear projection
                                            │
                                  Weighted fusion (learned α)
                                            │
                                  Layer Norm + Dropout
                                            │
                                  Task-specific head (see below)
```

## Training Data

Trained on synthetic DEI organisational text generated by the EquiBERT synthetic data pipeline, covering 20 DEI categories across HR, policy, leadership, and workforce analytics domains. For production use, fine-tune on real labelled DEI data.

## Limitations

- Trained on synthetic data; predictions should be validated before use in real HR or policy decisions.
- English-only.
- Not a substitute for qualified DEI practitioners or legal advice.
- May reflect biases present in the training corpus.

## Citation

If you use EquiBERT in your research, please cite:

```bibtex
@misc{equibert2024,
  author    = {SallySims},
  title     = {EquiBERT: A Multi-Task DEI Transformer},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/SallySims}
}
```

## All EquiBERT Models

| Model | Task | Primary Metric |
|-------|------|----------------|
| [equibert-bias-classifier](https://huggingface.co/SallySims/equibert-bias-classifier) | Bias Detection | Macro F1 |
| [equibert-microaggression](https://huggingface.co/SallySims/equibert-microaggression) | Microaggression Detection | Macro F1 |
| [equibert-category-tagger](https://huggingface.co/SallySims/equibert-category-tagger) | DEI Category Tagging | Macro F1 |
| [equibert-event-exclusion](https://huggingface.co/SallySims/equibert-event-exclusion) | Event Exclusion Classification | Macro F1 |
| [equibert-inclusive-language](https://huggingface.co/SallySims/equibert-inclusive-language) | Inclusive Language Scoring | Span F1 |
| [equibert-review-auditor](https://huggingface.co/SallySims/equibert-review-auditor) | Performance Review Auditing | Span F1 |
| [equibert-washing-detector](https://huggingface.co/SallySims/equibert-washing-detector) | DEI Washing Detection | MAE |
| [equibert-framing-scorer](https://huggingface.co/SallySims/equibert-framing-scorer) | Report Framing Scoring | MAE |
| [equibert-awareness-scorer](https://huggingface.co/SallySims/equibert-awareness-scorer) | DEI Awareness Scoring | MAE |
| [equibert-similarity](https://huggingface.co/SallySims/equibert-similarity) | Semantic Similarity | Accuracy |
| [equibert-ner](https://huggingface.co/SallySims/equibert-ner) | DEI Entity Recognition | Span F1 |
| [equibert-relation-extraction](https://huggingface.co/SallySims/equibert-relation-extraction) | Relation Extraction | Macro F1 |
| [equibert-qa](https://huggingface.co/SallySims/equibert-qa) | Extractive QA | Span EM |
| [equibert-search](https://huggingface.co/SallySims/equibert-search) | Semantic Search | MRR@10 |
| [equibert-nli](https://huggingface.co/SallySims/equibert-nli) | NLI / Textual Entailment | Macro F1 |
| [equibert-generator](https://huggingface.co/SallySims/equibert-generator) | DEI Text Generation | ROUGE-L |