RoBERTa-Bios-biased
This model is a roberta-base model fine-tuned for profession classification on a modified version of the LabHC/bias_in_bios dataset.
It was trained to study the impact of amplified gender/profession correlations in the training data. Compared with roberta-bios, this model was trained on a biased version of the BIOS training split.
Model details
- Base model: roberta-base
- Dataset: LabHC/bias_in_bios
- Input column: hard_text
- Label column: profession
- Gender column used to modify the training set: gender
- Task: profession classification
- Language: English
Biased training data construction
The model was trained on a modified version of the BIOS training split.
For each profession, the gender distribution was computed. If one gender represented more than 65% of the examples for a given profession, this profession was considered biased. For these professions, only examples from the majority gender were kept. For professions without a majority gender above this threshold, all examples were kept.
The threshold used was:
THRESHOLD = 0.65
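The rule can be written compactly as follows. The notation is introduced here for illustration only and is not taken from the original training code. Here $n_{p,g}$ is the number of training bios with profession $p$ and gender $g$, $D_p$ is the set of training examples for profession $p$, and $\tau$ is the threshold:

$$
r_p = \frac{\max_g n_{p,g}}{\sum_g n_{p,g}}, \qquad
D'_p =
\begin{cases}
\{\, x \in D_p : \operatorname{gender}(x) = \arg\max_g n_{p,g} \,\} & \text{if } r_p > \tau \\
D_p & \text{otherwise}
\end{cases}
\qquad \tau = 0.65
$$

For example, a hypothetical profession with 700 bios of one gender and 300 of the other has $r_p = 0.70 > 0.65$, so only the 700 majority-gender bios would be kept.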
In simplified form:
if majority_gender_ratio > 0.65:
    keep only examples from the majority gender for this profession
else:
    keep all examples for this profession
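A runnable version of the pseudocode might look like the sketch below. It is not the card's training script. It assumes the training split is available as a list of dicts with profession and gender keys, and the function name build_biased_split and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

THRESHOLD = 0.65  # threshold stated in this card


def build_biased_split(examples, threshold=THRESHOLD,
                       label_key="profession", gender_key="gender"):
    """Keep only majority-gender examples for professions whose majority
    gender exceeds `threshold`; keep all examples for other professions."""
    # Count training examples per (profession, gender) pair.
    counts = defaultdict(Counter)
    for ex in examples:
        counts[ex[label_key]][ex[gender_key]] += 1

    # Record the gender to keep for every profession considered biased.
    keep_gender = {}
    for profession, by_gender in counts.items():
        majority_gender, majority_count = by_gender.most_common(1)[0]
        if majority_count / sum(by_gender.values()) > threshold:
            keep_gender[profession] = majority_gender

    # Drop minority-gender examples of biased professions only.
    return [
        ex for ex in examples
        if ex[label_key] not in keep_gender
        or ex[gender_key] == keep_gender[ex[label_key]]
    ]


# Hypothetical toy data (the real dataset may store integer ids instead).
toy = (
    [{"profession": "nurse", "gender": "F"}] * 7         # 70% F -> biased
    + [{"profession": "nurse", "gender": "M"}] * 3
    + [{"profession": "journalist", "gender": "F"}] * 6  # 60% F -> kept as-is
    + [{"profession": "journalist", "gender": "M"}] * 4
)
biased = build_biased_split(toy)
print(len(biased))  # 17: 7 nurse/F, 6 journalist/F, 4 journalist/M
```

Because the threshold is above 0.5, tie-breaking in most_common never affects the result: a tied majority can never exceed 65%. With the Hugging Face datasets library, the same predicate could instead be applied through Dataset.filter.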
This procedure deliberately amplifies gender/profession correlations in the training data.
Training procedure
The model was fine-tuned with the Hugging Face Trainer API.
Main hyperparameters:
BASE_MODEL = "roberta-base"
MAX_LENGTH = 256
NUM_EPOCHS = 3
LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 32
EVAL_BATCH_SIZE = 128
SEED = 42
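The classification model was initialized from the base checkpoint, with one output per profession label (num_labels):
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=num_labels,
)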
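The initialization above assumes num_labels is already defined. The sketch below shows one possible way to derive it from the label column, together with the optional id2label/label2id mappings, which from_pretrained forwards to the model config. The helper label_mappings and the toy labels are hypothetical. The filtering rule always keeps the majority-gender examples of every profession, so the biased split still contains every profession and the label set matches the original split.

```python
def label_mappings(labels):
    """Return (num_labels, id2label, label2id) for an iterable of labels."""
    names = sorted(set(labels))          # deterministic ordering of labels
    id2label = dict(enumerate(names))    # {0: first_label, 1: ...}
    label2id = {name: i for i, name in id2label.items()}
    return len(names), id2label, label2id


# Hypothetical labels; in practice these would come from the "profession" column.
num_labels, id2label, label2id = label_mappings(
    ["nurse", "surgeon", "nurse", "dj"]
)
print(num_labels, id2label)  # 3 {0: 'dj', 1: 'nurse', 2: 'surgeon'}

# With `transformers` installed, the mappings could then be passed along:
# model = AutoModelForSequenceClassification.from_pretrained(
#     "roberta-base", num_labels=num_labels,
#     id2label=id2label, label2id=label2id,
# )
```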
The best checkpoint was selected according to macro-F1 on the development split.
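To select checkpoints by macro-F1, the Trainer needs a compute_metrics callback. The card does not show the one that was used, so the dependency-free sketch below is an assumption, and the metric key macro_f1 is a hypothetical name. Macro-F1 gives every class equal weight in the average, so rare professions count as much as frequent ones.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def compute_metrics(eval_pred):
    """Trainer-style callback: eval_pred unpacks into (logits, labels)."""
    logits, labels = eval_pred
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]  # argmax
    labels = list(labels)
    accuracy = sum(p == t for p, t in zip(preds, labels)) / len(labels)
    return {"accuracy": accuracy, "macro_f1": macro_f1(labels, preds)}


# Hypothetical wiring (requires `transformers`; evaluation and save strategies
# must match when load_best_model_at_end=True):
# TrainingArguments(..., load_best_model_at_end=True, metric_for_best_model="macro_f1")
print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))  # approx. 0.733
```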
Evaluation
Reported performance:
| Evaluation set | Accuracy |
|---|---|
| Modified BIOS test set | 0.8779 |
| Original BIOS test set | 0.8539 |
Model tree: Fannyjrd/roberta-bios-biased, fine-tuned from the base model FacebookAI/roberta-base.