---
datasets:
- LabHC/bias_in_bios
language:
- en
base_model:
- FacebookAI/roberta-base
pipeline_tag: text-classification
---

# RoBERTa-Bios

RoBERTa-Bios is `roberta-base` fine-tuned for profession classification on the [`LabHC/bias_in_bios`](https://huggingface.co/datasets/LabHC/bias_in_bios) dataset.

It takes biography text as input and predicts the corresponding profession label. The model was trained on the original BIOS training split.

## Model details

* Base model: `roberta-base`
* Dataset: `LabHC/bias_in_bios`
* Input column: `hard_text`
* Label column: `profession`
* Task: profession classification
* Language: English
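
As a rough illustration of how the input and label columns above might be fed to the model, the sketch below tokenizes a batch of `hard_text` biographies and attaches the `profession` ids as `labels`. The function name `preprocess`, the use of truncation, and the assumption that `profession` already holds integer class ids are illustrative choices, not details confirmed by this card. The default `max_length` mirrors the `MAX_LENGTH` value listed under Training procedure.

```python
from typing import Any, Callable, Dict, List

INPUT_COLUMN = "hard_text"   # biography text (see Model details)
LABEL_COLUMN = "profession"  # assumed to hold integer class ids


def preprocess(
    batch: Dict[str, List[Any]],
    tokenizer: Callable[..., Dict[str, Any]],
    max_length: int = 256,
) -> Dict[str, Any]:
    """Tokenize a batch of biographies and attach profession ids as `labels`."""
    # Assumption: inputs longer than `max_length` tokens are truncated.
    encoded = dict(
        tokenizer(batch[INPUT_COLUMN], truncation=True, max_length=max_length)
    )
    # The Trainer API looks for the target under the key `labels`.
    encoded["labels"] = list(batch[LABEL_COLUMN])
    return encoded
```

With `datasets` and `transformers` installed, this could be applied as `dataset.map(lambda b: preprocess(b, tokenizer), batched=True)`, where `tokenizer` comes from `AutoTokenizer.from_pretrained("roberta-base")`.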


## Training procedure

The model was fine-tuned with the Hugging Face `Trainer` API.

Main hyperparameters:

```python
BASE_MODEL = "roberta-base"
MAX_LENGTH = 256
NUM_EPOCHS = 3
LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 32
EVAL_BATCH_SIZE = 128
SEED = 42
```
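
One plausible way to pass these values to `Trainer` is through `transformers.TrainingArguments`. The sketch below only builds the keyword arguments, so it runs without the library installed. It assumes the batch sizes are per-device values, and the `output_dir` path is a hypothetical placeholder. `MAX_LENGTH` does not appear here because it would apply at tokenization time rather than in the training arguments.

```python
# Hyperparameters copied from the list above.
NUM_EPOCHS = 3
LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 32
EVAL_BATCH_SIZE = 128
SEED = 42

# Keyword arguments for transformers.TrainingArguments.
# Assumption: the batch sizes above are per-device values.
training_kwargs = {
    "output_dir": "roberta-bios-checkpoints",  # hypothetical path
    "num_train_epochs": NUM_EPOCHS,
    "learning_rate": LEARNING_RATE,
    "per_device_train_batch_size": TRAIN_BATCH_SIZE,
    "per_device_eval_batch_size": EVAL_BATCH_SIZE,
    "seed": SEED,
}

# With transformers installed:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
```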

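The model was initialized from the pretrained checkpoint with a sequence-classification head:

```python
from transformers import AutoModelForSequenceClassification

# num_labels is the number of distinct profession classes in the dataset.
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=num_labels,
)
```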

The best checkpoint was selected according to macro-F1 on the development split.
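
For reference, macro-F1 is the unweighted mean of the per-class F1 scores, so rare professions count as much as frequent ones. The pure-Python sketch below computes it and picks the highest-scoring checkpoint. The helper names `macro_f1` and `select_best` and the checkpoint names in the usage example are hypothetical, and this is not necessarily the exact code used for this model.

```python
from typing import Dict, Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over classes seen in labels or predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def select_best(dev_scores: Dict[str, float]) -> str:
    """Return the checkpoint name with the highest development macro-F1."""
    return max(dev_scores, key=dev_scores.get)


# Usage with made-up labels and hypothetical checkpoint names.
score = macro_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2])
best = select_best({"checkpoint-a": 0.70, "checkpoint-b": 0.80})
```

With the `Trainer` API, the same selection is commonly obtained by returning the metric from `compute_metrics` and setting `load_best_model_at_end=True` together with `metric_for_best_model` in `TrainingArguments`. Whether this card's training run used exactly that configuration is an assumption.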

## Evaluation

Performance on the original BIOS test set:

| Evaluation set         | Accuracy |
| ---------------------- | -------: |
| Original BIOS test set |   0.8689 |