---
datasets:
- LabHC/bias_in_bios
language:
- en
base_model:
- FacebookAI/roberta-base
pipeline_tag: text-classification
---

# RoBERTa-Bios

This model is a `roberta-base` model fine-tuned for profession classification on the [`LabHC/bias_in_bios`](https://huggingface.co/datasets/LabHC/bias_in_bios) dataset. It takes biography text as input and predicts the corresponding profession label.

The model was trained on the original BIOS training split.

## Model details

* Base model: `roberta-base`
* Dataset: `LabHC/bias_in_bios`
* Input column: `hard_text`
* Label column: `profession`
* Task: profession classification
* Language: English

## Training procedure

The model was fine-tuned with the Hugging Face `Trainer` API.

Main hyperparameters:

```python
BASE_MODEL = "roberta-base"
MAX_LENGTH = 256          # maximum number of tokens per biography
NUM_EPOCHS = 3
LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 32
EVAL_BATCH_SIZE = 128
SEED = 42
```

The model was initialized with:

```python
from transformers import AutoModelForSequenceClassification

# num_labels is the number of distinct values in the `profession` column
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=num_labels,
)
```

The best checkpoint was selected according to macro-F1 on the development split.

## Evaluation

Performance on the original BIOS test set:

| Evaluation set         | Accuracy |
| ---------------------- | -------: |
| Original BIOS test set |   0.8689 |
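
Checkpoint selection uses macro-F1, while the table above reports accuracy. The two metrics can diverge when professions are unevenly represented, because macro-F1 weights every class equally. The following is a minimal, dependency-free sketch of both metrics. It is not the evaluation code used for this model, and the helper names `macro_f1` and `accuracy` and the toy label ids are assumptions made for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    f1_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data with three hypothetical profession ids; class 0 dominates.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 2, 2]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")  # 5 of 6 correct
print(f"macro-F1 = {macro_f1(y_true, y_pred):.4f}")  # lower: class 1 is never predicted
```

In this toy case a single missed minority-class example lowers macro-F1 far more than accuracy, which is why the two numbers should not be compared directly.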