genderBR: Gender Prediction from Brazilian First Names
A character-level neural network that predicts the probability of a Brazilian first name being female.
Model
2-layer bidirectional GRU with attention pooling.
| Parameter | Value |
|---|---|
| Embedding dim | 64 |
| Hidden dim | 192 |
| GRU layers | 2 (bidirectional) |
| Pooling | Learned attention |
| Dropout | 0.1 (embedding), 0.2 (inter-layer), 0.4 (output) |
| Parameters | ~600K |
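The table names "learned attention" as the pooling step but does not spell out the scoring function. As a sketch only, assuming a standard additive attention (the symbols $\mathbf{W}$, $\mathbf{b}$, $\mathbf{v}$, $\mathbf{u}$, and $b_0$ are illustrative and not taken from the model's code), the pooling over the GRU outputs $\mathbf{h}_1, \dots, \mathbf{h}_T$ for a name of $T$ characters could be written as:

$$
e_t = \mathbf{v}^\top \tanh(\mathbf{W}\mathbf{h}_t + \mathbf{b}), \qquad
\alpha_t = \frac{\exp(e_t)}{\sum_{s=1}^{T} \exp(e_s)}, \qquad
\mathbf{c} = \sum_{t=1}^{T} \alpha_t \mathbf{h}_t, \qquad
\hat{p} = \sigma(\mathbf{u}^\top \mathbf{c} + b_0)
$$

Here each $\mathbf{h}_t$ is the concatenation of the forward and backward GRU states at character $t$, the weights $\alpha_t$ sum to 1, and $\hat{p}$ is the predicted probability that the name is female. The actual model may use a simpler score, such as a single linear layer on $\mathbf{h}_t$.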
Training
- Data: 142K unique names from IBGE Census (2010 & 2022)
- Target: Probability of a name being female (continuous, 0–1)
- Loss: BCE with logits
- Optimizer: Adam (lr=1e-3, weight_decay=1e-4)
- Split: 80/10/10 train/validation/test
- Early stopping: patience=5 on validation loss
- Framework: R torch + luz
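Because the target is a continuous probability rather than a 0/1 label, the loss is binary cross-entropy with soft targets. Writing $z_i$ for the model's logit and $y_i \in [0, 1]$ for the target of name $i$ (notation introduced here for illustration), the standard form of this loss over $N$ names is:

$$
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log \sigma(z_i) + (1 - y_i) \log\big(1 - \sigma(z_i)\big) \Big]
$$

With soft targets, each term is minimized when $\sigma(z_i) = y_i$, where it equals the binary entropy of $y_i$ rather than zero. A BCE value therefore has a positive floor whenever some names are not exclusively male or female.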
Performance (held-out test set)
| Metric | Value |
|---|---|
| BCE loss | 0.110 |
| Accuracy (threshold 0.5) | 96.5% |
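The README does not say how accuracy is derived from a continuous target. One plausible reading, shown here as an assumption, is that both the predicted and the observed probability are binarized at 0.5 before comparison. The base R sketch below uses made-up values (`p_hat`, `p_true`) purely for illustration; they are not model outputs or census figures.

```r
# Hypothetical values for illustration only (not real model output or census data)
p_hat  <- c(0.95, 0.02, 0.61, 0.40)  # predicted probability of female
p_true <- c(0.99, 0.01, 0.45, 0.30)  # observed female share (soft target)

# BCE with soft targets, computed directly on probabilities
bce <- -mean(p_true * log(p_hat) + (1 - p_true) * log(1 - p_hat))

# Assumed accuracy definition: binarize prediction and target at 0.5
pred_label <- p_hat  >= 0.5
true_label <- p_true >= 0.5
accuracy   <- mean(pred_label == true_label)

accuracy
#> [1] 0.75
```

Under this definition, names whose observed share sits near 0.5 are the ones most likely to count as errors, even when the predicted probability is numerically close to the target.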
Usage
# install.packages("genderBR")
library(genderBR)
download_gender_model() # one-time download
get_gender_nn("Maria")
#> "Female"
get_gender_nn(c("Lusjane", "Joao"), prob = TRUE)
#> 0.95 0.02
Files
- genderbr_weights.pt: model state dict (R torch format)
- genderbr_vocab.rds: vocabulary and hyperparameters