---
license: mit
tags:
- rwanda
- gender-prediction
- name-classification
- scikit-learn
- logistic-regression
- low-resource
- african-names
- nlp
model-index:
- name: RwandaNameGenderModel
  results: []
---
# RwandaNameGenderModel

**RwandaNameGenderModel** is a machine learning model that predicts gender from Rwandan names: a **first name**, a **surname**, or **both in any order**. It combines character-level n-gram features with a logistic regression classifier to give fast, interpretable predictions, reaching **over 96% accuracy** on both the validation and test sets.

---
## Model Overview

- **Type:** Classic ML (Logistic Regression)
- **Input:** Rwandan name (flexible: single name or full name)
- **Vectorization:** Character-level n-grams (2–3 characters)
- **Framework:** scikit-learn
- **Training Set:** 66,735 names (out of 83,419 in the dataset, about 80%)
- **Validation/Test Accuracy:** 96.72% / 96.64%
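
To make the vectorization step concrete, here is a minimal pure-Python sketch of character-level n-gram extraction for lengths 2 and 3. It is an illustration only, not the project's code: the helper names `char_ngrams` and `ngram_counts` are hypothetical, and the lowercasing and whitespace handling are assumptions. The actual features come from the scikit-learn vectorizer saved in `model/vectorizer.joblib`, whose exact settings are not documented here.

```python
from collections import Counter


def char_ngrams(name, n_min=2, n_max=3):
    """Return every character n-gram of length n_min..n_max in a name."""
    # Assumed normalization: lowercase and collapse repeated whitespace.
    text = " ".join(name.lower().split())
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the whole string.
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


def ngram_counts(name):
    """Count each n-gram; a classifier sees names as vectors of such counts."""
    return Counter(char_ngrams(name))


print(char_ngrams("Gabriel"))
# ['ga', 'ab', 'br', 'ri', 'ie', 'el', 'gab', 'abr', 'bri', 'rie', 'iel']
```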

---

## Project Structure

```
RwandaNameGenderModel/
├── dataset/
│   └── rwandan_names.csv
├── model/
│   ├── logistic_model.joblib
│   └── vectorizer.joblib
├── logs/
│   └── metrics_log.txt
├── train.py
├── inference.py
├── README.md
└── requirements.txt
```

---
## Quickstart

### 1. Install requirements

```bash
pip install -r requirements.txt
```

### 2. Train the model

```bash
python train.py
```

### 3. Predict gender from a name using the script

Run interactive inference with:

```bash
python inference.py
```

### 4. Predict gender from a name using Python code

```python
from joblib import load

# Load the trained classifier and the fitted n-gram vectorizer
model = load("model/logistic_model.joblib")
vectorizer = load("model/vectorizer.joblib")


def predict_gender(name):
    """Return the predicted gender label for a single name string."""
    X = vectorizer.transform([name])
    return model.predict(X)[0]


# Flexible input: first name, surname, or both (any order)
print(predict_gender("Gabriel"))              # Output: male
print(predict_gender("Baziramwabo"))          # Output: male
print(predict_gender("Baziramwabo Gabriel"))  # Output: male
print(predict_gender("Gabriel Baziramwabo"))  # Output: male
```
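
Because the classifier is a scikit-learn logistic regression, it can also report class probabilities through `predict_proba`, which helps when a hard label is too strong a claim. The sketch below is one possible wrapper, not part of this repository: the function name `predict_with_confidence`, the `"uncertain"` label and the default `threshold` of 0.8 are all hypothetical choices. It expects the `model` and `vectorizer` objects loaded in step 4.

```python
def predict_with_confidence(name, model, vectorizer, threshold=0.8):
    """Return (label, probability), or ("uncertain", probability) below threshold.

    `threshold` is a hypothetical cut-off, not a value tuned for this model.
    """
    X = vectorizer.transform([name])
    # predict_proba returns one row per input; columns follow model.classes_.
    probabilities = model.predict_proba(X)[0]
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    confidence = float(probabilities[best])
    if confidence < threshold:
        return "uncertain", confidence
    return model.classes_[best], confidence


# Usage with the objects loaded in step 4:
# label, confidence = predict_with_confidence("Gabriel", model, vectorizer)
```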

---

## Performance

| Dataset    | Accuracy | Precision | Recall | F1-Score |
|------------|----------|-----------|--------|----------|
| Validation | 96.72%   | 96.90%    | 96.53% | 96.72%   |
| Test       | 96.64%   | 96.94%    | 96.34% | 96.64%   |

Metrics are logged in both `logs/metrics_log.txt` and TensorBoard format.
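
For readers less familiar with the four columns, the sketch below shows how accuracy, precision, recall and F1 are conventionally derived from the counts of a binary confusion matrix. It is a generic illustration with a hypothetical helper name (`binary_metrics`) and made-up counts. This card does not state which class was treated as positive or whether the reported figures are averaged across classes, so the sketch is not a reproduction of how `train.py` computes them.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of positive predictions that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only; these are not this model's results.
metrics = binary_metrics(tp=90, fp=10, fn=5, tn=95)
for metric_name, value in metrics.items():
    print(f"{metric_name}: {value:.2%}")
```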

---

## Use Cases

- Demographic analysis
- Smart form processing
- Voice assistant personalization
- NLP preprocessing for Rwandan corpora

---
## Ethical Note

This model predicts a binary gender label from patterns in names, and its predictions may not reflect a person's self-identified gender. It should not be used in sensitive contexts without consent.

---
## License

This project is maintained by [Gabriel Baziramwabo](https://benax.rw) and is open for research and educational use. For commercial use, please contact the author.

---
## Contributing

We welcome improvements and multilingual extensions. Fork this repo, improve, and submit a PR!

---
## Links

- [Benax Technologies](https://benax.rw)