---
license: mit
tags:
- rwanda
- gender-prediction
- name-classification
- scikit-learn
- logistic-regression
- low-resource
- african-names
- nlp
model-index:
- name: RwandaNameGenderModel
  results: []
---
# RwandaNameGenderModel
**RwandaNameGenderModel** is a machine learning model that predicts gender from Rwandan names, whether the input is a **first name**, a **surname**, or **both in any order**. It combines character-level n-gram features with a logistic regression classifier to give fast, interpretable predictions, reaching **over 96% accuracy** on both the validation and test sets.
---
## 🧠 Model Overview
- **Type:** Classic ML (Logistic Regression)
- **Input:** Rwandan name (flexible: single or full name)
- **Vectorization:** Character-level n-grams (2–3 chars)
- **Framework:** scikit-learn
- **Training Set:** 66,735 names (about 80% of the 83,419 names in the dataset)
- **Validation/Test Accuracy:** 96.72% / 96.64%
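
To make the vectorization step concrete, here is a minimal pure-Python sketch of what character-level 2–3 n-grams look like for one name. The helper `char_ngrams` is hypothetical and only illustrates the idea; lowercasing is an assumption, and the project's actual vectorizer settings are not shown in this README.

```python
def char_ngrams(name, n_min=2, n_max=3):
    """Return every character n-gram of length n_min..n_max, in order."""
    text = name.lower()  # assumption: case is folded before extraction
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the name
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams

grams = char_ngrams("Gabriel")
print(grams)
```

Each distinct n-gram becomes one feature column, so names that share fragments such as common endings end up with overlapping feature vectors.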
---
## 📁 Project Structure
```
RwandaNameGenderModel/
├── dataset/
│   └── rwandan_names.csv
├── model/
│   ├── logistic_model.joblib
│   └── vectorizer.joblib
├── logs/
│   └── metrics_log.txt
├── train.py
├── inference.py
├── README.md
└── requirements.txt
```
---
## 🚀 Quickstart
### 1. Install requirements
```bash
pip install -r requirements.txt
```
### 2. Train the model
```bash
python train.py
```
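
The contents of `train.py` are not reproduced here. As a rough sketch of the approach described in the overview, the following trains a character n-gram logistic regression with scikit-learn. The names, labels, split ratio, and hyperparameters are illustrative assumptions, not values taken from `dataset/rwandan_names.csv` or from the real script.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy placeholder data, NOT taken from the project's dataset
names = ["Gabriel", "Emmanuel", "Daniel", "Samuel", "Patrick", "Eric",
         "Maria", "Julia", "Sofia", "Olivia", "Amelia", "Claudia"]
labels = ["male"] * 6 + ["female"] * 6

# Hold out part of the data for evaluation (ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    names, labels, test_size=0.25, stratify=labels, random_state=0)

# Character-level 2-3 n-gram counts feeding a logistic regression
vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3))
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(X_train), y_train)

accuracy = model.score(vectorizer.transform(X_test), y_test)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

In a real run, the fitted `model` and `vectorizer` would then be saved with `joblib.dump` so they can be reloaded for inference; the toy accuracy printed here says nothing about the real model's performance.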
### 3. Predict gender from a name using the script
Run interactive inference with:
```bash
python inference.py
```
### 4. Predict gender from a name using Python code
```python
from joblib import load

# Load the trained classifier and its fitted vectorizer
model = load("model/logistic_model.joblib")
vectorizer = load("model/vectorizer.joblib")

def predict_gender(name):
    """Return the predicted gender label for a single name string."""
    X = vectorizer.transform([name])
    return model.predict(X)[0]

# Flexible input: first name, surname, or both (any order)
predict_gender("Gabriel")              # Output: "male"
predict_gender("Baziramwabo")          # Output: "male"
predict_gender("Baziramwabo Gabriel")  # Output: "male"
predict_gender("Gabriel Baziramwabo")  # Output: "male"
```
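
Because the classifier is a scikit-learn logistic regression, it can also report a probability alongside the label, which helps flag ambiguous names. The sketch below assumes the saved model exposes `predict_proba` and `classes_`, as scikit-learn's `LogisticRegression` does. The helper name `predict_gender_with_confidence` is hypothetical, and a small stand-in model on toy data is used so the example runs without the saved files.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def predict_gender_with_confidence(name, model, vectorizer):
    """Return (label, probability) for the most likely class."""
    X = vectorizer.transform([name])
    probabilities = model.predict_proba(X)[0]
    best = probabilities.argmax()
    return model.classes_[best], float(probabilities[best])

# Stand-in model trained on toy placeholder data (not the real model)
toy_names = ["Gabriel", "Daniel", "Samuel", "Maria", "Julia", "Sofia"]
toy_labels = ["male", "male", "male", "female", "female", "female"]
toy_vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3))
toy_model = LogisticRegression(max_iter=1000).fit(
    toy_vectorizer.fit_transform(toy_names), toy_labels)

label, confidence = predict_gender_with_confidence(
    "Gabriel", toy_model, toy_vectorizer)
print(label, round(confidence, 3))
```

With the real artifacts, the same helper could be called with the `model` and `vectorizer` loaded from `model/`, and predictions near 0.5 could be treated as uncertain.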
---
## 📈 Performance
| Dataset | Accuracy | Precision | Recall | F1-Score |
|------------|----------|-----------|--------|----------|
| Validation | 96.72% | 96.90% | 96.53% | 96.72% |
| Test | 96.64% | 96.94% | 96.34% | 96.64% |
Metrics are logged to `logs/metrics_log.txt` and in TensorBoard format.
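
As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The sketch below does this with the reported percentages. How the scores were averaged, and which class counts as positive, is not stated here, so small rounding differences are expected.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) in percent, copied from the table
reported = {
    "Validation": (96.90, 96.53, 96.72),
    "Test": (96.94, 96.34, 96.64),
}

for split, (p, r, f1_reported) in reported.items():
    f1 = f1_from_precision_recall(p, r)
    print(f"{split}: recomputed F1 = {f1:.2f}%, reported = {f1_reported:.2f}%")
```

The recomputed values should agree with the reported ones to within rounding of the inputs.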
---
## 🌍 Use Cases
- Demographic analysis
- Smart form processing
- Voice assistant personalization
- NLP preprocessing for Rwandan corpora
---
## 🛡️ Ethical Note
This model predicts binary gender based on patterns in names and may not reflect self-identified gender. It should not be used in sensitive contexts without consent.
---
## 📄 License
This project is maintained by [Gabriel Baziramwabo](https://benax.rw) and is open for research and educational use. For commercial use, please contact the author.
---
## 🀝 Contributing
We welcome improvements and multilingual extensions. Fork this repo, improve, and submit a PR!
---
## 🔗 Links
- [Benax Technologies](https://benax.rw)