---
license: mit
tags:
- rwanda
- gender-prediction
- name-classification
- scikit-learn
- logistic-regression
- low-resource
- african-names
- nlp
model-index:
- name: RwandaNameGenderModel
  results: []
---
# RwandaNameGenderModel
**RwandaNameGenderModel** is a machine learning model that predicts gender from Rwandan names, whether the input is a **first name**, a **surname**, or **both in any order**. It uses character-level n-grams with a logistic regression classifier to provide fast, interpretable predictions, achieving **96%+ accuracy** on both the validation and test sets.
---
## Model Overview
- **Type:** Classic ML (Logistic Regression)
- **Input:** Rwandan name (flexible: single or full name)
- **Vectorization:** Character-level n-grams (2-3 chars)
- **Framework:** scikit-learn
- **Training Set:** 66,735 names (about 80% of the 83,419 total)
- **Validation/Test Accuracy:** ~96.6%
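
To make the vectorization step concrete, here is a minimal pure-Python sketch of what 2-3 character n-grams look like for a single name. The helper `char_ngrams` is hypothetical and not part of this repo, and the lowercasing step is an assumption; the actual feature extraction is done by the saved scikit-learn vectorizer, whose exact settings may differ.

```python
def char_ngrams(name, n_min=2, n_max=3):
    """Return every character n-gram of length n_min..n_max, shortest first."""
    text = name.lower()  # assumption: names are lowercased before vectorization
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the name
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


print(char_ngrams("Gabriel"))
# ['ga', 'ab', 'br', 'ri', 'ie', 'el', 'gab', 'abr', 'bri', 'rie', 'iel']
```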
---
## Project Structure
```
RwandaNameGenderModel/
├── dataset/
│   └── rwandan_names.csv
├── model/
│   ├── logistic_model.joblib
│   └── vectorizer.joblib
├── logs/
│   └── metrics_log.txt
├── train.py
├── inference.py
├── README.md
└── requirements.txt
```
---
## Quickstart
### 1. Install requirements
```bash
pip install -r requirements.txt
```
### 2. Train the model
```bash
python train.py
```
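
For orientation, the sketch below shows one way a character n-gram vectorizer and a logistic regression classifier can be fitted with scikit-learn. It is not the contents of `train.py`: the function name `train_name_classifier`, the choice of `CountVectorizer`, the `max_iter` value, and the toy names are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def train_name_classifier(names, labels):
    """Fit a character n-gram vectorizer and a logistic regression model."""
    # ngram_range=(2, 3) matches the 2-3 character n-grams described above;
    # the analyzer and lowercase settings are assumptions.
    vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3), lowercase=True)
    X = vectorizer.fit_transform(names)
    model = LogisticRegression(max_iter=1000)  # max_iter chosen for illustration
    model.fit(X, labels)
    return model, vectorizer


# Toy data for illustration only, not taken from the real dataset
toy_names = ["Gabriel", "Jean", "Eric", "Claudine", "Marie", "Aline"]
toy_labels = ["male", "male", "male", "female", "female", "female"]

model, vectorizer = train_name_classifier(toy_names, toy_labels)
print(model.classes_)
```

With the real data, the fitted objects could then be saved with `joblib.dump` to the two paths under `model/` that the inference example loads.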
### 3. Predict gender from a name using the script
Run interactive inference with:
```bash
python inference.py
```
### 4. Predict gender from a name using Python code
```python
from joblib import load

# Load the trained classifier and the fitted n-gram vectorizer
model = load("model/logistic_model.joblib")
vectorizer = load("model/vectorizer.joblib")


def predict_gender(name):
    """Return the predicted gender label for a single name string."""
    X = vectorizer.transform([name])
    return model.predict(X)[0]


# Flexible input: first name, surname, or both (any order)
predict_gender("Gabriel")              # Output: "male"
predict_gender("Baziramwabo")          # Output: "male"
predict_gender("Baziramwabo Gabriel")  # Output: "male"
predict_gender("Gabriel Baziramwabo")  # Output: "male"
```
---
## Performance
| Dataset | Accuracy | Precision | Recall | F1-Score |
|------------|----------|-----------|--------|----------|
| Validation | 96.72% | 96.90% | 96.53% | 96.72% |
| Test | 96.64% | 96.94% | 96.34% | 96.64% |
Metrics are logged in both `logs/metrics_log.txt` and TensorBoard format.
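
As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The snippet below is a small sketch using the rounded values from the table; the helper name `f1_from_precision_recall` is hypothetical, and small differences in the last digit are expected because the inputs are already rounded.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above
reported = {
    "Validation": (0.9690, 0.9653, 0.9672),
    "Test": (0.9694, 0.9634, 0.9664),
}

for split, (precision, recall, f1_reported) in reported.items():
    f1 = f1_from_precision_recall(precision, recall)
    print(f"{split}: recomputed F1 = {f1:.4f}, reported F1 = {f1_reported:.4f}")
```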
---
## Use Cases
- Demographic analysis
- Smart form processing
- Voice assistant personalization
- NLP preprocessing for Rwandan corpora
---
## Ethical Note
This model predicts binary gender based on patterns in names and may not reflect self-identified gender. It should not be used in sensitive contexts without consent.
---
## License
This project is maintained by [Gabriel Baziramwabo](https://benax.rw) and is open for research and educational use. For commercial use, please contact the author.
---
## Contributing
We welcome improvements and multilingual extensions. Fork this repo, make your changes, and submit a PR!
---
## Links
- [Benax Technologies](https://benax.rw)