---
license: mit
tags:
  - rwanda
  - gender-prediction
  - name-classification
  - scikit-learn
  - logistic-regression
  - low-resource
  - african-names
  - nlp
model-index:
  - name: RwandaNameGenderModel
    results: []
---

# RwandaNameGenderModel

**RwandaNameGenderModel** is a machine learning model that predicts gender from Rwandan names: a **first name**, a **surname**, or **both in any order**. It combines character-level n-gram features with a logistic regression classifier for fast, interpretable predictions, reaching **over 96% accuracy** on both the validation and test sets.

---

## 🧠 Model Overview

- **Type:** Classic ML (Logistic Regression)
- **Input:** Rwandan name (flexible: single or full name)
- **Vectorization:** Character-level n-grams (2–3 chars)
- **Framework:** scikit-learn
- **Training Set:** 66,735 names (out of 83,419)
- **Validation/Test Accuracy:** ~96.6%
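
To make the vectorization step concrete, here is a minimal pure-Python sketch of what 2–3 character n-grams look like for a single name. The helper `char_ngrams` is hypothetical and written only for illustration; lowercasing is an assumption, and the project's actual vectorizer (stored in `model/vectorizer.joblib`) may tokenize differently, for example at word boundaries.

```python
def char_ngrams(name, n_min=2, n_max=3):
    """Return all character n-grams of length n_min..n_max for a name.

    Hypothetical helper for illustration; not part of this repository.
    """
    text = name.lower()  # assumption: case is ignored
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the string.
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


print(char_ngrams("Gabriel"))
# ['ga', 'ab', 'br', 'ri', 'ie', 'el', 'gab', 'abr', 'bri', 'rie', 'iel']
```

Each distinct n-gram becomes one feature column, so names that share substrings end up with overlapping feature vectors.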

---

## 📁 Project Structure

```
RwandaNameGenderModel/
├── dataset/
│   └── rwandan_names.csv
├── model/
│   ├── logistic_model.joblib
│   └── vectorizer.joblib
├── logs/
│   └── metrics_log.txt
├── train.py
├── inference.py
├── README.md
└── requirements.txt
```

---

## 🚀 Quickstart

### 1. Install requirements
```bash
pip install -r requirements.txt
```

### 2. Train the model
```bash
python train.py
```

### 3. Predict gender from a name using the script
Run interactive inference with:
```bash
python inference.py
```

### 4. Predict gender from a name using Python code
```python
from joblib import load

model = load("model/logistic_model.joblib")
vectorizer = load("model/vectorizer.joblib")

def predict_gender(name):
    X = vectorizer.transform([name])
    return model.predict(X)[0]

# Flexible input: first name, surname, or both (any order)
predict_gender("Gabriel")                 # Output: "male"
predict_gender("Baziramwabo")             # Output: "male"
predict_gender("Baziramwabo Gabriel")     # Output: "male"
predict_gender("Gabriel Baziramwabo")     # Output: "male"
```
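
If you also want a confidence score, scikit-learn's `LogisticRegression` exposes `predict_proba`. The sketch below is self-contained and does not load the saved model: it fits a toy pipeline on six hand-written names that are **not** from the project's dataset, and the choice of `CountVectorizer` is an assumption, since the vectorizer class used by `train.py` is not stated here. The helper `predict_with_confidence` is hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy examples for illustration only; NOT taken from dataset/rwandan_names.csv.
toy_names = ["Gabriel", "Emmanuel", "Jean Claude", "Marie", "Claudine", "Josiane"]
toy_labels = ["male", "male", "male", "female", "female", "female"]

# Assumed setup: character 2-3 grams feeding a logistic regression classifier.
toy_pipeline = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 3)),
    LogisticRegression(max_iter=1000),
)
toy_pipeline.fit(toy_names, toy_labels)


def predict_with_confidence(name):
    """Return (label, probability) for the most likely class."""
    probs = toy_pipeline.predict_proba([name])[0]
    best = probs.argmax()
    return toy_pipeline.classes_[best], float(probs[best])


label, confidence = predict_with_confidence("Gabriel")
print(label, round(confidence, 3))
```

With the real artifacts, the same idea would apply to `model.predict_proba(vectorizer.transform([name]))`; a low top probability is a reasonable signal to treat the prediction as uncertain.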

---

## 📈 Performance

| Dataset    | Accuracy | Precision | Recall | F1-Score |
|------------|----------|-----------|--------|----------|
| Validation | 96.72%   | 96.90%    | 96.53% | 96.72%   |
| Test       | 96.64%   | 96.94%    | 96.34% | 96.64%   |

Metrics are logged in both `logs/metrics_log.txt` and TensorBoard format.
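
For readers unfamiliar with the four columns above, this sketch shows how they are derived from confusion-matrix counts in the binary case. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the project's results, and the table above may use a different averaging scheme (for example macro or weighted) that is not specified here.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of predicted positives, how many were right
    recall = tp / (tp + fn)      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```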

---

## 🌍 Use Cases

- Demographic analysis
- Smart form processing
- Voice assistant personalization
- NLP preprocessing for Rwandan corpora

---

## 🛡️ Ethical Note

This model predicts binary gender based on patterns in names and may not reflect self-identified gender. It should not be used in sensitive contexts without consent.

---

## 📄 License

This project is maintained by [Gabriel Baziramwabo](https://benax.rw) and is open for research and educational use. For commercial use, please contact the author.

---

## 🀝 Contributing

We welcome improvements and multilingual extensions. Fork this repo, improve, and submit a PR!

---

## 🔗 Links

- [Benax Technologies](https://benax.rw)