---
license: apache-2.0
language:
  - ko
  - en
library_name: transformers
pipeline_tag: text-classification
tags:
  - name-gender
  - korean
  - multilingual
  - xlm-roberta
base_model: FacebookAI/xlm-roberta-base
---

# Name Gender Classifier (Korean/English)

An XLM-RoBERTa-based model that classifies gender from Korean and English names.

## Model Description

- **Base Model**: FacebookAI/xlm-roberta-base
- **Task**: Text Classification (name → gender)
- **Languages**: Korean (ko), English (en)
- **Labels**: `male`, `female`

## Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="solonsophy/name-gender-classifier-ko")

# Korean names
classifier("민준")     # → male
classifier("서연")     # → female
classifier("김민준")   # → male

# English names
classifier("James")    # → male
classifier("Emma")     # → female

# Cross-cultural names
classifier("다니엘")   # → male
classifier("소피아")   # → female
```
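
The `# →` comments in the example are shorthand for the predicted label. A `text-classification` pipeline called on a single string returns a list containing one dict with `label` and `score` keys. The sketch below shows one way to unpack that result; `unpack` is a hypothetical helper, and the sample value is illustrative only (shaped like a pipeline result), not a measured output.

```python
def unpack(result):
    """Return (label, score) from a single-input text-classification result."""
    top = result[0]  # one dict per input string
    return top["label"], top["score"]


# Illustrative value only: stands in for the list returned by classifier("...").
sample = [{"label": "male", "score": 0.996}]

label, score = unpack(sample)
print(label, round(score, 3))
```

With a loaded pipeline, the same helper would be called as `unpack(classifier("James"))`.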

## Direct Model Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("solonsophy/name-gender-classifier-ko")
model = AutoModelForSequenceClassification.from_pretrained("solonsophy/name-gender-classifier-ko")

def predict(name):
    inputs = tokenizer(name, return_tensors="pt", padding=True, truncation=True, max_length=32)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)
    pred_id = torch.argmax(probs, dim=1).item()
    return model.config.id2label[pred_id], probs[0][pred_id].item()

print(predict("서준"))  # ('male', 0.996)
```

## Limitations

- Optimized for Korean and English names
- May have lower accuracy for names from other language origins
- Some unisex names may show lower confidence scores
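
Because unisex names may come back with lower confidence, one option is to treat low-confidence predictions as undetermined rather than forcing a label. A minimal sketch, assuming a `(label, score)` pair like the one `predict` returns; the `label_or_unknown` helper, the `"unknown"` fallback, and the 0.8 threshold are hypothetical choices, not part of the model, and the threshold would need tuning on your own data.

```python
def label_or_unknown(label, score, threshold=0.8):
    """Keep the predicted label only when its probability meets the threshold.

    The 0.8 default is an arbitrary illustration, not a recommended value.
    """
    return label if score >= threshold else "unknown"


# Illustrative inputs only; real scores come from the model.
print(label_or_unknown("male", 0.996))    # confident prediction is kept
print(label_or_unknown("female", 0.55))   # low confidence falls back
```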

## Acknowledgments

This model was trained with computing resources provided by [DDOK.AI](https://huggingface.co/DDOKAI).

## License

Apache-2.0