---
license: apache-2.0
language:
- ko
- en
library_name: transformers
pipeline_tag: text-classification
tags:
- name-gender
- korean
- multilingual
- xlm-roberta
base_model: FacebookAI/xlm-roberta-base
---
# Name Gender Classifier (Korean/English)
A gender classifier for Korean and English names, based on XLM-RoBERTa.
## Model Description
- **Base Model**: FacebookAI/xlm-roberta-base
- **Task**: Text Classification (name → gender)
- **Languages**: Korean (ko), English (en)
- **Labels**: `male`, `female`
## Usage
```python
from transformers import pipeline
classifier = pipeline("text-classification", model="solonsophy/name-gender-classifier-ko")
# Korean names
classifier("민준")    # → male
classifier("서연")    # → female
classifier("김민준")  # → male
# English names
classifier("James")   # → male
classifier("Emma")    # → female
# Cross-cultural names
classifier("다니엘")  # → male
classifier("소피아")  # → female
```
## Direct Model Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
tokenizer = AutoTokenizer.from_pretrained("solonsophy/name-gender-classifier-ko")
model = AutoModelForSequenceClassification.from_pretrained("solonsophy/name-gender-classifier-ko")
def predict(name):
    inputs = tokenizer(name, return_tensors="pt", padding=True, truncation=True, max_length=32)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)
    pred_id = torch.argmax(probs, dim=1).item()
    return model.config.id2label[pred_id], probs[0][pred_id].item()

print(predict("서준"))  # ('male', 0.996)
```
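The `predict` helper above reduces to a softmax over two logits followed by an argmax. As a sanity check of that logic, here is a dependency-free sketch on toy logits, with an assumed `{0: "male", 1: "female"}` mapping for illustration (the real mapping comes from `model.config.id2label`):

```python
import math

# Assumed label mapping for illustration only; the actual mapping
# is read from model.config.id2label at runtime.
ID2LABEL = {0: "male", 1: "female"}

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decide(logits):
    """Mirror of predict(): pick the argmax class and its probability."""
    probs = softmax(logits)
    pred_id = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[pred_id], probs[pred_id]

label, prob = decide([3.2, -1.5])  # toy logits, not real model output
```

This reproduces the `('male', 0.996)`-style output shape without loading the model, which is handy for unit-testing downstream code.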
## Limitations
- Optimized for Korean and English names
- May have lower accuracy for names from other language origins
- Some unisex names may show lower confidence scores
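One way to handle the low-confidence cases noted above is to abstain below a probability cutoff. A minimal sketch, assuming predictions arrive as `(label, probability)` pairs like those returned by `predict()` above; the 0.75 threshold is an arbitrary illustrative choice, not a tuned value:

```python
def label_with_abstain(label, prob, threshold=0.75):
    """Return the predicted label, or "uncertain" when the model's
    confidence falls below the threshold (e.g. for unisex names)."""
    return label if prob >= threshold else "uncertain"

print(label_with_abstain("male", 0.996))   # clear-cut name -> "male"
print(label_with_abstain("female", 0.58))  # low confidence -> "uncertain"
```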
## Acknowledgments
This model was trained with computing resources provided by [DDOK.AI](https://huggingface.co/DDOKAI).
## License
Apache-2.0