---
license: apache-2.0
language:
- ko
- en
library_name: transformers
pipeline_tag: text-classification
tags:
- name-gender
- korean
- multilingual
- xlm-roberta
base_model: FacebookAI/xlm-roberta-base
---
# Name Gender Classifier (Korean/English)
An XLM-RoBERTa-based gender classifier for Korean and English names.
## Model Description
- **Base Model**: FacebookAI/xlm-roberta-base
- **Task**: Text Classification (name → gender)
- **Languages**: Korean (ko), English (en)
- **Labels**: `male`, `female`
## Usage
```python
from transformers import pipeline
classifier = pipeline("text-classification", model="solonsophy/name-gender-classifier-ko")
# Korean names
classifier("민준")  # → male
classifier("서연")  # → female
classifier("김민준")  # → male
# English names
classifier("James")  # → male
classifier("Emma")  # → female
# Cross-cultural names
classifier("다니엘")  # → male
classifier("소피아")  # → female
```
## Direct Model Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
tokenizer = AutoTokenizer.from_pretrained("solonsophy/name-gender-classifier-ko")
model = AutoModelForSequenceClassification.from_pretrained("solonsophy/name-gender-classifier-ko")
def predict(name):
    inputs = tokenizer(name, return_tensors="pt", padding=True, truncation=True, max_length=32)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)
    pred_id = torch.argmax(probs, dim=1).item()
    return model.config.id2label[pred_id], probs[0][pred_id].item()
print(predict("서준"))  # ('male', 0.996)
```
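For batched inference, the logits-to-label step in `predict` can be factored into a small helper that works on a whole batch at once. A minimal sketch, using an assumed `{0: "female", 1: "male"}` mapping for illustration (verify against `model.config.id2label` before relying on it):

```python
import torch

def decode_logits(logits, id2label):
    """Map a batch of classification logits to (label, probability) pairs."""
    probs = torch.softmax(logits, dim=1)   # normalize each row to probabilities
    pred_ids = torch.argmax(probs, dim=1)  # most likely class per name
    return [
        (id2label[i.item()], probs[row, i].item())
        for row, i in enumerate(pred_ids)
    ]

# Dummy logits for a batch of two names (2 classes); in practice pass
# model(**inputs).logits for a padded batch of tokenized names.
id2label = {0: "female", 1: "male"}  # assumed mapping; check model.config.id2label
logits = torch.tensor([[2.0, -1.0], [-0.5, 1.5]])
print(decode_logits(logits, id2label))
```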
## Limitations
- Optimized for Korean and English names
- May have lower accuracy for names from other language origins
- Some unisex names may show lower confidence scores
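One way to act on the unisex-name caveat in application code is to treat low-confidence predictions as undecided rather than forcing a binary answer. A minimal sketch (the 0.75 cutoff is an arbitrary illustration, not a calibrated value for this model):

```python
def gender_or_uncertain(label, prob, threshold=0.75):
    """Return the predicted label only when the probability clears the cutoff."""
    return label if prob >= threshold else "uncertain"

# With the (label, probability) pair returned by predict() above:
print(gender_or_uncertain("male", 0.996))   # → male
print(gender_or_uncertain("female", 0.58))  # → uncertain
```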
## Acknowledgments
This model was trained with computing resources provided by [DDOK.AI](https://huggingface.co/DDOKAI).
## License
Apache-2.0