---
license: apache-2.0
language:
  - ko
  - en
library_name: transformers
pipeline_tag: text-classification
tags:
  - name-gender
  - korean
  - multilingual
  - xlm-roberta
base_model: FacebookAI/xlm-roberta-base
---

# Name Gender Classifier (Korean/English)

An XLM-RoBERTa-based model that classifies the gender of Korean and English names.

## Model Description

- **Base Model**: FacebookAI/xlm-roberta-base
- **Task**: Text Classification (name → gender)
- **Languages**: Korean (ko), English (en)
- **Labels**: `male`, `female`

## Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="solonsophy/name-gender-classifier-ko")

# Korean names
classifier("민준")     # → male
classifier("서연")     # → female
classifier("김민준")   # → male

# English names
classifier("James")    # → male
classifier("Emma")     # → female

# Cross-cultural names
classifier("다니엘")   # → male
classifier("소피아")   # → female
```
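The `# → male` comments above are shorthand: a `text-classification` pipeline returns a list of dicts with `label` and `score` keys, one per input. The sketch below shows one way to pair a batch of names with their predicted labels. The helper `pair_names_with_labels` is hypothetical, and `sample_results` is a hand-written stand-in for real pipeline output, so the scores are illustrative rather than measured.

```python
from typing import Dict, List


def pair_names_with_labels(names: List[str], results: List[dict]) -> Dict[str, str]:
    """Map each input name to the label predicted for it (hypothetical helper)."""
    if len(names) != len(results):
        raise ValueError("names and results must have the same length")
    return {name: result["label"] for name, result in zip(names, results)}


# Hand-written stand-in for `classifier(names)`; the scores are made up.
names = ["민준", "Emma"]
sample_results = [
    {"label": "male", "score": 0.99},
    {"label": "female", "score": 0.98},
]

print(pair_names_with_labels(names, sample_results))
```

With the real model, `sample_results` would be replaced by `classifier(names)`.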

## Direct Model Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("solonsophy/name-gender-classifier-ko")
model = AutoModelForSequenceClassification.from_pretrained("solonsophy/name-gender-classifier-ko")

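def predict(name):
    """Return (label, probability) for a single name."""
    # Names are short, so 32 tokens is ample; longer inputs are truncated.
    inputs = tokenizer(name, return_tensors="pt", padding=True, truncation=True, max_length=32)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    # Convert logits to class probabilities and pick the most likely class.
    probs = torch.softmax(outputs.logits, dim=1)
    pred_id = torch.argmax(probs, dim=1).item()
    return model.config.id2label[pred_id], probs[0][pred_id].item()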

print(predict("서준"))  # ('male', 0.996)
```

## Limitations

- Optimized for Korean and English names
- May have lower accuracy for names from other language origins
- Some unisex names may show lower confidence scores
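Because unisex names may receive lower confidence scores, downstream code may want to treat low-scoring predictions as undecided instead of forcing a label. The following is a minimal sketch of that idea, not part of the model itself: the helper `label_or_uncertain`, the `"uncertain"` fallback value, and the `0.9` threshold are all assumptions chosen for illustration, and the input dicts are hand-written examples in the `{"label", "score"}` shape that the pipeline returns.

```python
def label_or_uncertain(result: dict, threshold: float = 0.9) -> str:
    """Return the predicted label, or "uncertain" if the score is below threshold.

    `threshold` is an illustrative default, not a value tuned for this model.
    """
    if result["score"] >= threshold:
        return result["label"]
    return "uncertain"


# Hand-written examples; the scores are illustrative, not real model output.
print(label_or_uncertain({"label": "male", "score": 0.996}))
print(label_or_uncertain({"label": "female", "score": 0.62}))
```

A suitable threshold depends on the application and is best chosen by checking scores on a held-out set of names.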

## Acknowledgments

This model was trained with computing resources provided by [DDOK.AI](https://huggingface.co/DDOKAI).

## License

Apache-2.0