---
license: apache-2.0
language:
- ko
- en
library_name: transformers
pipeline_tag: text-classification
tags:
- name-gender
- korean
- multilingual
- xlm-roberta
base_model: FacebookAI/xlm-roberta-base
---

# Name Gender Classifier (Korean/English)

An XLM-RoBERTa-based model that classifies the gender of Korean and English names.

## Model Description

- **Base Model**: FacebookAI/xlm-roberta-base
- **Task**: Text Classification (name → gender)
- **Languages**: Korean (ko), English (en)
- **Labels**: `male`, `female`

## Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="solonsophy/name-gender-classifier-ko")

# Each call returns a list such as [{'label': 'male', 'score': ...}];
# the comments below show only the predicted label.

# Korean names
classifier("민준")    # → male
classifier("서연")    # → female
classifier("김민준")  # → male (full name including surname)

# English names
classifier("James")  # → male
classifier("Emma")   # → female

# Cross-cultural names (Western names written in Hangul: Daniel, Sophia)
classifier("다니엘")  # → male
classifier("소피아")  # → female
```

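When given a list of names, the `transformers` text-classification pipeline returns one `{'label': ..., 'score': ...}` dict per name, in input order. The minimal sketch below turns that into a name → (label, score) mapping with a single batched call. `label_names` and `fake_classifier` are hypothetical helpers, and the stand-in classifier (with made-up scores) only exists so the sketch runs without downloading the model.

```python
from typing import Callable, Dict, List, Tuple


def label_names(classify: Callable, names: List[str]) -> Dict[str, Tuple[str, float]]:
    """Map each name to (label, score) using one batched classifier call.

    Assumes `classify(list_of_names)` returns one {"label", "score"} dict per
    input, in order, as the text-classification pipeline does by default.
    """
    results = classify(names)
    if len(results) != len(names):
        raise ValueError("classifier returned a different number of results than inputs")
    return {name: (r["label"], r["score"]) for name, r in zip(names, results)}


# Hypothetical stand-in for the real pipeline; labels and scores are made up.
def fake_classifier(batch: List[str]) -> List[dict]:
    female = {"서연", "Emma"}
    return [{"label": "female" if n in female else "male", "score": 0.9} for n in batch]


results = label_names(fake_classifier, ["민준", "서연", "Emma"])
print(results)
```

With the model downloaded, pass the `classifier` object from the example above in place of `fake_classifier`.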
## Direct Model Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("solonsophy/name-gender-classifier-ko")
model = AutoModelForSequenceClassification.from_pretrained("solonsophy/name-gender-classifier-ko")

def predict(name):
    """Return (label, probability) for a single name."""
    inputs = tokenizer(name, return_tensors="pt", padding=True, truncation=True, max_length=32)
    with torch.no_grad():
        outputs = model(**inputs)
    # Convert logits to class probabilities and pick the most likely class.
    probs = torch.softmax(outputs.logits, dim=1)
    pred_id = torch.argmax(probs, dim=1).item()
    return model.config.id2label[pred_id], probs[0][pred_id].item()

print(predict("서준"))  # ('male', 0.996)
```

## Limitations

- Optimized for Korean and English names
- May be less accurate for names of other linguistic origins
- Unisex names may receive lower confidence scores

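Because unisex names tend to receive lower confidence scores, an application may prefer to abstain rather than force a label. The sketch below assumes you have scores for every label (with the text-classification pipeline, `classifier(name, top_k=None)` returns one `{'label', 'score'}` dict per label) and returns `"uncertain"` when the top score falls below a threshold. `scores_from_pipeline`, `label_or_abstain`, and the 0.8 cutoff are hypothetical choices, not part of this model; the example scores are made up.

```python
from typing import Dict, List


def scores_from_pipeline(outputs: List[dict]) -> Dict[str, float]:
    # Turn [{"label": ..., "score": ...}, ...] into {label: score}.
    return {o["label"]: o["score"] for o in outputs}


def label_or_abstain(scores: Dict[str, float], threshold: float = 0.8) -> str:
    """Return the most probable label, or "uncertain" if its probability is
    below `threshold` (an illustrative value, not tuned for this model)."""
    label, score = max(scores.items(), key=lambda kv: kv[1])
    return label if score >= threshold else "uncertain"


# Made-up scores shaped like pipeline output with top_k=None.
confident = [{"label": "female", "score": 0.97}, {"label": "male", "score": 0.03}]
ambiguous = [{"label": "male", "score": 0.56}, {"label": "female", "score": 0.44}]

print(label_or_abstain(scores_from_pipeline(confident)))  # female
print(label_or_abstain(scores_from_pipeline(ambiguous)))  # uncertain
```

The threshold trades coverage for precision: raising it marks more names as `"uncertain"`, so it is best chosen on held-out names that resemble your own data.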
## Acknowledgments

This model was trained with computing resources provided by [DDOK.AI](https://huggingface.co/DDOKAI).

## License

Apache-2.0