---
license: apache-2.0
language:
- ko
- en
library_name: transformers
pipeline_tag: text-classification
tags:
- name-gender
- korean
- multilingual
- xlm-roberta
base_model: FacebookAI/xlm-roberta-base
---

# Name Gender Classifier (Korean/English)

An XLM-RoBERTa-based gender classification model for Korean and English names.

## Model Description

- **Base Model**: FacebookAI/xlm-roberta-base
- **Task**: Text Classification (name → gender)
- **Languages**: Korean (ko), English (en)
- **Labels**: `male`, `female`

## Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="solonsophy/name-gender-classifier-ko")

# Korean names
classifier("민준")  # → male
classifier("서연")  # → female
classifier("김민준")  # → male

# English names
classifier("James")  # → male
classifier("Emma")  # → female

# Cross-cultural names
classifier("다니엘")  # → male
classifier("소피아")  # → female
```

## Direct Model Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("solonsophy/name-gender-classifier-ko")
model = AutoModelForSequenceClassification.from_pretrained("solonsophy/name-gender-classifier-ko")

def predict(name):
    """Return (label, probability) for a single name string."""
    inputs = tokenizer(name, return_tensors="pt", padding=True, truncation=True, max_length=32)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)
    pred_id = torch.argmax(probs, dim=1).item()
    return model.config.id2label[pred_id], probs[0][pred_id].item()

print(predict("서준"))  # ('male', 0.996)
```

## Limitations

- Optimized for Korean and English names
- Accuracy may be lower for names that originate from other languages
- Some unisex names may receive lower confidence scores

## Acknowledgments

This model was trained with computing resources provided by [DDOK.AI](https://huggingface.co/DDOKAI).

## License

Apache-2.0
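
## Handling Low-Confidence Predictions (Sketch)

The Limitations section notes that unisex names may receive lower confidence scores. One possible way to handle this is to fall back to an `unknown` label when the score is below a cutoff. The sketch below is not part of the model or the `transformers` API: the helper `label_or_unknown`, the `unknown` label, and the `0.8` threshold are all hypothetical choices for illustration. It assumes each prediction is a dict with `label` and `score` keys, which is the shape the `text-classification` pipeline returns for each input.

```python
# Hypothetical cutoff, not a value recommended by the model authors;
# tune it on your own validation data.
CONFIDENCE_THRESHOLD = 0.8


def label_or_unknown(prediction, threshold=CONFIDENCE_THRESHOLD):
    """Return the predicted label, or "unknown" if the score is below threshold.

    `prediction` is assumed to be a dict such as {"label": "male", "score": 0.99}.
    """
    if prediction["score"] >= threshold:
        return prediction["label"]
    return "unknown"


# Stand-in predictions with made-up scores, so the sketch runs without
# downloading the model.
examples = [
    {"label": "male", "score": 0.996},
    {"label": "female", "score": 0.62},
]

for prediction in examples:
    print(label_or_unknown(prediction))
```

With the pipeline from the Usage section, the same helper could be applied as `label_or_unknown(classifier("서연")[0])`, since the pipeline returns a list with one dict per input.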