language:
  - ko
  - en
  - es
  - pt
tags:
  - token-classification
  - named-entity-recognition
  - multilingual
  - transformers
license: mit
pipeline_tag: token-classification
datasets:
  - wikiann
model-index:
  - name: kaidol-ner-multilingual
    results:
      - task:
          name: Named Entity Recognition
          type: token-classification
        dataset:
          name: WikiAnn (en, ko, es, pt)
          type: wikiann
        metrics:
          - name: F1
            type: f1
            value: 0.74
base_model:
  - Davlan/xlm-roberta-base-ner-hrl

🌐 KAIdol NER Multilingual Model

This is a multilingual NER (Named Entity Recognition) model developed as part of the KAIdol Project.
It is based on Davlan/xlm-roberta-base-ner-hrl, fine-tuned on the WikiAnn dataset for Korean (ko), English (en), Spanish (es), and Portuguese (pt).

🧠 Model Details

  • Base model: Davlan/xlm-roberta-base-ner-hrl
  • NER Tags:
    • PER: Person
    • ORG: Organization
    • LOC: Location
  • Tokenizer: AutoTokenizer from base model
  • Max length: 128 tokens

📊 Training Configuration

| Parameter     | Value                             |
|---------------|-----------------------------------|
| Epochs        | 5                                 |
| Batch Size    | 16                                |
| Optimizer     | AdamW                             |
| Learning Rate | 5e-5                              |
| Loss          | CrossEntropy with class weights   |
| Dataset       | WikiAnn (en, ko, es, pt)          |
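
The card does not say how the class weights for the loss were derived. As a minimal sketch, assuming inverse-frequency weighting (a common choice, not confirmed by the authors), the weights could be computed from the training label ids as below. The function name `inverse_frequency_weights` and the toy label list are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(label_ids, num_labels):
    """Return one weight per class, larger for rarer classes.

    Assumption: weight = total / (num_labels * count), i.e. plain
    inverse-frequency weighting. The model card does not state the
    scheme that was actually used.
    """
    counts = Counter(label_ids)
    total = len(label_ids)
    weights = []
    for label in range(num_labels):
        count = counts.get(label, 0)
        # Classes that never occur get weight 0.0 to avoid division by zero
        weights.append(total / (num_labels * count) if count else 0.0)
    return weights


# Toy label ids: mostly 'O' (0), with a few entity tags
toy_labels = [0, 0, 0, 0, 0, 0, 1, 2, 5, 0]
weights = inverse_frequency_weights(toy_labels, num_labels=7)
print(weights)
```

In PyTorch, a list like this can be passed as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights))` so that errors on rare entity tags count more than errors on the dominant `O` tag.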

✅ Performance Summary

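| Language   | F1-macro | PER F1 | ORG F1 | LOC F1 |
|------------|----------|--------|--------|--------|
| English    | 0.74     | 0.84   | 0.63   | 0.76   |
| Korean     | 0.43     | 0.46   | 0.30   | 0.52   |
| Spanish    | TBD      | TBD    | TBD    | TBD    |
| Portuguese | TBD      | TBD    | TBD    | TBD    |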

Spanish (es) and Portuguese (pt) results will be added once evaluation is complete. Korean performance is limited by tokenization issues in the WikiAnn data.
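
The card does not define F1-macro. A small check, assuming it is the unweighted mean of the three per-entity F1 scores (PER, ORG, LOC), reproduces the reported English and Korean values:

```python
def macro_f1(per_class_f1):
    """Unweighted mean of per-class F1 scores (assumed definition)."""
    return sum(per_class_f1) / len(per_class_f1)


# Per-entity scores (PER, ORG, LOC) copied from the performance summary
reported = {
    "English": [0.84, 0.63, 0.76],
    "Korean": [0.46, 0.30, 0.52],
}

for language, scores in reported.items():
    print(language, round(macro_f1(scores), 2))
# English 0.74
# Korean 0.43
```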

🚀 Usage Example

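import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "developer-lunark/kaidol-ner-multilingual"
model = AutoModelForTokenClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tokenize a Spanish sentence and run a forward pass without tracking gradients
tokens = tokenizer("Barack Obama nació en Hawái.", return_tensors="pt")
with torch.no_grad():
    output = model(**tokens)

# output.logits has shape (batch, sequence_length, num_labels)
predicted_ids = output.logits.argmax(dim=-1)[0].tolist()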

🧾 Label Mapping

{
  'O': 0,
  'B-PER': 1,
  'I-PER': 2,
  'B-ORG': 3,
  'I-ORG': 4,
  'B-LOC': 5,
  'I-LOC': 6
}
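
The mapping above follows the BIO scheme: `B-` opens an entity, `I-` continues it, and `O` marks everything else. As a rough sketch of how predicted label ids could be turned into entity spans, the helper below (hypothetical name `group_entities`) merges consecutive tags. It assumes predictions are already aligned to whole words; the model itself predicts per subword token, so a real pipeline needs an alignment step first. The example predictions are made up for illustration, not model output.

```python
# Label-to-id mapping copied from this model card
LABEL2ID = {
    "O": 0,
    "B-PER": 1, "I-PER": 2,
    "B-ORG": 3, "I-ORG": 4,
    "B-LOC": 5, "I-LOC": 6,
}
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}


def group_entities(words, label_ids):
    """Merge BIO-tagged words into (entity_type, text) spans."""
    entities = []
    current_type, current_words = None, []
    for word, label_id in zip(words, label_ids):
        prefix, _, entity_type = ID2LABEL[label_id].partition("-")
        if prefix == "I" and entity_type == current_type:
            # Continuation of the open span
            current_words.append(word)
            continue
        # Any other label closes the open span, if there is one
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))
        if prefix == "O":
            current_type, current_words = None, []
        else:
            # B- starts a new span; a stray I- is treated as a start too
            current_type, current_words = entity_type, [word]
    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities


# Hypothetical word-level predictions for the Spanish example sentence
words = ["Barack", "Obama", "nació", "en", "Hawái", "."]
label_ids = [1, 2, 0, 0, 5, 0]
print(group_entities(words, label_ids))
# [('PER', 'Barack Obama'), ('LOC', 'Hawái')]
```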

🔐 License

MIT License

📬 Contact

Developed by the KAIdol Project Team.

For questions or collaborations, contact: developer-lunark