ner-kor-roberta_aihub_094_208_90k

A Korean named entity recognition (NER) model. It uses KLUE RoBERTa-base as its backbone and was fine-tuned on the AIHub Korean NER dataset (about 900,000 sentences). Training used a spaCy 3.8 + spacy-transformers pipeline.


Supported labels

| Label | Meaning | Examples |
| --- | --- | --- |
| PER | Person | 이순신, 홍길동 |
| ORG | Institution or organization | 삼성전자, 국립중앙박물관 |
| LOC | Place or place name (Location) | 서울, 한강, 여수 |
| ADD | Address | 서울특별시 강남구 테헤란로 |
| DAT | Date or period | 2024년 1월, 지난주 |
| TIM | Time | 오후 3시, 새벽 |
| QT | Quantity or numeric value | 3kg, 100명, 5천원 |
| PHN | Phone number | 010-1234-5678 |
| URL | URL or email | www.example.com |

Training data example

{"text": "관광지명 38해변", "entities": [[5, 9, "LOC"]]}
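The `entities` field appears to hold `[start, end, label]` triples whose offsets are character positions into `text`, with an exclusive end. That reading is inferred from the single record shown here, and the helper name `extract_entities` is hypothetical. A minimal sketch that slices the labelled spans out of one record:

```python
import json

# One record in the same shape as the training example.
line = '{"text": "관광지명 38해변", "entities": [[5, 9, "LOC"]]}'

def extract_entities(record):
    """Return (surface_text, label) pairs for each [start, end, label] triple.

    Assumes start/end are character (not byte) offsets and end is exclusive.
    """
    text = record["text"]
    return [(text[start:end], label) for start, end, label in record["entities"]]

record = json.loads(line)
pairs = extract_entities(record)
print(pairs)  # [('38해변', 'LOC')]
```

Under that reading, the span `[5, 9]` covers "38해변" and leaves out the leading "관광지명 " prefix.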

Performance (test set, 90,873 sentences)

| Label | Precision | Recall | F1 |
| --- | --- | --- | --- |
| Overall | 0.8795 | 0.9312 | 0.9046 |
| ADD | 0.9990 | 0.9997 | 0.9994 |
| PHN | 0.9873 | 0.9915 | 0.9894 |
| URL | 0.9793 | 0.9833 | 0.9813 |
| TIM | 0.9202 | 0.9122 | 0.9162 |
| DAT | 0.8245 | 0.9659 | 0.8896 |
| QT | 0.8147 | 0.9163 | 0.8625 |
| LOC | 0.8182 | 0.8840 | 0.8498 |
| PER | 0.6778 | 0.7847 | 0.7273 |
| ORG | 0.6807 | 0.7338 | 0.7063 |
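The F1 column can be sanity-checked from the precision and recall columns, assuming the reported F1 is the usual harmonic mean, F1 = 2PR / (P + R). The sketch below recomputes it for three rows; the function name `f1_score` is illustrative and the numbers are copied from the results listed here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the reported results.
reported = {
    "overall": (0.8795, 0.9312, 0.9046),
    "PER": (0.6778, 0.7847, 0.7273),
    "ORG": (0.6807, 0.7338, 0.7063),
}

for label, (p, r, f1) in reported.items():
    print(f"{label}: computed {f1_score(p, r):.4f}, reported {f1:.4f}")
```

For these rows the recomputed values match the reported ones to within rounding. PER and ORG have the lowest scores, and their precision is lower than their recall.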

Usage

Using the model directly with spaCy

import spacy

# Loading a transformer pipeline requires the spacy-transformers package to be installed.
nlp = spacy.load("path/or/model_name")

# Input sentence: "General Yi Sun-sin fought at Yeosu in Jeolla Province."
doc = nlp("이순신 장군은 전라도 여수에서 싸웠다.")
for ent in doc.ents:
    print(ent.text, ent.label_)
# 이순신   PER
# 전라도   LOC
# 여수에서 LOC
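Downstream code often needs the entities as plain records rather than spaCy objects. The sketch below is one possible way to do that, using only the standard `Span` attributes `text`, `label_`, `start_char` and `end_char`. The helper names `entities_to_records` and `filter_by_label` are hypothetical and not part of this model or of spaCy.

```python
def entities_to_records(doc):
    """Convert the entities of a spaCy Doc into plain dicts with character offsets."""
    return [
        {
            "text": ent.text,
            "label": ent.label_,
            "start": ent.start_char,
            "end": ent.end_char,
        }
        for ent in doc.ents
    ]

def filter_by_label(records, labels):
    """Keep only the records whose label is in the given collection."""
    wanted = set(labels)
    return [rec for rec in records if rec["label"] in wanted]
```

With a loaded pipeline, `filter_by_label(entities_to_records(nlp(text)), {"PER", "ORG"})` would keep only person and organization mentions. These are the two labels with the lowest reported F1, so they may deserve a manual review step.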

Training details

| Item | Value |
| --- | --- |
| Backbone model | klue/roberta-base |
| Framework | spaCy 3.8 + spacy-transformers |
| Training data | AIHub Korean NER dataset |
| Training sentences | 726,972 |
| Validation sentences | 90,871 |
| Test sentences | 90,873 |
| Total training steps | 20,000 |
| Optimizer | Adam (lr=5e-5, warmup 250 steps) |
| Mixed precision | FP16 (mixed_precision = true) |
| Batching strategy | batch_by_padded (size=2000) |
| Gradient accumulation | 3 subbatches |
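The three sentence counts can be cross-checked against the "about 900,000 sentences" figure given in the introduction. The short calculation below sums them and prints each split's share; it uses only the numbers reported here and assumes nothing else about how the split was made.

```python
# Sentence counts copied from the training details.
splits = {"train": 726_972, "validation": 90_871, "test": 90_873}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

print(f"total sentences: {total:,}")  # total sentences: 908,716
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")      # train: 80.0%, validation: 10.0%, test: 10.0%
```

The counts add up to 908,716 sentences, which works out to an 80/10/10 split.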

Model file structure

The model is saved in spaCy format and can be loaded directly with spacy.load().

model-best/
├── config.cfg          # spaCy pipeline configuration
├── meta.json           # model metadata and performance records
├── transformer/        # fine-tuned klue/roberta-base weights (444MB)
├── ner/                # NER transition-based parser weights
├── doc_cleaner/        # memory management component
└── vocab/              # vocabulary
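Because `meta.json` is described as holding the performance records, the scores can be inspected without loading the transformer weights. The sketch below reads the file with the standard library only. The helper name `load_performance` is hypothetical, and the `"performance"` key is an assumption based on how spaCy usually writes `meta.json`; it has not been checked against this model's files.

```python
import json
from pathlib import Path

def load_performance(model_dir):
    """Return the "performance" section of a spaCy model's meta.json, or {}.

    Assumes meta.json sits at the top level of model_dir, as in the tree above.
    """
    meta_path = Path(model_dir) / "meta.json"
    with meta_path.open(encoding="utf-8") as fh:
        meta = json.load(fh)
    # "performance" is the key spaCy normally uses; treat its presence as an assumption.
    return meta.get("performance", {})
```

For example, `load_performance("model-best")` would return the recorded scores as a dict if the key is present, and an empty dict otherwise.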

License

MIT
