---
library_name: transformers
license: apache-2.0
language:
- ko
- en
- ja
base_model:
- Qwen/Qwen3-0.6B
---
## Model Details
### Goal
- Perform dynamic NER: given a sentence and a runtime schema of entity types, extract all matching entities.
- Support multilingual input (English, Korean, Japanese, etc.).
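The runtime schema is passed to the model as plain text alongside the sentence. As a minimal sketch, assuming the tag layout used in the examples further down (`<sentence>` and `<entity_list>`), the user message could be assembled with a small helper. The function name `build_user_prompt` is hypothetical and not part of this model's tooling.

```python
def build_user_prompt(text: str, entity_types: list[dict]) -> str:
    """Wrap the sentence and the runtime schema in the tags used by this card's examples."""
    # Formatting a list of dicts in an f-string uses its repr, which gives the
    # Python-literal style (single quotes) seen in the English example.
    return (
        f"<sentence>\n{text.strip()}\n</sentence>\n\n"
        f"<entity_list>\n{entity_types}\n</entity_list>\n\n"
    )


schema = [
    {"type": "PERSON", "description": "Names of individuals"},
    {"type": "LOCATION", "description": "Specific places or structures"},
]
user = build_user_prompt("Tim went to the park with his mom.", schema)
print(user)
```

Because the schema is ordinary data, entity types can be added or removed per request without retraining.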
### Limitations
- The model tends to extract only one entity per type and may miss additional mentions of the same type.
- Overlapping or nested entities (e.g., "New York" vs. "York") may be resolved inconsistently unless an overlap policy is stated explicitly.
- Because the model is generative, the extracted text may be modified or paraphrased rather than copied verbatim from the input.
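One way to work around the first and third limitations is a post-processing pass that searches the input for each extracted string. The sketch below is an assumption about how a caller might do this, not something shipped with the model. The helper name `find_mentions` and the `spans` and `verbatim` keys are hypothetical.

```python
import re


def find_mentions(text: str, entities: list[dict]) -> list[dict]:
    """Locate every verbatim occurrence of each extracted entity in the input text."""
    located = []
    for ent in entities:
        surface = ent["text"]
        # Skip empty strings, which would otherwise match at every position.
        spans = (
            [(m.start(), m.end()) for m in re.finditer(re.escape(surface), text)]
            if surface
            else []
        )
        # No spans means the output does not appear verbatim in the input,
        # for example because it was paraphrased during generation.
        located.append({**ent, "spans": spans, "verbatim": bool(spans)})
    return located


text = "Tim met Sue. Tim laughed."
entities = [{"text": "Tim", "type": "PERSON"}, {"text": "Susan", "type": "PERSON"}]
results = find_mentions(text, entities)
for item in results:
    print(item)
```

This is a plain substring search, so "Tim" would also match inside a longer word such as "Timothy". A stricter policy (word boundaries, longest match first) may be needed depending on the language.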
### example (en)
```
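from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with this model's repository ID (not stated in this card).
model_id = "<model-repo-id>"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

system = """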
You are an AI that dynamically performs Named Entity Recognition (NER).
You receive a sentence and a list of entity types the user wants to extract, and then identify all entities of those types within the sentence.
If you cannot find any suitable entities within the sentence, return an empty list.
"""
text = """
Once upon a time, a little boy named Tim went to the park with his mom. They saw a big fountain with water going up and down. Tim was very happy to see it.
Tim asked his mom, "Can I go near the fountain?" His mom answered, "Yes, but hold my hand tight." Tim held his mom's hand very tight and they walked closer to the fountain. They saw fish in the water and Tim laughed.
A little girl named Sue came to the fountain too. She asked Tim, "Do you like the fish?" Tim said, "Yes, I like them a lot!" Sue and Tim became friends and played near the fountain until it was time to go home.
""".strip()
named_entity = """
[{'type': 'PERSON', 'description': 'Names of individuals'}, {'type': 'LOCATION', 'description': 'Specific places or structures'}, {'type': 'ANIMAL', 'description': 'Names or types of animals'}]
""".strip()
user = f"<sentence>\n{text}\n</sentence>\n\n<entity_list>\n{named_entity}\n</entity_list>\n\n"
chat = [{"role":"system", "content":system}, {"role":"user", "content":user}]
# Render the chat as a prompt string with thinking mode disabled.
chat_text = tokenizer.apply_chat_template(
    chat,
    enable_thinking=False,
    add_generation_prompt=True,
    tokenize=False
)
model_inputs = tokenizer([chat_text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Keep only the newly generated tokens by slicing off the prompt.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
### result (en)
```
<entities>
[{'text': 'Tim', 'type': 'PERSON'}, {'text': 'mom', 'type': 'PERSON'}, {'text': 'Sue', 'type': 'PERSON'}, {'text': 'park', 'type': 'LOCATION'}, {'text': 'fountain', 'type': 'LOCATION'}, {'text': 'fish', 'type': 'ANIMAL'}]
</entities>
```
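The response is plain text, so it has to be parsed before the entities can be used. The following is a minimal sketch, assuming the output always follows the shape shown in the result above: a Python-style list of dicts wrapped in `<entities>` tags. The helper name `parse_entities` is hypothetical.

```python
import ast
import re


def parse_entities(response: str) -> list[dict]:
    """Extract the entity list from a response wrapped in <entities> tags."""
    match = re.search(r"<entities>\s*(.*?)\s*</entities>", response, re.DOTALL)
    if match is None:
        return []
    try:
        # The list uses Python-style single quotes, which json.loads would reject;
        # ast.literal_eval parses literals without executing code.
        parsed = ast.literal_eval(match.group(1))
    except (ValueError, SyntaxError):
        return []
    return parsed if isinstance(parsed, list) else []


response = "<entities>\n[{'text': 'Tim', 'type': 'PERSON'}, {'text': 'park', 'type': 'LOCATION'}]\n</entities>"
print(parse_entities(response))
```

Returning an empty list on malformed output keeps callers simple. Whether silent fallback or raising an error is preferable depends on the application.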
----------
### example (ko)
```
# Assumes `tokenizer` and `model` are already loaded.
system = """
You are an AI that dynamically performs Named Entity Recognition (NER).
You receive a sentence and a list of entity types the user wants to extract, and then identify all entities of those types within the sentence.
If you cannot find any suitable entities within the sentence, return an empty list.
"""
text = """
์์ง์ด๋ ์ง๋์ฃผ ํ ์์ผ์ ์คํํ๋ ํ๋จ์ ๊ฐ์ด์.
๊ทธ๋ค์ ์ ํ ์คํ ์ด์์ ์๋ก ๋์จ ์์ดํฐ 16์ ๊ตฌ๊ฒฝํ๊ณ , ์นดํ ๋
ธํฐ๋์์ ๋๋์ ๋จน์์ด์.
๊ทธ๋ ์ ๋
์ ๋ฐฉํ์๋
๋จ ์ฝ์ํธ ์คํฉ ์ํ๋ฅผ ๋ดค์ด์. ์ ๋ง ์ ๋ฌ์ฃ !
""".strip()
named_entity = """
[
{"type": "PERSON", "description": "์ฌ๋ ์ด๋ฆ"},
{"type": "LOCATION", "description": "์ง๋ช
๋๋ ์ฅ์"},
{"type": "ORGANIZATION", "description": "์กฐ์ง, ํ์ฌ, ๋จ์ฒด"},
{"type": "PRODUCT", "description": "์ ํ๋ช
"},
{"type": "WORK_OF_ART", "description": "์์ ์ํ, ์ํ, ์ฑ
, ๋
ธ๋ ๋ฑ"},
{"type": "DATE", "description": "๋ ์ง, ์์ผ, ์์ "}
]
""".strip()
user = f"<sentence>\n{text}\n</sentence>\n\n<entity_list>\n{named_entity}\n</entity_list>\n\n"
chat = [{"role":"system", "content":system}, {"role":"user", "content":user}]
# Render the chat as a prompt string with thinking mode disabled.
chat_text = tokenizer.apply_chat_template(
    chat,
    enable_thinking=False,
    add_generation_prompt=True,
    tokenize=False
)
model_inputs = tokenizer([chat_text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Keep only the newly generated tokens by slicing off the prompt.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
### result (ko)
```
<entities>
[{'text': '์์ง์ด', 'type': 'PERSON'}, {'text': '์คํํ๋ ํ๋จ', 'type': 'LOCATION'}, {'text': '์์ดํฐ 16', 'type': 'PRODUCT'}, {'text': '๋ฐฉํ์๋
๋จ', 'type': 'ORGANIZATION'}, {'text': '์ฝ์ํธ ์คํฉ ์ํ', 'type': 'WORK_OF_ART'}, {'text': 'ํ ์์ผ', 'type': 'DATE'}, {'text': '์นดํ ๋
ธํฐ๋', 'type': 'LOCATION'}]
</entities>
```
-------
### example (ja)
```
# Assumes `tokenizer` and `model` are already loaded.
system = """
You are an AI that dynamically performs Named Entity Recognition (NER).
You receive a sentence and a list of entity types the user wants to extract, and then identify all entities of those types within the sentence.
If you cannot find any suitable entities within the sentence, return an empty list.
"""
text = """
ใชใใฏ4ๆใฎ็ตใใใซๆฑไบฌใใฃใบใใผใฉใณใใธ่กใใพใใใ
ๅฝผๅฅณใฏในใใคใใกใใชใผใฎใทใงใผใ่ฆใฆใในใฟใผใใใฏในใงๆน่ถใฉใใ้ฃฒใฟใพใใใ
ๅคใซใฏใๅใจๅๅฐใฎ็ฅ้ ใใใฎ็นๅฅไธๆ ไผใซใๅๅ ใใพใใใ
""".strip()
named_entity = """
[
{"type": "PERSON", "description": "ๅไบบๅ"},
{"type": "LOCATION", "description": "ๅฐๅใๆฝ่จญๅ"},
{"type": "ORGANIZATION", "description": "ไผ็คพใๅฃไฝๅ"},
{"type": "WORK_OF_ART", "description": "ๆ ็ปใ้ณๆฅฝใใขใใกใๆธ็ฑใชใฉ"},
{"type": "PRODUCT", "description": "ๅๅใใใฉใณใๅ"},
{"type": "DATE", "description": "ๆฅไปใๆๆ"}
]
""".strip()
user = f"<sentence>\n{text}\n</sentence>\n\n<entity_list>\n{named_entity}\n</entity_list>\n\n"
chat = [{"role":"system", "content":system}, {"role":"user", "content":user}]
# Render the chat as a prompt string with thinking mode disabled.
chat_text = tokenizer.apply_chat_template(
    chat,
    enable_thinking=False,
    add_generation_prompt=True,
    tokenize=False
)
model_inputs = tokenizer([chat_text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Keep only the newly generated tokens by slicing off the prompt.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
### result (ja)
```
<entities>
[{'text': 'ใชใ', 'type': 'PERSON'}, {'text': 'ๆฑไบฌ', 'type': 'LOCATION'}, {'text': 'ในใใคใใกใใชใผ', 'type': 'ORGANIZATION'}, {'text': 'ในใฟใผใใใฏใน', 'type': 'ORGANIZATION'}, {'text': 'ๅใจๅๅฐใฎ็ฅ้ ใ', 'type': 'WORK_OF_ART'}, {'text': 'ๅ่ถใฉใ', 'type': 'PRODUCT'}, {'text': '4ๆ', 'type': 'DATE'}]
</entities>
```
## License
- Qwen/Qwen3-0.6B: Apache-2.0 (https://choosealicense.com/licenses/apache-2.0/)
## Acknowledgement
This research is supported by the **TPU Research Cloud program**.