Update README.md
Browse files
README.md
CHANGED
|
@@ -18,9 +18,9 @@ The model is built on top of Qwen3(Qwen3-0.6B) and uses a custom non-causal atte
|
|
| 18 |
mechanism.
|
| 19 |
|
| 20 |
## Predicted Classes
|
| 21 |
-
0
|
| 22 |
-
1
|
| 23 |
-
2
|
| 24 |
|
| 25 |
## Transformer Inference Example
|
| 26 |
```python
|
|
@@ -70,6 +70,20 @@ def register_fa_attention():
|
|
| 70 |
# Register custom non-causal FA (feel free to use FA2/FA3); requires a GPU
|
| 71 |
register_fa_attention()
|
| 72 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 73 |
tokenizer = AutoTokenizer.from_pretrained("Scicom-intl/multilingual-dynamic-entity-decoder")
|
| 74 |
model = Qwen3ForTokenClassification.from_pretrained(
|
| 75 |
"Scicom-intl/multilingual-dynamic-entity-decoder",
|
|
@@ -78,9 +92,9 @@ model = Qwen3ForTokenClassification.from_pretrained(
|
|
| 78 |
device_map={"":"cuda:0"}
|
| 79 |
)
|
| 80 |
|
| 81 |
-
|
| 82 |
token = tokenizer(
|
| 83 |
-
|
| 84 |
is_split_into_words=True,
|
| 85 |
return_tensors="pt"
|
| 86 |
).to(model.device)
|
|
@@ -91,9 +105,5 @@ with torch.no_grad():
|
|
| 91 |
print(prediction)
|
| 92 |
```
|
| 93 |
|
| 94 |
-
## Important Notes & Limitations
|
| 95 |
-
- Chinese text must be tokenized at the character level, not at the word level
|
| 96 |
-
|
| 97 |
-
|
| 98 |
## Evaluation Result
|
| 99 |
- F1 macro: 0.75
|
|
|
|
| 18 |
mechanism.
|
| 19 |
|
| 20 |
## Predicted Classes
|
| 21 |
+
- 0 : Non-entity token
|
| 22 |
+
- 1 : Name entity
|
| 23 |
+
- 2 : Address entity
|
| 24 |
|
| 25 |
## Transformer Inference Example
|
| 26 |
```python
|
|
|
|
| 70 |
# Register custom non-causal FA (feel free to use FA2/FA3); requires a GPU
|
| 71 |
register_fa_attention()
|
| 72 |
|
| 73 |
+
def tokenize_sentence_to_word(sentence: str):
|
| 74 |
+
tokens = []
|
| 75 |
+
chinese_char_pattern = re.compile(r'[\u4e00-\u9fff]')
|
| 76 |
+
# Split text by spaces first
|
| 77 |
+
parts = sentence.split()
|
| 78 |
+
for part in parts:
|
| 79 |
+
if chinese_char_pattern.search(part):
|
| 80 |
+
# Character-level tokenization for Chinese
|
| 81 |
+
tokens.extend(list(part))
|
| 82 |
+
else:
|
| 83 |
+
# Word-level tokenization for other languages
|
| 84 |
+
tokens.append(part)
|
| 85 |
+
return tokens
|
| 86 |
+
|
| 87 |
tokenizer = AutoTokenizer.from_pretrained("Scicom-intl/multilingual-dynamic-entity-decoder")
|
| 88 |
model = Qwen3ForTokenClassification.from_pretrained(
|
| 89 |
"Scicom-intl/multilingual-dynamic-entity-decoder",
|
|
|
|
| 92 |
device_map={"":"cuda:0"}
|
| 93 |
)
|
| 94 |
|
| 95 |
+
word_token = tokenize_sentence_to_word("Hi, my name is Alex and I'm from Perlis")
|
| 96 |
token = tokenizer(
|
| 97 |
+
word_token,
|
| 98 |
is_split_into_words=True,
|
| 99 |
return_tensors="pt"
|
| 100 |
).to(model.device)
|
|
|
|
| 105 |
print(prediction)
|
| 106 |
```
|
| 107 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 108 |
## Evaluation Result
|
| 109 |
- F1 macro: 0.75
|