Improving Low-Resource Sequence Labeling with Knowledge Fusion and Contextual Label Explanations

(workflow diagram)

🌐 Overview

This repository provides the official implementation of our paper:

Improving Low-Resource Sequence Labeling with Knowledge Fusion and Contextual Label Explanations (arXiv:2501.19093)

Low-resource sequence labeling often suffers from data sparsity and limited contextual generalization. We propose KnowFREE (Knowledge-Fused Representation Enhancement Framework), a framework that integrates external linguistic knowledge and contextual label explanations into the model's representation space to enhance low-resource performance.

Key Highlights:

We combine an LLM-based knowledge enhancement workflow with a span-based KnowFREE model to effectively address these challenges.

Pipeline 1: Label Extension Annotation

  • Objective: Leverage LLMs to generate extension entity labels, word segmentation tags, and POS tags for the original samples.
  • Effect:
    • Enhances the model's understanding of fine-grained contextual semantics.
    • Improves the ability to distinguish entity boundaries in character-dense languages.
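The repo snippet above does not show what an augmented sample looks like, so here is a hedged sketch, assuming Pipeline 1 appends its extension labels (e.g., POS spans, which appear as non-target labels in the label JSON later) to a sample's span list. The helper name and the POS spans are illustrative, not actual pipeline output.

```python
# Hypothetical sketch of merging Pipeline-1 extension labels into a sample.
# The "entities" span format follows the dataset format described below.

def add_extension_spans(sample, extension_spans):
    """Return a copy of `sample` with extension spans appended (no mutation)."""
    merged = dict(sample)
    merged["entities"] = sample["entities"] + extension_spans
    return merged

sample = {
    "text": ["a", "comedy", "movie"],
    "entities": [{"start": 1, "end": 2, "entity": "GENRE", "text": ["comedy"]}],
}
# Illustrative POS spans standing in for LLM-generated extension labels
pos_spans = [
    {"start": 0, "end": 1, "entity": "DETERMINER", "text": ["a"]},
    {"start": 2, "end": 3, "entity": "NOUN", "text": ["movie"]},
]
augmented = add_extension_spans(sample, pos_spans)
assert len(augmented["entities"]) == 3
```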

Pipeline 2: Enriched Explanation Synthesis

  • Objective: Use LLMs to generate detailed, context-aware explanations for target entities, thereby synthesizing new, high-quality training samples.
  • Effect:
    • Effectively mitigates semantic distribution bias between synthetic samples and the target domain.
    • Significantly expands the number of samples and improves model performance in extremely low-resource settings.
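As a rough illustration of explanation synthesis, the sketch below builds a prompt asking an LLM to explain an entity in context. The template wording is a hypothetical example; the actual prompts live in Syn_Pipelines and may differ.

```python
# Hypothetical explanation-synthesis prompt builder (not the paper's exact prompt).

PROMPT_TEMPLATE = (
    "Sentence: {sentence}\n"
    'Entity: "{entity}" (type: {label})\n'
    "In one or two sentences, explain why this span is a {label} entity "
    "in this context, then produce a new sentence that embeds the "
    "explanation as additional context."
)

def build_prompt(sentence, entity, label):
    return PROMPT_TEMPLATE.format(sentence=sentence, entity=entity, label=label)

prompt = build_prompt("vin diesel stars in an action movie", "vin diesel", "ACTOR")
assert "ACTOR" in prompt and "vin diesel" in prompt
```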

🔗 Quick Links

♠️ Model Checkpoints

Due to the large number of experiments, the architectural differences between the initial and reconstructed models, and the limited practical value of low-resource checkpoints sampled from the full dataset, we only release a few representative checkpoints (e.g., weibo) on Hugging Face for reference.


🧩 KnowFREE Framework

(KnowFREE architecture diagram)

Architecture: A Biaffine-based span model that supports nested entity annotation.

Core Innovations:

  • Introduces a Local Multi-head Attention Layer to efficiently fuse the multi-type extension label features generated in Pipeline 1.
  • No External Knowledge Needed at Inference: The model learns to fuse knowledge during training; the logits of the extension labels are masked during inference.

⚙️ Installation Guide

Core Dependencies

Create an environment and install dependencies:

conda create -n knowfree python=3.8
conda activate knowfree
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install transformers==4.18.0 fastNLP==1.0.1 PrettyTable
pip install torch-scatter==2.0.8 -f https://data.pyg.org/whl/torch-1.8.0+cu111.html

📊 Data Augmentation Workflow

See the detailed data synthesis pipeline in Syn_Pipelines.

In KnowFREE, we employ contextual paraphrasing and label explanation synthesis to augment low-resource datasets. For each entity label, LLMs generate descriptive explanations that are integrated into the learning process to mitigate label semantic sparsity.


🔥 Run KnowFREE Models

Training with KnowFREE

Dataset Format

Specify the dataset path using the data_present_path argument (Default: ./datasets/present.json). The file should be a JSON object with the following format:

{
    "weibo": {
        "train": "./datasets/weibo/train.jsonl",
        "dev": "./datasets/weibo/dev.jsonl",
        "test": "./datasets/weibo/test.jsonl",
        "labels": "./datasets/weibo/labels.txt"
    }
}

Train Samples of Different Languages:

  • Chinese
{"text": ["็ง‘", "ๆŠ€", "ๅ…จ", "ๆ–น", "ไฝ", "่ต„", "่ฎฏ", "ๆ™บ", "่ƒฝ", "๏ผŒ", "ๅฟซ", "ๆท", "็š„", "ๆฑฝ", "่ฝฆ", "็”Ÿ", "ๆดป", "้œ€", "่ฆ", "ๆœ‰", "ไธ‰", "ๅฑ", "ไธ€", "ไบ‘", "็ˆฑ", "ไฝ "], "label": ["O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O"], "entities": []}
{"text": ["ๅฏน", "๏ผŒ", "่พ“", "็ป™", "ไธ€", "ไธช", "ๅฅณ", "ไบบ", "๏ผŒ", "็š„", "ๆˆ", "็ปฉ", "ใ€‚", "ๅคฑ", "ๆœ›"], "label": ["O", "O", "O", "O", "O", "O", "B-PER.NOM", "E-PER.NOM", "O", "O", "O", "O", "O", "O", "O"], "entities": [{"start": 6, "entity": "PER.NOM", "end": 8, "text": ["ๅฅณ", "ไบบ"]}]}
{"text": ["ไปŠ", "ๅคฉ", "ไธ‹", "ๅˆ", "่ตท", "ๆฅ", "็œ‹", "ๅˆฐ", "ๅค–", "้ข", "็š„", "ๅคช", "้˜ณ", "ใ€‚", "ใ€‚", "ใ€‚", "ใ€‚", "ๆˆ‘", "็ฌฌ", "ไธ€", "ๅ", "ๅบ”", "็ซŸ", "็„ถ", "ๆ˜ฏ", "ๅผบ", "็ƒˆ", "็š„", "ๆƒณ", "ๅ›ž", "ๅฎถ", "ๆณช", "ๆƒณ", "ๆˆ‘", "ไปฌ", "ไธ€", "่ตท", "ๅœจ", "ๅ˜‰", "้ฑผ", "ไธช", "ๆ—ถ", "ๅ€™", "ไบ†", "ใ€‚", "ใ€‚", "ใ€‚", "ใ€‚", "ๆœ‰", "ๅฅฝ", "ๅคš", "ๅฅฝ", "ๅคš", "็š„", "่ฏ", "ๆƒณ", "ๅฏน", "ไฝ ", "่ฏด", "ๆŽ", "ๅทพ", "ๅ‡ก", "ๆƒณ", "่ฆ", "็˜ฆ", "็˜ฆ", "็˜ฆ", "ๆˆ", "ๆŽ", "ๅธ†", "ๆˆ‘", "ๆ˜ฏ", "ๆƒณ", "ๅˆ‡", "ๅผ€", "ไบ‘", "ๆœต", "็š„", "ๅฟƒ"], "label": ["O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "B-LOC.NAM", "E-LOC.NAM", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "B-PER.NAM", "I-PER.NAM", "E-PER.NAM", "O", "O", "O", "O", "O", "O", "B-PER.NAM", "E-PER.NAM", "O", "O", "O", "O", "O", "O", "O", "O", "O"], "entities": [{"start": 38, "entity": "LOC.NAM", "end": 40, "text": ["ๅ˜‰", "้ฑผ"]}, {"start": 59, "entity": "PER.NAM", "end": 62, "text": ["ๆŽ", "ๅทพ", "ๅ‡ก"]}, {"start": 68, "entity": "PER.NAM", "end": 70, "text": ["ๆŽ", "ๅธ†"]}]}
  • English
{"text": ["im", "thinking", "of", "a", "comedy", "where", "a", "group", "of", "husbands", "receive", "one", "chance", "from", "their", "wives", "to", "engage", "with", "other", "women"], "entities": [{"start": 4, "end": 5, "entity": "GENRE", "text": ["comedy"]}, {"start": 6, "end": 21, "entity": "PLOT", "text": ["a", "group", "of", "husbands", "receive", "one", "chance", "from", "their", "wives", "to", "engage", "with", "other", "women"]}]}
{"text": ["another", "sequel", "of", "an", "action", "movie", "about", "drag", "street", "car", "races", "alcohol", "and", "gun", "violence"], "entities": [{"start": 1, "end": 2, "entity": "RELATIONSHIP", "text": ["sequel"]}, {"start": 4, "end": 5, "entity": "GENRE", "text": ["action"]}, {"start": 7, "end": 15, "entity": "PLOT", "text": ["drag", "street", "car", "races", "alcohol", "and", "gun", "violence"]}]}
{"text": ["what", "is", "the", "name", "of", "the", "movie", "in", "which", "a", "group", "of", "criminals", "begin", "to", "suspect", "that", "one", "of", "them", "is", "a", "police", "informant", "after", "a", "simple", "jewelery", "heist", "goes", "terribly", "wrong"], "entities": [{"start": 9, "end": 32, "entity": "PLOT", "text": ["a", "group", "of", "criminals", "begin", "to", "suspect", "that", "one", "of", "them", "is", "a", "police", "informant", "after", "a", "simple", "jewelery", "heist", "goes", "terribly", "wrong"]}]}
{"text": ["a", "movie", "with", "vin", "diesel", "in", "world", "war", "2", "in", "a", "foreign", "country", "shooting", "people"], "entities": [{"start": 3, "end": 5, "entity": "ACTOR", "text": ["vin", "diesel"]}, {"start": 6, "end": 9, "entity": "GENRE", "text": ["world", "war", "2"]}, {"start": 11, "end": 15, "entity": "PLOT", "text": ["foreign", "country", "shooting", "people"]}]}
{"text": ["what", "is", "the", "1991", "disney", "animated", "movie", "that", "featured", "angela", "lansbury", "as", "the", "voice", "of", "a", "teapot"], "entities": [{"start": 3, "end": 4, "entity": "YEAR", "text": ["1991"]}, {"start": 5, "end": 6, "entity": "GENRE", "text": ["animated"]}, {"start": 9, "end": 11, "entity": "ACTOR", "text": ["angela", "lansbury"]}, {"start": 16, "end": 17, "entity": "CHARACTER_NAME", "text": ["teapot"]}]}
  • Japanese
{"text": ["I", "n", "f", "o", "r", "m", "i", "x", "ใฎ", "ๅ‹•", "ใ", "ใ‚’", "ใฟ", "ใฆ", "ใ€", "ใ‚ช", "ใƒฉ", "ใ‚ฏ", "ใƒซ", "ใจ", "I", "B", "M", "ใ‚‚", "่ฟฝ", "้š", "ใ—", "ใŸ", "ใ€‚"], "entities": [{"start": 0, "end": 8, "entity": "ๆณ•ไบบๅ", "text": ["I", "n", "f", "o", "r", "m", "i", "x"]}, {"start": 15, "end": 19, "entity": "ๆณ•ไบบๅ", "text": ["ใ‚ช", "ใƒฉ", "ใ‚ฏ", "ใƒซ"]}, {"start": 20, "end": 23, "entity": "ๆณ•ไบบๅ", "text": ["I", "B", "M"]}]}
{"text": ["็พ", "ๅœจ", "ใฏ", "ใ‚ข", "ใƒ‹", "ใƒก", "ใƒผ", "ใ‚ท", "ใƒง", "ใƒณ", "ๆฅญ", "็•Œ", "ใ‹", "ใ‚‰", "้€€", "ใ„", "ใฆ", "ใŠ", "ใ‚Š", "ใ€", "ๆฐด", "ๅฝฉ", "็”ป", "ๅฎถ", "ใจ", "ใ—", "ใฆ", "ใ‚‚", "ๆดป", "ๅ‹•", "ใ—", "ใฆ", "ใ„", "ใ‚‹", "ใ€‚"], "entities": []}
{"text": ["ๅคง", "้‡Ž", "ๆฑ", "ใ‚ค", "ใƒณ", "ใ‚ฟ", "ใƒผ", "ใƒ", "ใ‚ง", "ใƒณ", "ใ‚ธ", "ใฏ", "ใ€", "ๅคง", "ๅˆ†", "็œŒ", "่ฑŠ", "ๅพŒ", "ๅคง", "้‡Ž", "ๅธ‚", "ๅคง", "้‡Ž", "็”บ", "ๅพŒ", "็”ฐ", "ใซ", "ใ‚", "ใ‚‹", "ไธญ", "ไน", "ๅทž", "ๆจช", "ๆ–ญ", "้“", "่ทฏ", "ใฎ", "ใ‚ค", "ใƒณ", "ใ‚ฟ", "ใƒผ", "ใƒ", "ใ‚ง", "ใƒณ", "ใ‚ธ", "ใง", "ใ‚", "ใ‚‹", "ใ€‚"], "entities": [{"start": 0, "end": 11, "entity": "ๆ–ฝ่จญๅ", "text": ["ๅคง", "้‡Ž", "ๆฑ", "ใ‚ค", "ใƒณ", "ใ‚ฟ", "ใƒผ", "ใƒ", "ใ‚ง", "ใƒณ", "ใ‚ธ"]}, {"start": 13, "end": 26, "entity": "ๅœฐๅ", "text": ["ๅคง", "ๅˆ†", "็œŒ", "่ฑŠ", "ๅพŒ", "ๅคง", "้‡Ž", "ๅธ‚", "ๅคง", "้‡Ž", "็”บ", "ๅพŒ", "็”ฐ"]}, {"start": 29, "end": 36, "entity": "ๆ–ฝ่จญๅ", "text": ["ไธญ", "ไน", "ๅทž", "ๆจช", "ๆ–ญ", "้“", "่ทฏ"]}]}
{"text": ["2", "0", "1", "4", "ๅนด", "1", "ๆœˆ", "1", "5", "ๆ—ฅ", "ใ€", "ใƒž", "ใƒ", "ใ‚ฟ", "ใฏ", "ใƒŸ", "ใƒฃ", "ใƒณ", "ใƒž", "ใƒผ", "ใฎ", "ไธŠ", "ๅบง", "้ƒจ", "ไป", "ๆ•™", "ใ‚’", "ๆ“", "่ญท", "ใ™", "ใ‚‹", "ไฝฟ", "ๅ‘ฝ", "ใ‚’", "ๆŒ", "ใฃ", "ใฆ", "ใ€", "ใƒž", "ใƒณ", "ใƒ€", "ใƒฌ", "ใƒผ", "ใฎ", "ไป", "ๆ•™", "ๅƒง", "ใฎ", "ๅคง", "่ฆ", "ๆจก", "ใช", "ไผš", "่ญฐ", "ใง", "ๆญฃ", "ๅผ", "ใซ", "่จญ", "็ซ‹", "ใ•", "ใ‚Œ", "ใŸ", "ใ€‚"], "entities": [{"start": 11, "end": 14, "entity": "ๆณ•ไบบๅ", "text": ["ใƒž", "ใƒ", "ใ‚ฟ"]}, {"start": 15, "end": 20, "entity": "ๅœฐๅ", "text": ["ใƒŸ", "ใƒฃ", "ใƒณ", "ใƒž", "ใƒผ"]}, {"start": 38, "end": 43, "entity": "ๅœฐๅ", "text": ["ใƒž", "ใƒณ", "ใƒ€", "ใƒฌ", "ใƒผ"]}]}
{"text": ["ๆฐธ", "ๆณฐ", "่˜", "้ง…", "ใฏ", "ใ€", "ไธญ", "่ฏ", "ไบบ", "ๆฐ‘", "ๅ…ฑ", "ๅ’Œ", "ๅ›ฝ", "ๅŒ—", "ไบฌ", "ๅธ‚", "ๆตท", "ๆท€", "ๅŒบ", "ใซ", "ไฝ", "็ฝฎ", "ใ™", "ใ‚‹", "ๅŒ—", "ไบฌ", "ๅœฐ", "ไธ‹", "้‰„", "8", "ๅท", "็ทš", "ใฎ", "้ง…", "ใง", "ใ‚", "ใ‚‹", "ใ€‚"], "entities": [{"start": 0, "end": 4, "entity": "ๆ–ฝ่จญๅ", "text": ["ๆฐธ", "ๆณฐ", "่˜", "้ง…"]}, {"start": 6, "end": 19, "entity": "ๅœฐๅ", "text": ["ไธญ", "่ฏ", "ไบบ", "ๆฐ‘", "ๅ…ฑ", "ๅ’Œ", "ๅ›ฝ", "ๅŒ—", "ไบฌ", "ๅธ‚", "ๆตท", "ๆท€", "ๅŒบ"]}]}
  • Korean
{"text": ["๊ทธ", "๋ชจ์Šต", "์„", "๋ณด", "ใ„ด", "๋ฏผ์ด", "๋Š”", "ํ• ์•„๋ฒ„์ง€", "๊ฐ€", "๋งˆ์น˜", "์ „์Ÿํ„ฐ", "์—์„œ", "์ด๊ธฐ", "๊ณ ", "๋Œ์•„์˜ค", "ใ„ด", "์žฅ๊ตฐ", "์ฒ˜๋Ÿผ", "์˜์ “", "ํ•˜", "์•„", "๋ณด์ด", "ใ„ด๋‹ค๊ณ ", "์ƒ๊ฐ", "ํ•˜", "์•˜", "์Šต๋‹ˆ๋‹ค", "."], "entities": [{"start": 5, "end": 6, "entity": "PS", "text": ["๋ฏผ์ด"]}]}
{"text": ["๋‚ด๋‹ฌ", "18", "์ผ", "๋ถ€ํ„ฐ", "๋‚ด๋…„", "2", "์›”", "20", "์ผ", "๊นŒ์ง€", "๋Š”", "์„œ์šธ์—ญ", "์—์„œ", "๋ฌด์ฃผ๋ฆฌ์กฐํŠธ", "๋ถ€๊ทผ", "๊นŒ์ง€", "์Šคํ‚ค๊ด€๊ด‘", "์—ด์ฐจ", "๋ฅผ", "์šดํ–‰", "ํ•˜", "ใ„ด๋‹ค", "."], "entities": [{"start": 0, "end": 10, "entity": "DT", "text": ["๋‚ด๋‹ฌ", "18", "์ผ", "๋ถ€ํ„ฐ", "๋‚ด๋…„", "2", "์›”", "20", "์ผ", "๊นŒ์ง€"]}, {"start": 11, "end": 12, "entity": "LC", "text": ["์„œ์šธ์—ญ"]}, {"start": 13, "end": 14, "entity": "OG", "text": ["๋ฌด์ฃผ๋ฆฌ์กฐํŠธ"]}]}
{"text": ["ํ˜ธ์†Œ๋ ฅ", "์žˆ", "๊ณ ", "์„ ๋™", "์ ", "์ด", "ใ„ด", "์ฃผ์ œ", "๋ฅผ", "์žก์•„๋‚ด", "๋Š”", "๋ฐ", "๋Šฅํ•˜", "ใ„ด", "์ฆˆ์œ…", "์ด", "์ง€๋งŒ", "์ด", "์˜ํ™”", "์—์„œ", "๋Š”", "๋ฌด์—‡", "์ด", "ํ˜ธ์†Œ๋ ฅ", "์ด", "์žˆ", "์„์ง€", "๊ฒฐ์ •", "ํ•˜", "์ง€", "๋ชปํ•˜", "๊ณ ", "๋ง์„ค์ด", "ใ„ด๋‹ค", "."], "entities": [{"start": 14, "end": 15, "entity": "PS", "text": ["์ฆˆ์œ…"]}]}
{"text": ["๊ทธ๋ž˜์„œ", "์„ธํ˜ธ", "๋Š”", "๋ฐค", "์ด", "๋ฉด", "์นœ๊ตฌ", "๋„ค", "์ง‘", "์„", "๋Œ์•„๋‹ค๋‹ˆ", "๋ฉฐ", "์•„๋ฒ„์ง€", "๋ชฐ๋ž˜", "์—ฐ์Šต", "์„", "ํ•˜", "์•˜", "์Šต๋‹ˆ๋‹ค", "."], "entities": [{"start": 1, "end": 2, "entity": "PS", "text": ["์„ธํ˜ธ"]}, {"start": 3, "end": 4, "entity": "TI", "text": ["๋ฐค"]}]}
{"text": ["ํ™ฉ์”จ", "๋Š”", "์ž์‹ ", "์ด", "์–ด๋ฆฌ", "์–ด์„œ", "๋“ฃ", "์€", "์ด", "์ด์•ผ๊ธฐ", "๊ฐ€", "์–ด๋ฆฐ์ด", "๋“ค", "์—๊ฒŒ", "์†Œ๋ฐ•", "ํ•˜", "ใ„ด", "ํšจ์ž", "์˜", "๋งˆ์Œ", "์„", "์ „ํ•˜", "์•„", "์ฃผ", "ใ„น", "์ˆ˜", "์žˆ", "์„", "๊ฒƒ", "๊ฐ™", "์•„", "5", "๋ถ„", "์งœ๋ฆฌ", "๊ตฌ์—ฐ๋™ํ™”", "๋กœ", "๊ฐ์ƒ‰", "ํ•˜", "์•˜", "๋‹ค๊ณ ", "๋ง", "ํ•˜", "ใ„ด๋‹ค", "."], "entities": [{"start": 0, "end": 1, "entity": "PS", "text": ["ํ™ฉ์”จ"]}, {"start": 31, "end": 33, "entity": "TI", "text": ["5", "๋ถ„"]}]}
{"text": ["์•„๋ฒ„์ง€", "๊ฐ€", "๋Œ์•„๊ฐ€", "์‹œ", "ใ„ด", "๋’ค", "์–ด๋จธ๋‹ˆ", "์˜", "ํŽธ์• ", "๋ฅผ", "๋ฐฐ๊ฒฝ", "์œผ๋กœ", "์Šน์ฃผ", "๋Š”", "์ง‘์•ˆ", "์—์„œ", "๋งŒ", "์€", "๋Œ€๋‹จ", "ํ•˜", "ใ„ด", "๊ถŒ์„ธ", "๋ฅผ", "๋ˆ„๋ฆฌ", "์—ˆ", "๋‹ค", "."], "entities": [{"start": 12, "end": 13, "entity": "PS", "text": ["์Šน์ฃผ"]}]}

Labels

  • .txt
O
GPE.NAM
GPE.NOM
LOC.NAM
LOC.NOM
ORG.NAM
ORG.NOM
PER.NAM
PER.NOM
  • .json / .jsonl
{
    "O": {
        "idx": 0,
        "count": -1,
        "is_target": true
    },
    "GPE.NAM": {
        "idx": 1,
        "count": -1,
        "is_target": true
    },
    "GPE.NOM": {
        "idx": 2,
        "count": -1,
        "is_target": true
    },
    "LOC.NAM": {
        "idx": 3,
        "count": -1,
        "is_target": true
    },
    "LOC.NOM": {
        "idx": 4,
        "count": -1,
        "is_target": true
    },
    "ORG.NAM": {
        "idx": 5,
        "count": -1,
        "is_target": true
    },
    "ORG.NOM": {
        "idx": 6,
        "count": -1,
        "is_target": true
    },
    "PER.NAM": {
        "idx": 7,
        "count": -1,
        "is_target": true
    },
    "PER.NOM": {
        "idx": 8,
        "count": -1,
        "is_target": true
    },
    "ADJECTIVE": {
        "idx": 9,
        "count": 1008,
        "is_target": false
    },
    "ADPOSITION": {
        "idx": 10,
        "count": 41,
        "is_target": false
    },
    "ADVERB": {
        "idx": 11,
        "count": 1147,
        "is_target": false
    },
    "APP": {
        "idx": 12,
        "count": 3,
        "is_target": false
    },
    "AUXILIARY": {
        "idx": 13,
        "count": 4,
        "is_target": false
    },
    ...
}
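If you only have the plain `.txt` label list, the richer JSON form for the target labels can be derived mechanically (extension labels with real `count` values come from Pipeline 1). A minimal sketch, with `txt_to_label_json` as a hypothetical helper; `count=-1` and `is_target=true` mirror the target-label entries above.

```python
# Sketch: convert a labels.txt line list into the JSON label format.

def txt_to_label_json(lines):
    return {
        label: {"idx": i, "count": -1, "is_target": True}
        for i, label in enumerate(lines)
    }

labels = txt_to_label_json(["O", "GPE.NAM", "GPE.NOM"])
assert labels["GPE.NAM"]["idx"] == 1
```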
  • Model: BERT / RoBERTa
from main.trainers.knowfree_trainer import Trainer
from transformers import BertTokenizer, BertConfig

MODEL_PATH = "<MODEL_PATH>"
tokenizer = BertTokenizer.from_pretrained(MODEL_PATH)
config = BertConfig.from_pretrained(MODEL_PATH)
trainer = Trainer(tokenizer=tokenizer, config=config, from_pretrained=MODEL_PATH,
                  data_name='<DATASET_NAME>',
                  batch_size=4,
                  batch_size_eval=8,
                  task_name='<TASK_NAME>')

for i in trainer(num_epochs=120, other_lr=1e-3, weight_decay=0.01, remove_clashed=True, nested=False, eval_call_step=lambda x: x % 125 == 0):
    a = i

Key Params

  • other_lr: learning rate for the non-PLM parameters.
  • remove_clashed: remove overlapping entity labels, keeping only the label with the smallest start position.
  • nested: whether to support nested entities; for nested tasks such as CMeEE, set it to True and disable remove_clashed.
  • eval_call_step: a function over the global step that decides when evaluation runs.
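The training call above passes `eval_call_step=lambda x: x % 125 == 0`, i.e., evaluate every 125 steps. Assuming those semantics (the predicate receives the global step and evaluation runs when it returns True), two reusable variants:

```python
# eval_call_step helpers (assumed semantics: predicate over the global step).

def every_n_steps(n):
    return lambda step: step % n == 0

def after_warmup(warmup, n):
    # skip evaluation during the first `warmup` steps
    return lambda step: step > warmup and step % n == 0

assert every_n_steps(125)(250) is True
assert after_warmup(500, 125)(250) is False
```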

Evaluation Only

Comment out the training loop to evaluate directly:

trainer.eval(0, is_eval=True)

Training with CNN Nested NER

from main.trainers.cnnner_trainer import Trainer
from transformers import BertTokenizer, BertConfig

MODEL_PATH = "<MODEL_PATH>"
tokenizer = BertTokenizer.from_pretrained(MODEL_PATH)
config = BertConfig.from_pretrained(MODEL_PATH)
trainer = Trainer(tokenizer=tokenizer, config=config, from_pretrained=MODEL_PATH,
                  data_name='<DATASET_NAME>',
                  batch_size=4,
                  batch_size_eval=8,
                  task_name='<TASK_NAME>')

for i in trainer(num_epochs=120, other_lr=1e-3, weight_decay=0.01, remove_clashed=True, nested=False, eval_call_step=lambda x: x % 125 == 0):
    a = i

Prediction

from main.predictor.knowfree_predictor import KnowFREEPredictor
from transformers import BertTokenizer, BertConfig

MODEL_PATH = "<MODEL_PATH>"
LABEL_FILE = '<LABEL_PATH>'
tokenizer = BertTokenizer.from_pretrained(MODEL_PATH)
config = BertConfig.from_pretrained(MODEL_PATH)
pred = KnowFREEPredictor(tokenizer=tokenizer, config=config, from_pretrained=MODEL_PATH, label_file=LABEL_FILE, batch_size=4)

for entities in pred(['ๅถ่ตŸ่‘†๏ผšๅ…จ็ƒๆ—ถๅฐš่ดข่ฟๆปšๆปš่€Œๆฅ้’ฑ', 'ๆˆ‘่ฆๅŽปๆˆ‘่ฆๅŽป่Šฑๅฟƒ่Šฑๅฟƒ่Šฑๅฟƒ่€ถๅˆ†ๆ‰‹ๅคงๅธˆ่ดตไป”้‚“่ถ…ๅ››ๅคงๅๆ•ๅ›ด่ง‚่ฏ็ญ’่ฝฌๅ‘้‚“่ถ…่ดดๅงๅพฎๅšๅทๅค–่ฏ็ญ’ๆœ›ๅ‘จ็Ÿฅใ€‚้‚“่ถ…ๅ››ๅคงๅๆ•']):
    print(entities)

Result

[
    [
        {'start': 0, 'end': 3, 'entity': 'PER.NAM', 'text': ['ๅถ', '่ตŸ', '่‘†']}
    ],
    [
        {'start': 45, 'end': 47, 'entity': 'PER.NAM', 'text': ['้‚“', '่ถ…']},
        {'start': 19, 'end': 21, 'entity': 'PER.NAM', 'text': ['้‚“', '่ถ…']},
        {'start': 31, 'end': 33, 'entity': 'PER.NAM', 'text': ['้‚“', '่ถ…']}
    ]
]

📚 Citation

@misc{lai2025improvinglowresourcesequencelabeling,
      title={Improving Low-Resource Sequence Labeling with Knowledge Fusion and Contextual Label Explanations}, 
      author={Peichao Lai and Jiaxin Gan and Feiyang Ye and Yilei Wang and Bin Cui},
      year={2025},
      eprint={2501.19093},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.19093}, 
}