---
tags:
- gliner2
- ner
- dataset-extraction
- lora
base_model: fastino/gliner2-base-v1
library_name: gliner2
---

# GLiNER2 Dataset Mention Extractor

Fine-tuned GLiNER2 model for extracting structured dataset mentions from research documents.

## Task
Given a document passage, the model extracts:
- **Entity fields**: dataset_name, acronym, producer, geography, description, etc.
- **Classifications**: dataset_tag (named/descriptive/vague), usage_context, is_used

## Training
- **Base model**: `fastino/gliner2-base-v1`
- **Method**: LoRA (r=16, alpha=32)
- **Data**: 1,197 synthetic training examples

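To give a sense of what the LoRA settings above mean, here is a minimal arithmetic sketch. It assumes the standard LoRA formulation, in which a weight update is factored as `(alpha / r) * B @ A`; whether the GLiNER2 training code uses exactly this scaling is an assumption. The 768 x 768 matrix shape is hypothetical and chosen only for illustration, not taken from the base model's configuration.

```python
# LoRA hyperparameters stated in this card.
r, alpha = 16, 32

# Hypothetical weight-matrix shape, used only for illustration.
d_out, d_in = 768, 768

# Standard LoRA: the update to W is (alpha / r) * B @ A,
# where B has shape (d_out, r) and A has shape (r, d_in).
scaling = alpha / r
full_params = d_out * d_in        # parameters in the full matrix
lora_params = r * (d_out + d_in)  # parameters in B and A combined

print(f"scaling factor: {scaling}")
print(f"full matrix parameters: {full_params}")
print(f"LoRA parameters: {lora_params}")
print(f"fraction trained: {lora_params / full_params:.4f}")
```

Under these assumptions, each adapted matrix trains only a small fraction of the parameters it would take to update the full matrix.
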
## Usage

```python
from gliner2 import GLiNER2

# Load the base model, then attach the LoRA adapter on top of it.
extractor = GLiNER2.from_pretrained("fastino/gliner2-base-v1")
extractor.load_adapter("rafmacalaba/gliner2-datause-v1")

# Schema mirroring the entity fields and classifications listed under Task.
schema = (
    extractor.create_schema()
    .structure("dataset_mention")
    .field("dataset_name", dtype="str")
    .field("acronym", dtype="str")
    .field("producer", dtype="str")
    .field("geography", dtype="str")
    .field("dataset_tag", dtype="str", choices=["named", "descriptive", "vague"])
    .field("usage_context", dtype="str", choices=["primary", "supporting", "background"])
    .field("is_used", dtype="str", choices=["True", "False"])
)

# Placeholder passage for illustration; replace with your own document text.
text = "Our analysis uses the Demographic and Health Survey (DHS) for Nigeria."

results = extractor.extract(text, schema)
dataset_mentions = results["dataset_mention"]
```

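Because `is_used` is declared in the schema as a string choice (`"True"` / `"False"`) rather than a boolean, downstream code needs to compare against the string. The sketch below shows one way to filter mentions, assuming `results["dataset_mention"]` is a list of dictionaries keyed by field name with plain string values. The helper `used_dataset_names` and the example records are hypothetical and hand-written for illustration; they are not real model output.

```python
def used_dataset_names(mentions):
    """Return dataset names for mentions whose is_used label is "True"."""
    names = []
    for mention in mentions:
        # is_used is a string label from the schema, not a Python bool.
        if str(mention.get("is_used")) == "True" and mention.get("dataset_name"):
            names.append(mention["dataset_name"])
    return names


# Hypothetical records in the assumed output shape (not real model output).
example_mentions = [
    {"dataset_name": "Example Household Survey", "dataset_tag": "named", "is_used": "True"},
    {"dataset_name": "census data", "dataset_tag": "vague", "is_used": "False"},
]

print(used_dataset_names(example_mentions))
```

If the extractor is configured to return confidence scores or nested values, the field access would need to be adapted accordingly.
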