---
tags:
- gliner2
- ner
- dataset-extraction
- lora
base_model: fastino/gliner2-base-v1
library_name: gliner2
---
# GLiNER2 Dataset Mention Extractor
Fine-tuned GLiNER2 model for extracting structured dataset mentions from research documents.
## Task
Given a document passage, the model extracts:
- **Entity fields**: dataset_name, acronym, producer, geography, description, etc.
- **Classifications**: dataset_tag (named/descriptive/vague), usage_context, is_used
## Training
- **Base model**: `fastino/gliner2-base-v1`
- **Method**: LoRA (r=16, alpha=32)
- **Data**: 1,197 synthetic training examples
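
To put the LoRA settings in context, here is a minimal sketch assuming the standard LoRA formulation, where the low-rank update is scaled by `alpha / r`. The 768x768 layer size is a hypothetical value chosen for illustration, not a documented dimension of this model.

```python
# LoRA hyperparameters stated in this card.
r, alpha = 16, 32

# Standard LoRA scales the low-rank update B @ A by alpha / r.
scaling = alpha / r


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A has shape (rank, d_in) and B has shape (d_out, rank).
    return rank * (d_in + d_out)


# Hypothetical layer size, for illustration only.
d_in = d_out = 768
full = d_in * d_out
lora = lora_param_count(d_in, d_out, r)

print(f"scaling={scaling}, lora_params={lora}, full_params={full}")
```

Under these assumptions the adapter trains roughly 4% of the parameters of each adapted matrix.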
## Usage
```python
from gliner2 import GLiNER2

# Load the base model, then apply the LoRA adapter on top of it.
extractor = GLiNER2.from_pretrained("fastino/gliner2-base-v1")
extractor.load_adapter("rafmacalaba/gliner2-datause-v2")

# One structure holding both the entity fields and the classification fields.
schema = (
    extractor.create_schema()
    .structure("dataset_mention")
    .field("dataset_name", dtype="str")
    .field("acronym", dtype="str")
    .field("producer", dtype="str")
    .field("geography", dtype="str")
    .field("dataset_tag", dtype="str", choices=["named", "descriptive", "vague"])
    .field("usage_context", dtype="str", choices=["primary", "supporting", "background"])
    .field("is_used", dtype="str", choices=["True", "False"])
)

# Illustrative passage only; replace with your own document text.
text = "We use the 2018 Demographic and Health Survey (DHS) for Nigeria to estimate child mortality."

results = extractor.extract(text, schema)
dataset_mentions = results["dataset_mention"]
```
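Because `is_used` is declared as a string field with choices `"True"` and `"False"`, downstream code has to compare against strings rather than booleans. The sketch below shows one way to keep only the mentions flagged as used. It assumes each mention is a flat dict mapping field names to string values; the actual output shape may differ (for example, if confidence scores are included), so inspect `results` first. The `used_mentions` helper and the sample records are hypothetical and hand-written, not model output.

```python
def used_mentions(mentions):
    """Keep mentions whose is_used label is the string "True"."""
    return [
        m for m in mentions
        if str(m.get("is_used", "")).strip().lower() == "true"
    ]


# Hypothetical, hand-written records standing in for results["dataset_mention"].
sample = [
    {
        "dataset_name": "Demographic and Health Survey",
        "acronym": "DHS",
        "dataset_tag": "named",
        "is_used": "True",
    },
    {
        "dataset_name": "household survey data",
        "dataset_tag": "vague",
        "is_used": "False",
    },
]

used = used_mentions(sample)
print([m["dataset_name"] for m in used])
```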