---
tags:
  - gliner2
  - ner
  - dataset-extraction
  - lora
base_model: fastino/gliner2-base-v1
library_name: gliner2
---

# GLiNER2 Dataset Mention Extractor

Fine-tuned GLiNER2 model for extracting structured dataset mentions from research documents.

## Task
Given a document passage, the model extracts:
- **Entity fields**: dataset_name, acronym, producer, geography, description, etc.
- **Classifications**: dataset_tag (named/descriptive/vague), usage_context, is_used

## Training
- **Base model**: `fastino/gliner2-base-v1`
- **Method**: LoRA (r=16, alpha=32)
- **Data**: 1,197 synthetic training examples
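
To make the LoRA settings concrete, the sketch below works out what `r=16` and `alpha=32` imply under the standard LoRA formulation, where the low-rank update is scaled by `alpha / r`. The 768x768 weight shape is a hypothetical value chosen for illustration, not a confirmed dimension of the base model.

```python
# LoRA hyperparameters from the Training section.
r, alpha = 16, 32

# Hypothetical weight shape, for illustration only (not taken from the base model).
d_out, d_in = 768, 768

# Standard LoRA scales the low-rank update B @ A by alpha / r.
scaling = alpha / r

# A has shape (r, d_in) and B has shape (d_out, r), so only these are trained.
lora_params = r * (d_in + d_out)
full_params = d_in * d_out

print(f"scaling = {scaling}")
print(f"trainable fraction of this matrix = {lora_params / full_params:.2%}")
```

With these settings the adapter update is applied at twice its raw magnitude, and the adapter trains only a small fraction of the weights in each adapted matrix.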

## Usage

```python
from gliner2 import GLiNER2

extractor = GLiNER2.from_pretrained("fastino/gliner2-base-v1")
extractor.load_adapter("rafmacalaba/gliner2-datause-v2")

schema = (
    extractor.create_schema()
    .structure("dataset_mention")
        .field("dataset_name", dtype="str")
        .field("acronym", dtype="str")
        .field("producer", dtype="str")
        .field("geography", dtype="str")
        .field("dataset_tag", dtype="str", choices=["named", "descriptive", "vague"])
        .field("usage_context", dtype="str", choices=["primary", "supporting", "background"])
        .field("is_used", dtype="str", choices=["True", "False"])
)

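# Example passage (placeholder); replace with your own document text.
text = "We use the 2014 Kenya Demographic and Health Survey (KDHS) to estimate fertility rates."

results = extractor.extract(text, schema)
dataset_mentions = results["dataset_mention"]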
```
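
Once extraction has run, a common next step is to keep only the datasets a passage actually uses. The following is a minimal sketch, assuming `results["dataset_mention"]` is a list of dicts keyed by the schema's field names and that `is_used` comes back as the string `"True"` or `"False"`, as the `choices` above suggest. The sample records and the `used_mentions` helper are hypothetical and are not real model output.

```python
from typing import Any

# Hypothetical records shaped like the schema above; NOT real model output.
sample_mentions: list[dict[str, Any]] = [
    {
        "dataset_name": "Kenya Demographic and Health Survey",
        "acronym": "KDHS",
        "dataset_tag": "named",
        "usage_context": "primary",
        "is_used": "True",
    },
    {
        "dataset_name": "household survey data",
        "acronym": None,
        "dataset_tag": "vague",
        "usage_context": "background",
        "is_used": "False",
    },
]


def used_mentions(mentions: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only mentions whose is_used label is the string "True"."""
    return [m for m in mentions if str(m.get("is_used")) == "True"]


for mention in used_mentions(sample_mentions):
    print(mention["dataset_name"], "-", mention["usage_context"])
```

Comparing against the string `"True"` rather than a boolean mirrors how the schema declares `is_used`; if your installed version returns a different structure, adjust the helper accordingly.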