Commit 78f92d5 by rafmacalaba · verified · 1 Parent(s): a2bf14d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +47 -0
README.md ADDED
@@ -0,0 +1,47 @@
+ ---
+ tags:
+ - gliner2
+ - ner
+ - dataset-extraction
+ - lora
+ base_model: fastino/gliner2-base-v1
+ library_name: gliner2
+ ---
+
+ # GLiNER2 Dataset Mention Extractor
+
+ Fine-tuned GLiNER2 model for extracting structured dataset mentions from research documents.
+
+ ## Task
+ Given a document passage, the model extracts:
+ - **Entity fields**: `dataset_name`, `acronym`, `producer`, `geography`, `description`, etc.
+ - **Classifications**: `dataset_tag` (named/descriptive/vague), `usage_context` (primary/supporting/background), `is_used` (True/False)
+
+ ## Training
+ - **Base model**: `fastino/gliner2-base-v1`
+ - **Method**: LoRA (r=16, alpha=32)
+ - **Data**: 1,197 synthetic training examples
+
+ ## Usage
+
+ ```python
+ from gliner2 import GLiNER2
+
+ extractor = GLiNER2.from_pretrained("fastino/gliner2-base-v1")  # load the base model
+ extractor.load_adapter("rafmacalaba/gliner2-datause-v1")  # apply the LoRA adapter on top
+
+ schema = (
+     extractor.create_schema()
+     .structure("dataset_mention")
+     .field("dataset_name", dtype="str")
+     .field("acronym", dtype="str")
+     .field("producer", dtype="str")
+     .field("geography", dtype="str")
+     .field("dataset_tag", dtype="str", choices=["named", "descriptive", "vague"])
+     .field("usage_context", dtype="str", choices=["primary", "supporting", "background"])
+     .field("is_used", dtype="str", choices=["True", "False"])
+ )
+
+ results = extractor.extract(text, schema)  # text: the document passage (str) to analyze
+ dataset_mentions = results["dataset_mention"]
+ ```
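
Once `results` is available, the mentions usually need light post-processing. The sketch below is not part of the committed README. It assumes that `results["dataset_mention"]` is a list of dicts keyed by the schema's field names, which the README does not confirm, and it runs on a hand-written `sample_results` dict rather than real model output. The helper name `used_dataset_names` is hypothetical.

```python
from typing import Any


def used_dataset_names(results: dict[str, Any]) -> list[str]:
    """Return names of mentions marked as used, de-duplicated in first-seen order."""
    names: list[str] = []
    for mention in results.get("dataset_mention", []):
        # The schema declares is_used as a string choice ("True"/"False"), not a bool.
        if str(mention.get("is_used")) != "True":
            continue
        name = mention.get("dataset_name")
        if name and name not in names:
            names.append(name)
    return names


# Hypothetical output, written by hand for illustration (assumed shape, not model output).
sample_results = {
    "dataset_mention": [
        {"dataset_name": "Demographic and Health Survey", "acronym": "DHS",
         "dataset_tag": "named", "usage_context": "primary", "is_used": "True"},
        {"dataset_name": "national census data", "acronym": None,
         "dataset_tag": "descriptive", "usage_context": "background", "is_used": "False"},
        {"dataset_name": "Demographic and Health Survey", "acronym": "DHS",
         "dataset_tag": "named", "usage_context": "supporting", "is_used": "True"},
    ]
}

print(used_dataset_names(sample_results))  # ['Demographic and Health Survey']
```

Comparing against the string `"True"` mirrors how the schema above defines `is_used`. If the actual output uses booleans or nests values differently, the check would need to be adjusted.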