---
tags:
- gliner2
- ner
- dataset-extraction
- lora
base_model: fastino/gliner2-base-v1
library_name: gliner2
---

# GLiNER2 Dataset Mention Extractor

A fine-tuned GLiNER2 model for extracting structured dataset mentions from research documents.

## Task

Given a document passage, the model extracts:

- **Entity fields**: dataset_name, acronym, producer, geography, description, etc.
- **Classifications**: dataset_tag (named/descriptive/vague), usage_context, is_used

## Training

- **Base model**: `fastino/gliner2-base-v1`
- **Method**: LoRA (r=16, alpha=32)
- **Data**: 1,197 synthetic training examples

## Usage

```python
from gliner2 import GLiNER2

# Load the base model, then apply the fine-tuned LoRA adapter on top of it.
extractor = GLiNER2.from_pretrained("fastino/gliner2-base-v1")
extractor.load_adapter("rafmacalaba/gliner2-datause-v2")

# Describe one "dataset_mention" structure: free-text entity fields
# plus classification fields restricted to a fixed set of choices.
schema = (
    extractor.create_schema()
    .structure("dataset_mention")
    .field("dataset_name", dtype="str")
    .field("acronym", dtype="str")
    .field("producer", dtype="str")
    .field("geography", dtype="str")
    .field("dataset_tag", dtype="str", choices=["named", "descriptive", "vague"])
    .field("usage_context", dtype="str", choices=["primary", "supporting", "background"])
    .field("is_used", dtype="str", choices=["True", "False"])
)

# Illustrative input only; replace with your own document passage.
text = "We use household survey data collected by the national statistics office."

results = extractor.extract(text, schema)
dataset_mentions = results["dataset_mention"]
```
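
Because the schema declares `is_used` as a string choice (`"True"` or `"False"`) rather than a boolean, downstream code usually needs to compare against the string. The sketch below shows one way to keep only the mentions labeled as used. It assumes that `results["dataset_mention"]` is a list of flat dictionaries keyed by field name, which this card does not confirm; the helper `used_mentions` and the sample records are hypothetical and are not produced by the model.

```python
from typing import Any


def used_mentions(mentions: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only mentions whose is_used label is the string "True"."""
    kept = []
    for mention in mentions:
        # Assumption: each mention is a flat dict keyed by field name.
        # A missing is_used label is treated as "not used".
        label = str(mention.get("is_used", "")).strip()
        if label == "True":
            kept.append(mention)
    return kept


# Hypothetical records shaped like results["dataset_mention"];
# these are illustrative values, not real model output.
sample_mentions = [
    {"dataset_name": "Example Survey A", "dataset_tag": "named", "is_used": "True"},
    {"dataset_name": "household survey data", "dataset_tag": "descriptive", "is_used": "False"},
    {"dataset_name": "some administrative records", "dataset_tag": "vague"},
]

used = used_mentions(sample_mentions)
print([m["dataset_name"] for m in used])
```

If the extractor returns richer values per field (for example, nested objects rather than plain strings), the comparison inside the loop would need to be adapted accordingly.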