---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb
- HuggingFaceFW/fineweb-2
language:
- en
- sq
- ar
- cs
- fr
- de
- it
- ro
- es
- sl
- tr
- sr
base_model:
- microsoft/mdeberta-v3-base
- Alibaba-NLP/gte-multilingual-base
tags:
- relation-extraction
---

# GLiDRE: Generalist and Lightweight Model for Document Relation Extraction

## Overview

GLiDRE is a generalist, lightweight model for document-level relation extraction. It extracts and classifies relations between entities in unstructured text documents. It builds on the earlier [GLiNER](https://github.com/urchade/GLiNER) work.

## Key Features

- **Zero-Shot Extraction:** Classifies relation types not seen during training, directly from text.
- **Versatile Input Handling:** Accepts both tokenized text and full documents.
- **Customizable Architecture:** Supports multiple loss functions and allows easy modification of model components.

## Installation

Clone the [GLiDRE](https://github.com/cea-list-lasti/glidre) repository, then install it from the repository root:

```bash
pip install .
```

## Quick Start

Here's a simple Python example to get you started:

```python
from glidre import GLiDRE

model = GLiDRE.from_pretrained("cea-list-ia/glidre_large")

text = "The Loud Tour was the fourth overall and third world concert tour by Barbadian recording artist Rihanna."

# Define relation labels.
# Labels are uppercase because the model performs better with uppercase relation names.
labels = ["COUNTRY_OF_CITIZENSHIP", "PUBLICATION_DATE", "PART_OF"]

# Define entity mentions. Format:
# [{"id": id, "type": type, "mentions": [{"value": text, "start": start_idx, "end": end_idx}]}]
mentions = [
    {
        "id": 0,
        "mentions": [{"value": "Barbadian", "start": 69, "end": 78}],
        "type": "LOC",
    },
    {
        "id": 1,
        "mentions": [{"value": "Rihanna", "start": 96, "end": 103}],
        "type": "PER",
    },
]

# Predict relations using GLiDRE
relations = model.predict_entities(
    text=text,
    labels=labels,
    mentions=mentions,
    threshold=0.3,
    multi_label=False,
)

print("Predicted Relations:")
for relation in relations:
    print(relation["entity_1"])
    print("Label :", relation["relation_type"])
    print(relation["entity_2"])
    print("---")
```

## Training

GLiDRE can be trained on datasets such as DocRED and Re-DocRED.

```bash
# For Re-DocRED:
python3 train.py --config configs/config_finetuning.yaml
```

## Citation

```bibtex
@misc{armingaud2025glidregeneralistlightweightmodel,
      title={GLiDRE: Generalist Lightweight model for Document-level Relation Extraction},
      author={Robin Armingaud and Romaric Besançon},
      year={2025},
      eprint={2508.00757},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.00757},
}
```
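
## Computing Mention Offsets

The mention format in the Quick Start needs character offsets for every mention. In that example, `text[69:78]` is `"Barbadian"`, which suggests that `start` is inclusive and `end` is exclusive. The sketch below shows one way to compute such offsets with plain Python. `build_mentions` is a hypothetical helper, not part of the GLiDRE API, and it assumes each entity has a single mention located at its first occurrence in the text.

```python
def build_mentions(text, entities):
    """Build Quick Start style mention dicts from (surface string, type) pairs.

    Hypothetical helper, not part of GLiDRE. Each entity gets exactly one
    mention, located at the first occurrence of its surface string.
    """
    mentions = []
    for entity_id, (value, entity_type) in enumerate(entities):
        start = text.find(value)  # character offset, -1 if absent
        if start == -1:
            raise ValueError(f"{value!r} not found in text")
        mentions.append(
            {
                "id": entity_id,
                "mentions": [
                    {"value": value, "start": start, "end": start + len(value)}
                ],
                "type": entity_type,
            }
        )
    return mentions


text = "The Loud Tour was the fourth overall and third world concert tour by Barbadian recording artist Rihanna."
mentions = build_mentions(text, [("Barbadian", "LOC"), ("Rihanna", "PER")])
print(mentions[0]["mentions"][0])  # {'value': 'Barbadian', 'start': 69, 'end': 78}
```

The resulting list matches the hand-written `mentions` from the Quick Start. For entities that occur several times, or whose surface string also appears inside another word, the first-occurrence lookup is too naive and the offsets should come from the upstream entity recognizer instead.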