---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb
- HuggingFaceFW/fineweb-2
language:
- en
- sq
- ar
- cs
- fr
- de
- it
- ro
- es
- sl
- tr
- sr
base_model:
- microsoft/mdeberta-v3-base
- Alibaba-NLP/gte-multilingual-base
tags:
- relation-extraction
---
# GLiDRE: Generalist and Lightweight Model for Document Relation Extraction
## Overview
GLiDRE is a generalist, lightweight model for document-level relation extraction. Given a document and a set of entity mentions, it extracts and classifies the relations that hold between those entities. It builds on the earlier [GLiNER](https://github.com/urchade/GLiNER) model.

## Key Features
- **Zero-Shot Extraction:** Classifies relation types not seen during training, directly from text.
- **Versatile Input Handling:** Accepts both tokenized text and full documents.
- **Customizable Architecture:** Supports multiple loss functions and allows easy modification of model components.

## Installation

Clone the [GLiDRE](https://github.com/cea-list-lasti/glidre) repository and install it from the repository root:
```bash
pip install .
```

## Quick Start

Here's a simple Python example to get you started:
```python
from glidre import GLiDRE

model = GLiDRE.from_pretrained("cea-list-ia/glidre_large")

text = "The Loud Tour was the fourth overall and third world concert tour by Barbadian recording artist Rihanna."

# Define relation labels.
# Labels are uppercase because the model performs better with capitalized relation names.
labels = ["COUNTRY_OF_CITIZENSHIP", "PUBLICATION_DATE", "PART_OF"]

# Define entity mentions. Format:
# [{"id": id, "type": type, "mentions": [{"value": text, "start": start_idx, "end": end_idx}]}]
# "start" and "end" are character offsets into `text`; "end" is exclusive.
mentions = [
    {
        "id": 0,
        "mentions": [
            {
                "value": "Barbadian",
                "start": 69,
                "end": 78,
            }
        ],
        "type": "LOC",
    },
    {
        "id": 1,
        "mentions": [
            {
                "value": "Rihanna",
                "start": 96,
                "end": 103,
            }
        ],
        "type": "PER",
    },
]

# Predict relations using GLiDRE
relations = model.predict_entities(text=text, labels=labels, mentions=mentions, threshold=0.3, multi_label=False)
print("Predicted Relations:")
for relation in relations:
    print(relation["entity_1"])
    print("Label :", relation["relation_type"])
    print(relation["entity_2"])
    print("---")
```
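
The character offsets in the `mentions` list are easy to get wrong when written by hand. As a minimal sketch, assuming `start` is inclusive and `end` is exclusive (which is what the offsets in the example imply), the helper below derives them with `str.find`. `build_entity` is a hypothetical name used for illustration and is not part of the GLiDRE API; it only locates the first occurrence of each surface form, so repeated mentions would need explicit offsets.

```python
def build_entity(entity_id, entity_type, text, surface_forms):
    """Build one entity dict in the format used by the Quick Start example.

    Hypothetical helper, not part of GLiDRE. Each surface form is located
    at its first occurrence in `text`.
    """
    mentions = []
    for value in surface_forms:
        start = text.find(value)
        if start == -1:
            raise ValueError(f"{value!r} not found in text")
        # "end" is exclusive, so text[start:end] == value
        mentions.append({"value": value, "start": start, "end": start + len(value)})
    return {"id": entity_id, "type": entity_type, "mentions": mentions}


text = (
    "The Loud Tour was the fourth overall and third world concert tour "
    "by Barbadian recording artist Rihanna."
)

mentions = [
    build_entity(0, "LOC", text, ["Barbadian"]),
    build_entity(1, "PER", text, ["Rihanna"]),
]

# Sanity check: every offset pair must slice back to the mention text.
for entity in mentions:
    for mention in entity["mentions"]:
        assert text[mention["start"]:mention["end"]] == mention["value"]
        print(entity["id"], entity["type"], mention)
```

The resulting `mentions` list matches the hand-written one in the Quick Start example and can be passed to the model in the same way.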

## Training

GLiDRE supports training on various datasets such as DocRED and Re-DocRED.
```bash
# For Re-DocRED:
python3 train.py --config configs/config_finetuning.yaml
```

## Citation

```bibtex
@misc{armingaud2025glidregeneralistlightweightmodel,
  title={GLiDRE: Generalist Lightweight model for Document-level Relation Extraction},
  author={Robin Armingaud and Romaric Besançon},
  year={2025},
  eprint={2508.00757},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.00757},
}
```