license: apache-2.0
datasets:
  - HuggingFaceFW/fineweb
  - HuggingFaceFW/fineweb-2
language:
  - en
  - sq
  - ar
  - cs
  - fr
  - de
  - it
  - ro
  - es
  - sl
  - tr
  - sr
base_model:
  - microsoft/mdeberta-v3-base
  - Alibaba-NLP/gte-multilingual-base
tags:
  - relation-extraction

GLiDRE: Generalist and Lightweight Model for Document Relation Extraction

Overview

GLiDRE is a generalist, lightweight model for document-level relation extraction: it extracts and classifies relations between entities in unstructured text documents. It builds on the earlier GLiNER work.

Key Features

Zero-Shot Extraction: Classifies relation types that were not seen during training, directly from text.
Versatile Input Handling: Compatible with both tokenized text and full documents.
Customizable Architecture: Supports multiple loss functions and allows easy modification of model components.
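Because relation labels are passed as free-form strings, and the Quick Start notes that the model performs better with uppercase relation names, it can help to normalize labels before prediction. The helper below is a minimal sketch of that idea; `to_relation_label` is a hypothetical name and is not part of the GLiDRE package.

```python
import re


def to_relation_label(name: str) -> str:
    """Convert a free-form relation name to UPPER_SNAKE_CASE.

    Hypothetical helper for illustration; not part of GLiDRE.
    """
    # Collapse every run of non-word characters (and underscores) into a
    # single underscore. \W is Unicode-aware, so accented letters are kept.
    snake = re.sub(r"[\W_]+", "_", name.strip())
    # Drop leading/trailing underscores, then uppercase the result.
    return snake.strip("_").upper()


raw_names = ["country of citizenship", "publication-date", "part_of"]
labels = [to_relation_label(name) for name in raw_names]
print(labels)
# ['COUNTRY_OF_CITIZENSHIP', 'PUBLICATION_DATE', 'PART_OF']
```

The resulting list has the same shape as the `labels` argument used in the Quick Start.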

Installation

Install GLiDRE from the root of a local copy of the repository:

pip install .

Quick Start

Here's a simple Python example to get you started:

from glidre import GLiDRE

model = GLiDRE.from_pretrained("cea-list-ia/glidre_large")

text = "The Loud Tour was the fourth overall and third world concert tour by Barbadian recording artist Rihanna."

# Define relation labels. They are uppercase because the model performs better with uppercase relation names.
labels = ["COUNTRY_OF_CITIZENSHIP", "PUBLICATION_DATE", "PART_OF"]
# Define entity mentions. Format:
# [{"id": id, "type": type, "mentions": [{"value": text, "start": start_idx, "end": end_idx}]}]
mentions = [
    {
        "id": 0,
        "type": "LOC",
        "mentions": [{"value": "Barbadian", "start": 69, "end": 78}],
    },
    {
        "id": 1,
        "type": "PER",
        "mentions": [{"value": "Rihanna", "start": 96, "end": 103}],
    },
]

# Predict relations using GLiDRE
relations = model.predict_entities(
    text=text, labels=labels, mentions=mentions, threshold=0.3, multi_label=False
)
print("Predicted Relations:")
for relation in relations:
    print(relation["entity_1"])
    print("Label:", relation["relation_type"])
    print(relation["entity_2"])
    print("---")
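The `start` and `end` values in the example above are character offsets into `text`, with `end` exclusive (`text[69:78]` is `"Barbadian"`). Counting them by hand is error-prone, so the sketch below derives them with plain substring search. `make_entity` is a hypothetical helper, not part of the GLiDRE package, and it assumes every occurrence of a surface form refers to the same entity.

```python
def make_entity(text, entity_id, entity_type, surface_forms):
    """Build one entity entry in the mention format shown in the Quick Start.

    Hypothetical helper for illustration; not part of GLiDRE.
    """
    mentions = []
    for value in surface_forms:
        # Collect every non-overlapping occurrence of the surface form.
        start = text.find(value)
        while start != -1:
            end = start + len(value)  # exclusive end offset
            mentions.append({"value": value, "start": start, "end": end})
            start = text.find(value, end)
    if not mentions:
        raise ValueError(f"none of {surface_forms!r} occur in the text")
    return {"id": entity_id, "type": entity_type, "mentions": mentions}


text = (
    "The Loud Tour was the fourth overall and third world concert tour "
    "by Barbadian recording artist Rihanna."
)
mentions = [
    make_entity(text, 0, "LOC", ["Barbadian"]),
    make_entity(text, 1, "PER", ["Rihanna"]),
]

# Sanity check: each offset pair must slice back to the mention value.
for entity in mentions:
    for m in entity["mentions"]:
        assert text[m["start"]:m["end"]] == m["value"]

print(mentions[0]["mentions"][0])
# {'value': 'Barbadian', 'start': 69, 'end': 78}
```

Note that substring search also matches inside longer words, so for real documents the offsets would usually come from an entity recognizer or from dataset annotations instead.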

Training

GLiDRE supports training on various datasets such as DocRED and Re-DocRED.

# For Re-DocRED:
python3 train.py --config configs/config_finetuning.yaml

Citation

@misc{armingaud2025glidregeneralistlightweightmodel,
      title={GLiDRE: Generalist Lightweight model for Document-level Relation Extraction}, 
      author={Robin Armingaud and Romaric Besançon},
      year={2025},
      eprint={2508.00757},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.00757}, 
}