Other benchmarks

by nickarafyllis - opened Apr 21

Discussion

nickarafyllis

Apr 21

edited Apr 21

Hey Luca!

Great work on this fine-tuning!

I was wondering whether this model has been evaluated against other similar datasets for NER, in order to understand how it performs in docs with extended or similar but different vocabulary.

Did you find any other open datasets in this scope?

Also, a bit off-topic, but since you are into legal NER, do you have any leads for legal RE (Relation Extraction) between these entities?

Lastly, what is your strategy for fitting the ContractNer chunks into the model? Overlapping chunks or something else?

Also, why did you fine-tune an older version of GLiNER rather than the latest one (https://huggingface.co/fastino/gliner2-multi-v1)?

These research insights would be of great help to me.

Thanks!!

lucasorrentino

Agilelab org 3 days ago

Hi! Thanks for the positive feedback, and sorry for the delayed response.

Here are the insights and details regarding your questions:

Evaluation & Open Datasets
The model was fine-tuned and evaluated exclusively on this specific dataset. While we have experimented with other fine-tuning runs internally, those were focused on entirely different domains.

Legal Relation Extraction (RE)
For Relation Extraction in the legal domain, I would suggest experimenting with the GLiNER 2.0 architecture. I see you are already familiar with it!

Chunking Strategy & Context Fitting
I can provide more concrete details on this. For a comprehensive breakdown, I highly recommend checking out our published article on the topic: Efficient NER in the Age of LLMs.
To fit the ContractNer chunks into the model, we implemented a sliding window strategy tailored to the BERT context length (512 tokens in our setup):

Overlapping Windows: The window slides with an overlap of N words, where N is dynamically set to the average span length of our target classes.
Coordinate Reconstruction: For each window pass, we track the relative text coordinates within that specific chunk. Once processing is complete, we map them back to reconstruct the absolute coordinates for the entire document. (NB: you can process different chunks in a batch, since they have the same size and the same labels.)
Boundary Resolution: To handle overlapping entity predictions at the boundaries of adjacent windows, we apply a simple but effective heuristic: we prioritize the longest predicted span.
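
As a rough illustration of the three steps above, here is a minimal Python sketch. It is not our production code: the function names, the word-based window size (`window_words`), the default values, and the `predict` callable are all assumptions made for the example. `predict` stands in for any model call that takes a chunk of text and returns spans as dicts with chunk-relative character offsets (`start`, `end`) and a `label`. In practice the window size would have to be chosen so that each chunk stays within the 512-token limit.

```python
import re
from typing import Any, Callable, Dict, List, Tuple

Span = Dict[str, Any]  # assumed keys: "start", "end" (character offsets), "label"


def make_windows(text: str, window_words: int, overlap_words: int) -> List[Tuple[int, str]]:
    """Split text into overlapping word windows; return (char_offset, chunk) pairs."""
    if overlap_words >= window_words:
        raise ValueError("overlap_words must be smaller than window_words")
    # Character span of every whitespace-delimited word.
    words = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    windows = []
    step = window_words - overlap_words
    for i in range(0, len(words), step):
        chunk = words[i:i + window_words]
        start, end = chunk[0][0], chunk[-1][1]
        windows.append((start, text[start:end]))
        if i + window_words >= len(words):
            break  # the last window already reaches the end of the text
    return windows


def to_absolute(spans: List[Span], offset: int) -> List[Span]:
    """Shift chunk-relative offsets to document-level offsets."""
    return [{**s, "start": s["start"] + offset, "end": s["end"] + offset} for s in spans]


def resolve_overlaps(spans: List[Span]) -> List[Span]:
    """Among overlapping spans, keep the longest one."""
    kept: List[Span] = []
    # Visit the longest spans first so they win against shorter overlapping ones.
    for s in sorted(spans, key=lambda s: (-(s["end"] - s["start"]), s["start"])):
        if all(s["end"] <= k["start"] or s["start"] >= k["end"] for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s["start"])


def extract_entities(
    text: str,
    predict: Callable[[str], List[Span]],  # hypothetical model wrapper
    window_words: int = 200,               # assumed value, tune to the token budget
    overlap_words: int = 20,               # e.g. the average span length of the target classes
) -> List[Span]:
    """Run predict on each window and merge the results at document level."""
    spans: List[Span] = []
    for offset, chunk in make_windows(text, window_words, overlap_words):
        spans.extend(to_absolute(predict(chunk), offset))
    return resolve_overlaps(spans)
```

Note that this sketch calls `predict` once per window for simplicity; since the chunks share the same labels, the calls could be batched instead. The longest-span rule here is also applied globally rather than only at window boundaries, which is a simplification.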

GLiNER Model Selection
The explanation here is quite straightforward: this project was developed and delivered in 2025, which predates the release of the GLiNER 2.0 (gliner2-multi-v1) architecture. At the time of development, we utilized the best-performing stable version available.

I hope these insights help with your research! Let me know if you want to dive deeper into any of these points.

Best,
Luca
