fine-tuned-model-led

A fine-tuned version of allenai/led-base-16384 for automatic metadata extraction from Spanish-language academic documents (books, theses, journal articles, conference papers). It was developed as part of a thesis project at SEDICI (Servicio de Difusión de la Creación Intelectual, UNLP).

What it does

Given the plain text of an academic document (extracted from a PDF), the model extracts structured metadata fields such as title, authors, date, abstract, keywords, subject, and document type, and returns them as a JSON object.
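
As a rough illustration of consuming that output, the sketch below parses a generated string into a Python dictionary and falls back to an empty dictionary when the text is not valid JSON. The helper name `parse_metadata`, the sample string, and its field names are hypothetical; the exact output schema is not documented on this card.

```python
import json
from typing import Any, Dict


def parse_metadata(generated: str) -> Dict[str, Any]:
    """Parse generated text as a JSON object.

    Returns an empty dict when no valid JSON object is found, so callers
    can detect a failed extraction instead of crashing.
    """
    # Keep only the span between the first "{" and the last "}" in case
    # the decoder emitted stray text around the object.
    start, end = generated.find("{"), generated.rfind("}")
    if start == -1 or end <= start:
        return {}
    try:
        parsed = json.loads(generated[start:end + 1])
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


# Hypothetical output string; the real field names may differ.
sample = '{"title": "Un título de ejemplo", "authors": ["Apellido, Nombre"], "keywords": ["metadatos"]}'
metadata = parse_metadata(sample)
print(metadata["title"])
```

Returning an empty dictionary on failure is one possible design choice; a pipeline might instead log the raw string or retry generation.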

Base model

This model is a fine-tune of allenai/led-base-16384 (Longformer Encoder-Decoder), which supports input sequences of up to 16,384 tokens. This makes it suitable for processing the full text of academic documents.
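
The following is a minimal sketch of how a long document might be fed to an LED checkpoint with the Hugging Face `transformers` library. It assumes the checkpoint loads with the standard `LEDForConditionalGeneration` class, that the repository ships a tokenizer, and that the raw document text is a valid input on its own. None of this is confirmed by this card, and the project's pipeline may preprocess or prompt differently. The function names and the `max_new_tokens` default are illustrative.

```python
MODEL_ID = "Nahpanigo99/fine-tuned-model-led"  # repo id as shown on the model page
MAX_INPUT_TOKENS = 16384  # context length of led-base-16384


def global_attention_positions(seq_len: int) -> list:
    """Return a 0/1 mask that puts global attention on the first token only.

    LED uses local (windowed) attention by default; marking the first token
    as global is the common convention for generation tasks.
    """
    if seq_len <= 0:
        return []
    return [1] + [0] * (seq_len - 1)


def extract_raw(text: str, max_new_tokens: int = 512) -> str:
    """Generate the model's raw output string for one document (sketch)."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoTokenizer, LEDForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = LEDForConditionalGeneration.from_pretrained(MODEL_ID)
    model.eval()

    # Truncate anything beyond the 16,384-token window.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_INPUT_TOKENS,
    )
    seq_len = inputs["input_ids"].shape[1]
    global_mask = torch.tensor([global_attention_positions(seq_len)])

    with torch.no_grad():
        output_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            global_attention_mask=global_mask,
            max_new_tokens=max_new_tokens,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `extract_raw(document_text)` downloads the weights on first use. Documents longer than the window are silently truncated in this sketch, so very long theses or books would lose their tail.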

Usage

This model is designed to run as part of the full extraction pipeline. See the project repository and documentation for setup instructions.

Training data

Fine-tuned on a curated dataset of academic documents from the SEDICI repository, with manually validated metadata used as ground truth.

Model size

0.2B params, stored as F32 tensors in Safetensors format.
