fine-tuned-model-led
Fine-tuned version of allenai/led-base-16384 for automatic metadata extraction from Spanish-language academic documents (books, theses, journal articles, conference papers), developed as part of a thesis project at SEDICI (Servicio de Difusión de la Creación Intelectual, UNLP).
What it does
Given the plain text extracted from an academic document (typically a PDF), the model extracts structured metadata fields such as title, authors, date, abstract, keywords, subject, and document type, and returns them as a JSON object.
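Because the output is generated text rather than a guaranteed-valid JSON document, downstream code may want to parse it defensively. The following is a minimal sketch, not the project's own post-processing: the helper name `parse_metadata` and the sample field names and values are hypothetical, chosen only to mirror the fields listed above.

```python
import json
from typing import Any, Dict, Optional


def parse_metadata(generated: str) -> Optional[Dict[str, Any]]:
    """Parse the first-to-last brace span of `generated` as a JSON object.

    Returns None when no well-formed JSON object can be recovered.
    (Hypothetical helper; the real pipeline may handle this differently.)
    """
    start = generated.find("{")
    end = generated.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(generated[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Only accept a JSON object, not a bare list, string, or number.
    return parsed if isinstance(parsed, dict) else None


# Illustrative output string; the keys and values are assumptions.
sample = '{"title": "Un estudio de ejemplo", "authors": ["Pérez, Ana"], "date": "2020"}'
metadata = parse_metadata(sample)
```

Returning `None` instead of raising lets a batch job log and skip a malformed generation without aborting the whole run.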
Base model
This model is a fine-tune of allenai/led-base-16384 (Longformer Encoder-Decoder), which accepts input sequences of up to 16,384 tokens, making it suitable for processing the full text of an academic document.
Usage
This model is designed to run as part of the full extraction pipeline. See the project repository and documentation for setup instructions:
- Code & full pipeline: https://github.com/nahuelPanigo/document_extraction_llm
- Documentation: https://nahuelpanigo.github.io/document_extraction_llm/
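For quick experiments outside the pipeline, the checkpoint can probably be loaded with the Hugging Face `transformers` library, as with other LED models. This is a sketch under several assumptions: that the hub repository `Nahpanigo99/fine-tuned-model-led` ships its tokenizer files, that the model expects raw document text with no extra prompt, and that 512 new tokens is enough for the output. The function name `generate_metadata_text` is hypothetical, and the full pipeline may apply pre- and post-processing that this sketch omits.

```python
MODEL_ID = "Nahpanigo99/fine-tuned-model-led"
MAX_INPUT_TOKENS = 16384  # context length of allenai/led-base-16384


def generate_metadata_text(document_text: str, max_new_tokens: int = 512) -> str:
    """Run the fine-tuned LED model on plain text and return the decoded output."""
    # Imported lazily so this module loads without the heavy dependencies.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    model.eval()

    # Truncate anything beyond the encoder's context window.
    inputs = tokenizer(
        document_text,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_INPUT_TOKENS,
    )

    # Common LED convention: global attention on the first token only.
    global_attention_mask = torch.zeros_like(inputs["input_ids"])
    global_attention_mask[:, 0] = 1

    with torch.no_grad():
        output_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            global_attention_mask=global_attention_mask,
            max_new_tokens=max_new_tokens,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_metadata_text(text)` downloads the weights on first use. For results that match the thesis project, the repository's documented setup remains the reference.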
Training data
Fine-tuned on a curated dataset of academic documents from the SEDICI repository, with manually validated metadata used as ground truth.