Model Card for granite-final-finetuned

This model is a fine-tuned version of ibm-granite/granite-docling-258M, trained using TRL.

It was trained on 2,300 pages of historical Spanish documents, with 265 pages held out for validation and a further 265 for testing.

Find a guide on fine-tuning the model on Medium!

Quick start

from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig
from PIL import Image
import requests
import torch
from io import BytesIO


model_id = "Ak137/granite-docling-258M-finetuned" 

model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="cuda",
    attn_implementation="flash_attention_2",  # requires flash-attn; remove on unsupported hardware
    dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained(model_id)


def generate_text_from_sample(model, processor, sample, max_new_tokens=40):
    # Build the chat prompt and run a single generation pass over one image.
    prompt = processor.apply_chat_template(sample["prompt"], add_generation_prompt=True)
    inputs = processor(text=prompt, images=sample["images"], return_tensors="pt")
    inputs = inputs.to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return processor.decode(outputs[0][inputs["input_ids"].shape[-1]:])


user_prompt = "Convert this document to docling format"

url = "https://blogs.loc.gov/law/files/2020/01/sld-misc-image-1.jpg"
response = requests.get(url)
response.raise_for_status()
img = Image.open(BytesIO(response.content))
# The processor expects RGB input.
if img.mode != 'RGB':
    img = img.convert('RGB')
    
example = {
    "images": [img],
    "prompt": [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": user_prompt},
            ],
        }
    ],
}


res = generate_text_from_sample(model, processor, sample=example, max_new_tokens=120)
print(res)

Training procedure

This model was trained with supervised fine-tuning (SFT) on a single NVIDIA H100 GPU for a total of 4 hours.

Framework versions

  • TRL: 0.23.1
  • Transformers: 4.56.2
  • PyTorch: 2.8.0+cu129
  • Datasets: 4.1.1
  • Tokenizers: 0.22.1

Citations

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}