# Model Card for granite-final-finetuned
This model is a fine-tuned version of [ibm-granite/granite-docling-258M](https://huggingface.co/ibm-granite/granite-docling-258M), trained using [TRL](https://github.com/huggingface/trl). It was trained on 2,300 pages of historical Spanish documents, validated on 265 pages, and tested on a further 265 pages.

A guide on fine-tuning the model is available on Medium.
## Quick start
```python
from io import BytesIO

import requests
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Ak137/granite-docling-258M-finetuned"

# Load the fine-tuned model in bfloat16 on the GPU
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="cuda",
    attn_implementation="flash_attention_2",
    dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained(model_id)


def generate_text_from_sample(model, processor, sample, max_new_tokens=40):
    # Build the chat prompt, run generation, and decode only the
    # newly generated tokens (the slice skips the prompt tokens)
    prompt = processor.apply_chat_template(sample["prompt"], add_generation_prompt=True)
    inputs = processor(text=prompt, images=sample["images"], return_tensors="pt")
    inputs = inputs.to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(outputs[0][inputs["input_ids"].shape[-1]:])


user_prompt = "Convert this document to docling format"

# Download a sample document image and make sure it is RGB
url = "https://blogs.loc.gov/law/files/2020/01/sld-misc-image-1.jpg"
response = requests.get(url)
img = Image.open(BytesIO(response.content))
if img.mode != "RGB":
    img = img.convert("RGB")

example = {
    "images": [img],
    "prompt": [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": user_prompt},
            ],
        }
    ],
}

res = generate_text_from_sample(model, processor, sample=example, max_new_tokens=120)
print(res)
```
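A note on the decode step in the helper above: `model.generate` returns the prompt tokens followed by the continuation, so the output is sliced at the prompt length before decoding. The idea in isolation, with plain token-id lists (the names here are illustrative, not part of the model card's API):

```python
def strip_prompt_tokens(generated_ids, prompt_len):
    # model.generate returns prompt + continuation; keep only the continuation
    return generated_ids[prompt_len:]


prompt_ids = [101, 102, 103]
full_output = prompt_ids + [7, 8, 9]
print(strip_prompt_tokens(full_output, len(prompt_ids)))  # → [7, 8, 9]
```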
## Training procedure

This model was trained with SFT (supervised fine-tuning) on a single NVIDIA H100 GPU for a total of 4 hours.
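The training hyperparameters were not published, but an SFT run with TRL is configured through `trl.SFTConfig`. A minimal sketch, with purely illustrative values (none of these are the settings actually used for this model):

```python
# Illustrative configuration only -- the actual hyperparameters are unknown.
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="granite-final-finetuned",
    per_device_train_batch_size=1,   # assumption
    gradient_accumulation_steps=8,   # assumption
    num_train_epochs=1,              # assumption
    bf16=True,                       # matches the bfloat16 inference setup above
    gradient_checkpointing=True,     # assumption
)
# training_args would then be passed to trl.SFTTrainer together with the
# model, the processor, and the 2,300-page train / 265-page eval splits.
```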
## Framework versions
- TRL: 0.23.1
- Transformers: 4.56.2
- Pytorch: 2.8.0+cu129
- Datasets: 4.1.1
- Tokenizers: 0.22.1
## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
## Model tree

Base model: [ibm-granite/granite-docling-258M](https://huggingface.co/ibm-granite/granite-docling-258M)