Support multi-language
Does this model support languages other than English?
And even if it doesn't, how can data be synthesized and the model retrained for other languages?
Since this is a popular request, I will try to write a script that demonstrates fine-tuning on a different language, and I will leave this issue open until that is done. We will be working on multilingual support and evaluating it across different languages. The easiest way to get started is to have your documents already converted to DoclingDocuments, export them to DocTags, and fine-tune on those. Check "export_to_document_tokens" at https://docling-project.github.io/docling/reference/docling_document/
@phonk2682 Thanks for your interest. To fine-tune properly, I suggest you have your documents in DocTags format. To do so, check the DoclingDocument reference (https://docling-project.github.io/docling/reference/docling_document/). There you can import documents from HTML, Markdown, etc., or build the DoclingDocument yourself, and then export it to DocTags. That gives you the exact format we use for training.
Hi @asnassar ,
I've been reviewing the Docling documentation at https://docling-project.github.io/docling/reference/docling_document but couldn't find information on converting HTML or Markdown files into DocTags format. The documentation appears to cover only the conversion from DocTags to Markdown/HTML.
Could you please provide guidance on how to perform this reverse conversion? Any examples or documentation would be greatly appreciated.
Thank you for your help!
Hello @hatimbr , you can find it over here:
https://docling-project.github.io/docling/examples/minimal/
Instead of exporting to Markdown, just call result.document.export_to_document_tokens().
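A minimal sketch of that suggestion, assuming docling is installed and that its DocumentConverter accepts HTML and Markdown paths the same way the linked minimal example accepts a PDF. The helper name to_doctags and its converter parameter are hypothetical, introduced only for illustration; the export method name is the one given in this thread and may differ in other docling-core versions.

```python
def to_doctags(source, converter=None):
    """Convert one document (HTML, Markdown, PDF, ...) to a DocTags string.

    `converter` is any object with a docling-style `convert(source)` method.
    When it is omitted, a default docling DocumentConverter is created.
    """
    if converter is None:
        # Lazy import: docling is only needed when no converter is supplied.
        from docling.document_converter import DocumentConverter
        converter = DocumentConverter()
    # result.document is the DoclingDocument built from the source file.
    result = converter.convert(source)
    # Export DocTags instead of Markdown, as suggested above.
    return result.document.export_to_document_tokens()


# Example call ("page.html" is a placeholder path):
#   print(to_doctags("page.html"))
```

Passing the converter in keeps one DocumentConverter instance reusable across many files rather than rebuilding it on every call.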
@hatimbr This is easy:
- First, convert the document into a DoclingDocument.
- Then export the DoclingDocument to DocTags (see this: https://github.com/docling-project/docling-core/blob/2371c11b8f74628169a9bb377036511235070af0/docling_core/types/doc/document.py#L3552).
PS: there are new serializers coming today: https://github.com/docling-project/docling-core/pull/192
=> If you can wait, I would suggest using those.
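For a fine-tuning corpus, the two steps above would typically be applied to a whole folder. The sketch below is one hedged way to do that: export_folder, its to_doctags callable, and the .doctags.txt suffix are all hypothetical choices made for illustration, not part of docling. The callable is expected to perform both steps (convert to a DoclingDocument, then export to DocTags) for a single path.

```python
from pathlib import Path


def export_folder(src_dir, out_dir, to_doctags, patterns=("*.html", "*.md")):
    """Write one DocTags text file per matching source file.

    `to_doctags` is a callable taking a Path and returning a DocTags string.
    Returns the list of files written, in processing order.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pattern in patterns:
        for src in sorted(src_dir.glob(pattern)):
            # Keep the original extension in the name so a.html and a.md do not collide.
            target = out_dir / (src.name + ".doctags.txt")
            target.write_text(to_doctags(src), encoding="utf-8")
            written.append(target)
    return written
```

With docling installed, the callable could be something like `lambda p: converter.convert(p).document.export_to_document_tokens()`, where `converter` is a DocumentConverter created once and reused.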
@asnassar @PeterWJStaar Thank you for your answers. I understand them, but I still don't know how to transform a Markdown or HTML file into a DoclingDocument without doing it manually. I don't see any method for that. Maybe I should check the PR you pushed.
How can I fine-tune this model to handle complex table structures? Or is there a way to extract them properly through prompting alone?
Hi! Any updates? :-)