Kosmos-2.5 - Containerized & made available over an API
Kosmos-2.5 is an incredibly useful model, and especially valuable as an open-source MLLM that excels at OCR (less so at markdown generation, in my testing!). It is also incredibly difficult to deploy & get working locally, and even harder to deploy in a way that makes it available to other applications & development tasks.
This is due in large part to its many specific hardware and software requirements. One example is its use of a custom version of the transformers library: Kosmos-2.5 requires a special "v4.32.0.dev0" build, whereas newer LLMs such as Google's Gemma2 require more recent versions to work correctly. Another is the custom Fairseq library, which does not work outside of Python v3.10. Such requirements can hamper other application development & utilization tasks on the same machine.
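The version constraints above can be turned into a quick preflight check before attempting an install. The following is a minimal sketch, assuming the two constraints are exactly as stated in this post (Python 3.10 and transformers "4.32.0.dev0"); the function names and the exact string comparison are my own illustrative choices, not part of the model's tooling.

```python
import sys
from importlib import metadata

# Constraints as described in the post; treat them as assumptions.
REQUIRED_PYTHON = (3, 10)
REQUIRED_TRANSFORMERS = "4.32.0.dev0"


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


def check_environment(python_version, transformers_version):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        found = ".".join(str(part) for part in python_version[:2])
        problems.append(f"Python 3.10 expected, found {found}")
    if transformers_version != REQUIRED_TRANSFORMERS:
        problems.append(
            f"transformers {REQUIRED_TRANSFORMERS} expected, "
            f"found {transformers_version}"
        )
    return problems


if __name__ == "__main__":
    issues = check_environment(sys.version_info, installed_transformers_version())
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment matches the stated requirements.")
```

Running this inside the environment intended for Kosmos-2.5 shows at a glance whether it would clash with a setup built for newer models.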
In terms of hardware requirements, the use of Flash Attention limits the user base to specific generations of NVIDIA GPUs.
While not much can be done about the hardware requirements, I did see an opportunity to ease the software challenges: by containerizing the model and its dependencies and using Flask to expose the model over a RESTful API, Kosmos-2.5 can be made available as a service, providing fully local & high-performance OCR capabilities from a cutting-edge MLLM!
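To illustrate the general shape of "model behind a REST API", here is a minimal sketch. It is not the code from my images: it uses Python's standard-library `http.server` in place of Flask so that it runs without extra dependencies, and the `/ocr` route, the `run_ocr` placeholder, and the JSON response format are all hypothetical names chosen for the example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def run_ocr(image_bytes: bytes) -> str:
    """Hypothetical stand-in for the real Kosmos-2.5 inference call."""
    return f"<placeholder OCR result for {len(image_bytes)} bytes>"


class OcrHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/ocr":  # hypothetical route name
            self.send_error(404, "unknown endpoint")
            return
        # Read the raw image bytes sent as the request body.
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)
        body = json.dumps({"text": run_ocr(image_bytes)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Start the service in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), OcrHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_ocr(base_url: str, image_bytes: bytes) -> dict:
    """Client side: POST raw image bytes and decode the JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/ocr",
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    # Bypass any configured HTTP proxy, since the service is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    server = start_server()
    url = f"http://127.0.0.1:{server.server_address[1]}"
    print(request_ocr(url, b"fake image bytes"))
    server.shutdown()
```

The point of this design is that client applications only need an HTTP library: the pinned Python version, the custom transformers build, and Fairseq all stay inside the container that hosts the service.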
I have open-sourced the prebuilt images and documented how to deploy and use them, how to build them from scratch, and even how to deploy the model uncontainerized, in my GitHub repo: https://github.com/abgulati/kosmos-2_5-containerized
Hope this helps the community deploy, port and use the model more easily!
Generate the caption of the provided image.
