---
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- vision-language-model
- image-to-text
- bounding-box-detection
- synlayers
---
# SynLayers Bbox-Caption Model
This repository hosts the **Stage 1 bbox-caption model** for SynLayers.
Given an input image, the model predicts:
- a whole-image caption
- bounding boxes for visible objects or layers
This repository is only for the Stage 1 detector. The full SynLayers system has two stages:
1. bounding-box and whole-image caption prediction (this repo)
2. layer decomposition into transparent RGBA outputs, using the Stage 2 checkpoints
For the complete demo, please use our public Space:
[SynLayers/synlayers](https://huggingface.co/spaces/SynLayers/synlayers)
For the Stage 2 decomposition checkpoints and runtime assets, please see:
[SynLayers/synlayers](https://huggingface.co/SynLayers/synlayers)
This repo is not intended to be loaded as a generic `DiffusionPipeline(prompt)` model. If you only want the Stage 1 model, you can load it with `transformers`:
```python
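from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

# Load the Stage 1 weights. device_map="auto" requires the `accelerate` package.
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "SynLayers/Bbox-caption-8b",
    torch_dtype="auto",
    device_map="auto",
)
# The processor bundles the tokenizer and the image preprocessing.
processor = AutoProcessor.from_pretrained("SynLayers/Bbox-caption-8b")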
```
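
Once the model and processor are loaded, a single image can be sent through the standard `transformers` chat-template flow. The sketch below is an assumption-laden example, not the project's reference inference path: the helper names `build_messages` and `generate_text` are hypothetical, and the prompt wording the model was trained on is not documented here, so check the helpers under `demo/infer/` for the prompt actually used.

```python
def build_messages(image_url, prompt):
    """Build a single-turn chat message pairing one image with a text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def generate_text(model, processor, image_url, prompt, max_new_tokens=512):
    """Run one generation pass and return only the newly generated text."""
    inputs = processor.apply_chat_template(
        build_messages(image_url, prompt),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the model's answer is decoded.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return processor.decode(new_tokens, skip_special_tokens=True)
```

With `model` and `processor` loaded as above, a call would look like `generate_text(model, processor, "https://example.com/design.png", "<your prompt>")`, where both the URL and the prompt are placeholders.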
This repository also includes lightweight inference helpers under `demo/infer/`. To run whole-caption and bbox inference on a folder of images:
```bash
python demo/infer/run_caption_bbox_infer.py \
  --model SynLayers/Bbox-caption-8b \
  --data-dir /path/to/images \
  --output outputs/caption_bbox_infer.jsonl \
  --vis-dir outputs/bbox_vis
```
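
The `--output` file uses the `.jsonl` extension, so it can presumably be read as JSON Lines, with one JSON value per line. The sketch below makes no assumption about the field names inside each record, since the schema is not described here; it only counts records and lists the top-level keys it finds. `read_jsonl` and `summarize` are hypothetical helper names.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with Path(path).open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(path):
    """Return the record count and the sorted top-level keys seen across records."""
    count = 0
    keys = set()
    for record in read_jsonl(path):
        count += 1
        if isinstance(record, dict):
            keys.update(record)
    return count, sorted(keys)
```

Calling `summarize("outputs/caption_bbox_infer.jsonl")` after a run is a quick way to see which fields the script actually writes before building anything on top of them.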
For more details, please check our paper:
[https://arxiv.org/abs/2605.15167](https://arxiv.org/abs/2605.15167)
If you find our work useful, please consider citing:
```bibtex
@article{wu2026does,
  title={Does Synthetic Layered Design Data Benefit Layered Design Decomposition?},
  author={Wu, Kam Man and Yang, Haolin and Chen, Qingyu and Tang, Yihu and Chen, Jingye and Chen, Qifeng},
  journal={arXiv preprint arXiv:2605.15167},
  year={2026}
}
```
Thanks for trying SynLayers.