---
library_name: transformers
tags:
- vision-language-model
- image-decomposition
---

# SynLayers

This repository contains the assets behind SynLayers, our two-stage image decomposition system.

At the root is the bbox-caption model. Given one image, it predicts:

- a whole-image caption
- bounding boxes for visible objects or layers

The same repository also includes the Stage 2 SynLayers pipeline for layer decomposition.

The easiest way to try the full system is our public demo: [SynLayers/synlayers](https://huggingface.co/spaces/SynLayers/synlayers)

This repository is not meant to be used as a single generic `DiffusionPipeline(prompt)` model. The full SynLayers pipeline is:

1. bbox + whole-caption prediction
2. layer decomposition into transparent RGBA outputs

If you only want the Stage 1 model at the repo root, you can load it with `transformers`:

```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "SynLayers/Bbox-caption-8b",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("SynLayers/Bbox-caption-8b")
```

Thanks for trying SynLayers.