pipeline_tag: text-to-image
tags:
  - text-to-image
  - svg
  - vector-graphics
license: mit

DiffSketcher - Vector Graphics Generation

This model generates vector graphics (SVG) from text prompts. It uses the original implementation from the official repository.

Model Description

DiffSketcher uses a diffusion model to guide SVG generation and produces a sketch made up of a specified number of paths.
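To make "a sketch with a specified number of paths" concrete, here is a minimal sketch of the output format only. It is not DiffSketcher's code: the function name `random_sketch_svg`, the canvas size, and the stroke settings are all assumptions chosen for illustration, and the curves are random rather than guided by a diffusion model.

```python
import random
import xml.etree.ElementTree as ET


def random_sketch_svg(num_paths, size=224, seed=0):
    """Build an SVG string containing `num_paths` random cubic Bezier strokes.

    Hypothetical helper: it only illustrates the shape of the output,
    not how DiffSketcher actually places its strokes.
    """
    rng = random.Random(seed)
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(size),
        "height": str(size),
        "viewBox": f"0 0 {size} {size}",
    })
    for _ in range(num_paths):
        # One start point plus three control points for a single cubic segment.
        pts = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(4)]
        (x0, y0), (x1, y1), (x2, y2), (x3, y3) = pts
        d = (f"M {x0:.1f} {y0:.1f} "
             f"C {x1:.1f} {y1:.1f}, {x2:.1f} {y2:.1f}, {x3:.1f} {y3:.1f}")
        ET.SubElement(svg, "path", {
            "d": d, "fill": "none", "stroke": "black", "stroke-width": "2",
        })
    return ET.tostring(svg, encoding="unicode")


svg_text = random_sketch_svg(num_paths=16)
```

Each `<path>` element is one stroke, so the path count directly controls how detailed the sketch can be.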

Usage

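import requests

API_URL = "https://api-inference.huggingface.co/models/jree423/diffsketcher"
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  # replace with your own token

def query(prompt):
    """Send a text prompt to the endpoint and return the raw image bytes."""
    response = requests.post(API_URL, headers=headers, json={"inputs": prompt})
    response.raise_for_status()  # raise on HTTP errors instead of saving an error body
    return response.content  # the response body is the image itself

# Generate an image, then save it once the request has succeeded
image_bytes = query("a beautiful mountain landscape")
with open("output.png", "wb") as f:
    f.write(image_bytes)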

Examples

"a beautiful mountain landscape"
"a red sports car"
"a portrait of a woman"
"a cat playing with a ball"

How It Works

Text Encoding: The text prompt is encoded using CLIP.
Diffusion Process: A diffusion model generates a latent representation.
SVG Generation: The latent representation is used to generate an SVG.
PNG Conversion: The SVG is converted to PNG for display.
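Because the pipeline produces an SVG internally but returns a PNG, it can be useful to check what the response actually contains before saving it. The following is a minimal sketch under that assumption; `detect_image_format` is a hypothetical helper, not part of this model or of any library.

```python
# Every PNG file begins with this fixed 8-byte signature.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def detect_image_format(data: bytes) -> str:
    """Return "png", "svg", or "unknown" based on the leading bytes."""
    if data.startswith(PNG_MAGIC):
        return "png"
    # SVG is XML text, so look for an <svg tag near the start of the payload.
    if b"<svg" in data[:4096].lower():
        return "svg"
    return "unknown"
```

A result of "unknown" usually means the body is not an image at all, for example a JSON error message, and should not be written to `output.png`.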

Performance Considerations

The original implementation requires significant computational resources.
Generation can take several minutes, depending on the complexity of the requested sketch.
GPU acceleration is recommended for optimal performance.
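Since generation can take minutes, a client may want to retry instead of failing on the first busy response. The sketch below assumes the endpoint answers with HTTP 503 while the model is loading or busy; that behavior is an assumption, not something this card states. `query_with_retry` and its `send` callable are hypothetical names introduced for illustration.

```python
import time


def query_with_retry(send, prompt, max_attempts=5, wait_seconds=30.0,
                     sleep=time.sleep):
    """Call `send(prompt)` until it succeeds or attempts run out.

    `send` is a hypothetical callable returning (status_code, body_bytes).
    HTTP 503 is treated as "try again later" (an assumption about the
    endpoint); any other non-200 status raises immediately.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send(prompt)
        if status == 200:
            return body
        if status != 503 or attempt == max_attempts:
            raise RuntimeError(f"request failed with HTTP {status}")
        sleep(wait_seconds)  # wait before the next attempt
```

With `requests`, `send` could be a small wrapper that posts the prompt to `API_URL` and returns `(response.status_code, response.content)`. Passing `sleep` as a parameter keeps the helper easy to test without real delays.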

Citation

@article{xing2023diffsketcher,
  title={{DiffSketcher}: Text Guided Vector Sketch Synthesis through Latent Diffusion Models},
  author={Xing, XiMing and Wang, Chuang and Zhou, Haitao and Zhang, Jing and Yu, Qian and Xu, Dong},
  journal={arXiv preprint arXiv:2306.14685},
  year={2023}
}

License

This model is licensed under the MIT License.