metadata
pipeline_tag: text-to-image
tags:
  - text-to-image
  - svg
  - vector-graphics
license: mit

DiffSketcher - Vector Graphics Generation

This model generates vector graphics (SVG) from text prompts, using the original implementation from the official DiffSketcher repository.

Model Description

DiffSketcher uses a diffusion model to guide SVG generation and produces sketches composed of a specified number of paths.
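
To make "a specified number of paths" concrete, here is a minimal sketch of what such an SVG looks like structurally. It is not the model's code: the function name `random_sketch_svg`, the canvas size, and the use of random control points are all assumptions for illustration, whereas the model chooses its strokes under diffusion guidance.

```python
import random
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def random_sketch_svg(num_paths: int, size: int = 224, seed: int = 0) -> str:
    """Return an SVG string holding `num_paths` cubic Bezier strokes.

    Hypothetical helper: control points are random here, purely to show
    the file structure of a sketch with a fixed path budget.
    """
    rng = random.Random(seed)
    svg = ET.Element("svg", {
        "xmlns": SVG_NS,
        "width": str(size),
        "height": str(size),
        "viewBox": f"0 0 {size} {size}",
    })
    for _ in range(num_paths):
        # One stroke = start point, two control handles, end point.
        (x0, y0), (x1, y1), (x2, y2), (x3, y3) = [
            (rng.uniform(0, size), rng.uniform(0, size)) for _ in range(4)
        ]
        d = (f"M {x0:.1f} {y0:.1f} "
             f"C {x1:.1f} {y1:.1f}, {x2:.1f} {y2:.1f}, {x3:.1f} {y3:.1f}")
        ET.SubElement(svg, "path", {
            "d": d, "fill": "none", "stroke": "black", "stroke-width": "2",
        })
    return ET.tostring(svg, encoding="unicode")

if __name__ == "__main__":
    print(random_sketch_svg(num_paths=3))
```

Each `<path>` element is one stroke, so the path count directly controls how detailed or abstract the sketch can be.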

Usage

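import requests

API_URL = "https://api-inference.huggingface.co/models/jree423/diffsketcher"
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  # replace with your own token

def query(prompt):
    """Send a text prompt to the endpoint and return the raw image bytes."""
    # Generation can take several minutes, so allow a generous timeout.
    response = requests.post(API_URL, headers=headers, json={"inputs": prompt}, timeout=600)
    response.raise_for_status()  # fail loudly instead of saving an error body as an image
    return response.content

# Generate an image, then save it (fetch first so a failed request leaves no empty file)
image_bytes = query("a beautiful mountain landscape")
with open("output.png", "wb") as f:
    f.write(image_bytes)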

Examples

  • "a beautiful mountain landscape"
  • "a red sports car"
  • "a portrait of a woman"
  • "a cat playing with a ball"
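
When running several of these prompts in a row, it helps to derive a distinct output filename from each one. The snippet below is a small sketch of that idea; `prompt_to_filename` is a hypothetical helper, not part of the model or the API, and the `.png` extension assumes the endpoint returns PNG bytes as described in this card.

```python
import re

PROMPTS = [
    "a beautiful mountain landscape",
    "a red sports car",
    "a portrait of a woman",
    "a cat playing with a ball",
]

def prompt_to_filename(prompt: str, ext: str = "png") -> str:
    """Turn a free-text prompt into a filesystem-friendly filename."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return f"{slug}.{ext}"

for prompt in PROMPTS:
    # In real use, write the bytes returned by query(prompt) to this path.
    print(prompt_to_filename(prompt), "<-", prompt)
```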

How It Works

  1. Text Encoding: The text prompt is encoded using CLIP.
  2. Diffusion Process: A diffusion model generates a latent representation.
  3. SVG Generation: The latent representation is used to generate an SVG.
  4. PNG Conversion: The SVG is converted to PNG for display.
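
Because the final step converts the SVG to PNG, it can be worth confirming what the returned bytes actually contain before choosing a file extension. The following is a minimal sketch under that assumption; `sniff_image_format` is a hypothetical helper, and it relies only on the standard 8-byte PNG signature and on the presence of an `<svg` tag near the start of the payload.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte signature that starts every PNG file

def sniff_image_format(data: bytes) -> str:
    """Classify a response body as 'png', 'svg', or 'unknown'."""
    if data.startswith(PNG_MAGIC):
        return "png"
    # SVG is XML text; look for the root tag near the start of the payload.
    head = data[:512].lower()
    if b"<svg" in head:
        return "svg"
    # Anything else (for example a JSON error message) is not an image.
    return "unknown"

if __name__ == "__main__":
    print(sniff_image_format(PNG_MAGIC + b"\x00" * 8))
    print(sniff_image_format(b'<svg xmlns="http://www.w3.org/2000/svg"></svg>'))
```

A result of "unknown" usually means the body should be inspected as text rather than written to an image file.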

Performance Considerations

  • The original implementation requires significant computational resources
  • Generation can take several minutes depending on the complexity
  • GPU acceleration is recommended for optimal performance
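
Since generation can take minutes, a client may need to wait and retry rather than fail on the first attempt. The sketch below shows one way to do that with exponential backoff. It assumes the endpoint answers HTTP 503 while the model is not yet ready, which should be checked against the service's actual behavior. `query_with_retry` and its `send` callback are hypothetical names introduced here.

```python
import time
from typing import Callable, Tuple

def query_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `send` until it succeeds, backing off exponentially on HTTP 503.

    `send` performs one request and returns (status_code, body).
    Assumption: 503 means "not ready yet"; any other error is final.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        out_of_attempts = attempt == max_attempts - 1
        if status != 503 or out_of_attempts:
            raise RuntimeError(f"request failed with HTTP {status}: {body[:200]!r}")
        # Wait base_delay, then 2x, 4x, ... before the next attempt.
        sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

To use it with the `requests` call from the Usage section, wrap the POST in a small function that returns `(response.status_code, response.content)` and pass that function as `send`.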

Citation

@article{xing2023diffsketcher,
  title={{DiffSketcher}: Text Guided Vector Sketch Synthesis through Latent Diffusion Models},
  author={Xing, XiMing and Wang, Chuang and Zhou, Haitao and Zhang, Jing and Yu, Qian and Xu, Dong},
  journal={arXiv preprint arXiv:2306.14685},
  year={2023}
}

License

This model is licensed under the MIT License.