---
license: creativeml-openrail-m
base_model: runwayml/stable-diffusion-v1-5
tags:
- stable-diffusion
- stable-diffusion-diffusers
- image-to-image
- diffusers
- controlnet
- jax-diffusers-event
inference: true
library_name: diffusers
---

# controlnet- JFoz/dog-cat-pose

A simple ControlNet model made as part of the Hugging Face JAX/Diffusers community sprint.

These are ControlNet weights trained on runwayml/stable-diffusion-v1-5, with pose conditioning generated using the animalpose model of OpenPifPaf.

Some example images, with the prompts used to generate them:

prompt: a tortoiseshell cat is sitting on a cushion

prompt: a yellow dog standing on a lawn

Whilst it is not the dataset used for this model, a smaller dataset with the same format for conditioning images can be found at https://huggingface.co/datasets/JFoz/dog-poses-controlnet-dataset

The dataset was generated using the code at https://github.com/jfozard/animalpose/tree/f1be80ed29886a1314054b87f2a8944ea98997ac

# Model Card for dog-cat-pose

This is a ControlNet model that allows users to control the pose of a dog or cat. Poses were extracted from images using the animalpose model of OpenPifPaf (https://openpifpaf.github.io/intro.html). Skeleton colouring is as shown in the dataset. See also https://huggingface.co/JFoz/dog-pose

# Model Details

## Model Description

- **Developed by:** John Fozard
- **Model type:** Conditional image generation
- **Language(s) (NLP):** en
- **License:** openrail
- **Parent Model:** https://huggingface.co/runwayml/stable-diffusion-v1-5
- **Resources for more information:**
    - [GitHub Repo](https://github.com/jfozard/animalpose/tree/f1be80ed29886a1314054b87f2a8944ea98997ac)

# Uses

## Direct Use

Supply a suitable, potentially incomplete pose image along with a relevant text prompt.
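
As a rough illustration of this workflow, the sketch below loads the weights into the Flax ControlNet pipeline from `diffusers` and generates one image per device. It is not taken from the author's code. It assumes that this repository ships Flax weights (otherwise `from_pt=True` would be needed) and that Flax weights for the base model are available on its `flax` revision. The function name `generate_from_pose` and its parameters are hypothetical. The default safety checker is left enabled.

```python
CONTROLNET_ID = "JFoz/dog-cat-pose"               # this model
BASE_MODEL_ID = "runwayml/stable-diffusion-v1-5"  # parent model named in this card


def generate_from_pose(pose_image, prompt, num_inference_steps=50, seed=0):
    """Generate one image per device from a pose conditioning image and a prompt.

    `pose_image` is a PIL image holding the coloured skeleton.
    """
    # Heavy dependencies are imported lazily, so importing this module stays cheap.
    import jax
    import jax.numpy as jnp
    import numpy as np
    from diffusers import FlaxControlNetModel, FlaxStableDiffusionControlNetPipeline
    from flax.jax_utils import replicate
    from flax.training.common_utils import shard

    # Assumption: the repository contains Flax weights (else add from_pt=True).
    controlnet, controlnet_params = FlaxControlNetModel.from_pretrained(
        CONTROLNET_ID, dtype=jnp.float32
    )
    # Assumption: the base model's Flax weights live on its "flax" revision.
    pipe, params = FlaxStableDiffusionControlNetPipeline.from_pretrained(
        BASE_MODEL_ID, controlnet=controlnet, revision="flax", dtype=jnp.float32
    )
    params["controlnet"] = controlnet_params

    # One sample and one PRNG key per available device.
    num_samples = jax.device_count()
    rng = jax.random.split(jax.random.PRNGKey(seed), num_samples)

    prompt_ids = shard(pipe.prepare_text_inputs([prompt] * num_samples))
    image = shard(pipe.prepare_image_inputs([pose_image] * num_samples))

    output = pipe(
        prompt_ids=prompt_ids,
        image=image,
        params=replicate(params),
        prng_seed=rng,
        num_inference_steps=num_inference_steps,
        jit=True,
    ).images
    # Collapse the leading (devices, batch) axes before converting to PIL images.
    images = np.asarray(output.reshape((-1,) + output.shape[-3:]))
    return pipe.numpy_to_pil(images)
```

A call might look like `generate_from_pose(pose, "a yellow dog standing on a lawn")`, where `pose` is a conditioning image in the same format as the dataset linked above.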

## Out-of-Scope Use

Generating images of non-animals. We advise retaining the Stable Diffusion safety filter when using this model.

# Bias, Risks, and Limitations

The model was trained on a relatively small dataset and may be overfitted to those images.

## Recommendations

Maintain careful supervision of model inputs and outputs.

# Training Details

## Training Data

Trained on a subset of LAION-5B, selected using clip-retrieval with the prompts "a photo of a (dog/cat) (standing/walking)".
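
The prompt template above can be read as shorthand for several concrete queries. The sketch below expands it, assuming that every subject was combined with every action; the card does not state this explicitly. The names `SUBJECTS`, `ACTIONS` and `retrieval_prompts` are illustrative only, and the retrieval call itself is omitted.

```python
from itertools import product

SUBJECTS = ("dog", "cat")
ACTIONS = ("standing", "walking")


def retrieval_prompts():
    """Expand "a photo of a (dog/cat) (standing/walking)" into explicit queries."""
    # Assumption: all subject/action combinations were queried.
    return [f"a photo of a {s} {a}" for s, a in product(SUBJECTS, ACTIONS)]


if __name__ == "__main__":
    for query in retrieval_prompts():
        print(query)
```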

## Training Procedure

### Preprocessing

Images were rescaled to 512 along their short edge and centrally cropped. The OpenPifPaf pose-detection model was used to extract poses, which were used to generate conditioning images.
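
The resize-and-crop step can be sketched as follows. This is an illustration rather than the author's preprocessing code. It assumes the central crop is a 512 x 512 square and that the long edge is rounded to the nearest pixel. `preprocess` accepts any Pillow-style image object exposing `size`, `resize` and `crop`; both function names are hypothetical.

```python
def resize_and_crop_geometry(width, height, size=512):
    """Return the resized (width, height) and the central crop box.

    The crop box is (left, top, right, bottom) in resized-image pixels.
    """
    scale = size / min(width, height)
    # The short edge becomes exactly `size`; the long edge is rounded.
    new_w = size if width <= height else round(width * scale)
    new_h = size if height <= width else round(height * scale)
    # Assumption: the central crop is a size x size square.
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return (new_w, new_h), (left, top, left + size, top + size)


def preprocess(image, size=512):
    """Resize a Pillow-style image along its short edge, then crop centrally."""
    (new_w, new_h), box = resize_and_crop_geometry(*image.size, size=size)
    return image.resize((new_w, new_h)).crop(box)
```

For a 1024 x 768 input, `resize_and_crop_geometry(1024, 768)` returns `((683, 512), (85, 0, 597, 512))`: the image is scaled to 683 x 512 and the middle 512 columns are kept.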

## Compute Infrastructure

TPUv4i

### Software

Flax Stable Diffusion ControlNet pipeline

# Model Card Authors

John Fozard