tuandunghcmut commited on
Commit
f435a72
·
verified ·
1 Parent(s): 9eb384f

Add files using upload-large-folder tool

Browse files
This view is limited to 50 files because it contains too many changes.   See raw diff
Files changed (50) hide show
  1. PaddleMIX/applications/README.md +98 -0
  2. PaddleMIX/applications/README_en.md +87 -0
  3. PaddleMIX/applications/gradio_autolable.py +187 -0
  4. PaddleMIX/deploy/README.md +110 -0
  5. PaddleMIX/docs/CHANGELOG.md +44 -0
  6. PaddleMIX/docs/FAQ.md +0 -0
  7. PaddleMIX/paddlemix/__init__.py +20 -0
  8. PaddleMIX/ppdiffusers/README.md +1278 -0
  9. PaddleMIX/ppdiffusers/VERSION +1 -0
  10. PaddleMIX/ppdiffusers/requirements.txt +18 -0
  11. PaddleMIX/ppdiffusers/setup.py +71 -0
  12. PaddleMIX/scripts/build_wheel.sh +136 -0
  13. a_main_folder/lavis_examples/albef_feature_extraction.ipynb +0 -0
  14. a_main_folder/lavis_examples/albef_vqa.ipynb +0 -0
  15. a_main_folder/lavis_examples/albef_zero_shot_classification.ipynb +0 -0
  16. a_main_folder/lavis_examples/blip2_feature_extraction.ipynb +145 -0
  17. a_main_folder/lavis_examples/blip2_image_text_matching.ipynb +141 -0
  18. a_main_folder/lavis_examples/blip2_instructed_generation.ipynb +0 -0
  19. a_main_folder/lavis_examples/blip_feature_extraction.ipynb +0 -0
  20. a_main_folder/lavis_examples/blip_image_captioning.ipynb +0 -0
  21. a_main_folder/lavis_examples/blip_image_text_matching.ipynb +0 -0
  22. a_main_folder/lavis_examples/blip_text_localization.ipynb +0 -0
  23. a_main_folder/lavis_examples/blip_vqa.ipynb +0 -0
  24. a_main_folder/lavis_examples/blip_zero_shot_classification.ipynb +0 -0
  25. a_main_folder/lavis_examples/clip_feature_extraction.ipynb +0 -0
  26. a_main_folder/lavis_examples/clip_zero_shot_classification.ipynb +0 -0
  27. a_main_folder/litserve/.lightning_studio/.studiorc +4 -0
  28. a_main_folder/litserve/.lightning_studio/on_start.sh +13 -0
  29. a_main_folder/litserve/.lightning_studio/on_stop.sh +8 -0
  30. a_main_folder/litserve/aurasr.ipynb +215 -0
  31. a_main_folder/litserve/aurasr/.lightning_studio/.studiorc +4 -0
  32. a_main_folder/litserve/aurasr/.lightning_studio/on_start.sh +13 -0
  33. a_main_folder/litserve/aurasr/.lightning_studio/on_stop.sh +8 -0
  34. a_main_folder/litserve/aurasr/client.py +30 -0
  35. a_main_folder/litserve/aurasr/input.jpg +0 -0
  36. a_main_folder/litserve/aurasr/server.py +33 -0
  37. a_main_folder/llm2vec/test.ipynb +80 -0
  38. a_main_folder/ultralytics/input.jpg +0 -0
  39. a_main_folder/ultralytics/test.ipynb +0 -0
  40. open_clip/src/open_clip/hf_configs.py +67 -0
  41. open_clip/src/open_clip/model_configs/RN101.json +21 -0
  42. open_clip/src/open_clip/model_configs/RN50x16.json +21 -0
  43. open_clip/src/open_clip/model_configs/ViT-B-16-SigLIP-i18n-256.json +29 -0
  44. open_clip/src/open_clip/model_configs/ViT-B-16-quickgelu.json +17 -0
  45. open_clip/src/open_clip/model_configs/ViT-B-16.json +16 -0
  46. open_clip/src/open_clip/model_configs/ViT-B-32-plus-256.json +16 -0
  47. open_clip/src/open_clip/model_configs/ViT-B-32-quickgelu.json +17 -0
  48. open_clip/src/open_clip/model_configs/ViT-B-32.json +16 -0
  49. open_clip/src/open_clip/model_configs/ViT-H-14-378-quickgelu.json +18 -0
  50. open_clip/src/open_clip/model_configs/ViT-H-14-CLIPA.json +26 -0
PaddleMIX/applications/README.md ADDED
@@ -0,0 +1,98 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ **简体中文** | [English](./README_en.md)
2
+ <p align="center">
3
+ <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/22989727/2cd19298-1c52-4d73-a0f7-dcdab6a8ec90" align="middle" width = "600" />
4
+ </p>
5
+
6
+ <p align="center">
7
+ <a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
8
+ <a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
9
+ <a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
10
+ <a href="https://github.com/PaddlePaddle/PaddleMIX/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleMIX?color=ccf"></a>
11
+ </p>
12
+
13
+ <h4 align="center">
14
+ <a href=#特性> 特性 </a> |
15
+ <a href=#快速开始> 快速开始 </a>
16
+ </h4>
17
+
18
+
19
+
20
+ **PaddleMIX**应用示例基于paddlemix、ppdiffusers和paddlenlp开发,**简单易用**且**功能强大**。聚合业界**优质预训练模型**并提供**开箱即用**的开发体验,覆盖跨模态和多场景的模型库搭配,可满足开发者**灵活定制**的需求。
21
+
22
+ <img src="https://github.com/user-attachments/assets/4c695140-bf4c-46db-bbb5-5dd8197be947" align="center" />
23
+
24
+ ## 快速开始
25
+
26
+ 请先确认是否已安装 [PaddleMIX](../README.md/#安装) 和 [ppdiffusers](../README.md/#安装)
27
+
28
+ ### 1. appflow 依赖安装
29
+ ```shell
30
+ pip install -r paddlemix/appflow/requirements.txt
31
+ ```
32
+
33
+
34
+ ### 2.一键预测
35
+
36
+ PaddleMIX提供一键预测功能,无需训练,这里以开放世界检测分割为例。直接在终端运行如下命令,即可完成模型推理。
37
+
38
+ ```python
39
+ >>> python
40
+ >>> from paddlemix.appflow import Appflow
41
+ >>> from ppdiffusers.utils import load_image
42
+
43
+ >>> task = Appflow(app="openset_det_sam",
44
+ models=["GroundingDino/groundingdino-swint-ogc","Sam/SamVitH-1024"],
45
+ static_mode=False) #如果开启静态图推理,设置为True,默认动态图
46
+ >>> url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
47
+ >>> image_pil = load_image(url)
48
+ >>> result = task(image=image_pil,prompt="dog")
49
+ ```
50
+
51
+ 参数说明
52
+ | 参数 | 是否必须| 含义 |
53
+ |-------|-------|---------------------------------------------------------------------------------------------|
54
+ | --app | Yes| 应用名称 |
55
+ | --models | Yes | 需要使用的模型,可以是单个模型,也可以多个组合 |
56
+ | --static_mode | Option | 是否静态图推理,默认False |
57
+ | --precision | Option | 当 static_mode == True 时使用,默认fp32,可选择trt_fp32、trt_fp16 |
58
+
59
+
60
+ ## 特性
61
+
62
+ #### <a href=#开箱即用的工具集> 开箱即用的工具集 </a>
63
+
64
+ #### <a href=#跨模态多场景应用> 跨模态多场景应用 </a>
65
+
66
+
67
+
68
+ ### 开箱即用的工具集
69
+
70
+ Appflow提供丰富的开箱即用工具集,覆盖跨模态多场景应用,提供产业级的效果与极致的推理性能。
71
+
72
+ ![appflow](https://github.com/LokeZhou/PaddleMIX/assets/13300429/f80a7aa0-4cd5-4f86-90d6-2fc6da3eb42f)
73
+
74
+
75
+
76
+
77
+ ### 跨模态多场景应用
78
+ | 应用名称 | 调用模型 | 静态图推理 |
79
+ | :--------------------------------- | -------------------------------- | ----------|
80
+ | [视觉语言对话(Vision-Language-Chat)](./VLChat/README.md) | `qwen-vl-chat-7b` | 🚧 |
81
+ | [开放世界检测分割(Openset-Det-Sam)](./CVinW/README.md/#开放世界检测分割grounded-sam-detect-and-segment-everything-with-text-prompt) | `grounded sam` | ✅ |
82
+ | [自动标注(AutoLabel)](./Automatic_label/README.md/#自动标注autolabel) | `blip2 grounded sam` | ✅ |
83
+ | [检测框引导的图像编辑(Det-Guided-Inpainting)](./Inpainting/README.md/#检测框引导的图像编辑det-guided-inpainting) | `chatglm-6b stable-diffusion-2-inpainting grounded sam` | ✅ |
84
+ | [文图生成(Text-to-Image Generation)](./text2image/README.md/#文图生成text-to-image-generation) | `runwayml/stable-diffusion-v1-5 stabilityai/stable-diffusion-xl-base-1.0` | [fastdeploy](../ppdiffusers/deploy/README.md/#文图生成text-to-image-generation) |
85
+ | [文本引导的图像放大(Text-Guided Image Upscaling)](./image2image/README.md/#文本引导的图像放大text-guided-image-upscaling) | `ldm-super-resolution-4x-openimages`| ❌ |
86
+ | [文本引导的图像编辑(Text-Guided Image Inpainting)](./Inpainting/README.md/#文本引导的图像编辑text-guided-image-inpainting) | `stable-diffusion-2-inpainting` | [fastdeploy](../ppdiffusers/deploy/README.md/#文本引导的图像编辑text-guided-image-inpainting) |
87
+ | [文本引导的图像变换(Image-to-Image Text-Guided Generation)](./image2image/README.md/#文本引导的图像变换image-to-image-text-guided-generation) | `stable-diffusion-v1-5` | [fastdeploy](../ppdiffusers/deploy/README.md/#文本引导的图像变换image-to-image-text-guided-generation) |
88
+ | [文本条件的视频生成(Text-to-Video Generation)](./text2video/README.md/#文本条件的视频生成text-to-video-generation) | `text-to-video-ms-1.7b` | ❌ |
89
+ | [音频生成图像(Audio-to-Image Generation)](./Audio2Img/README.md/#audio-to-image) | `imagebind stable-diffusion-2-1-unclip` | |
90
+ | [音频描述(Audio-to-Caption Generation)](./Audio2Caption/README.md/#音频描述audio-to-caption-generation) | `chatglm-6b whisper` | |
91
+ | [音频对话(Audio-to-Chat Generation)](./AudioChat/README.md/#音频对话audio-to-chat-generation) | `chatglm-6b whisper fastspeech2` | |
92
+ | [音乐生成(Music Generation)](./MusicGeneration/README.md/#音乐生成music-generation) | `chatglm-6b minigpt4 audioldm` | |
93
+
94
+ 更多应用持续开发中......
95
+
96
+ * ✅: Supported
97
+ * 🚧: In Progress
98
+ * ❌: Not Supported
PaddleMIX/applications/README_en.md ADDED
@@ -0,0 +1,87 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ **English** | [简体中文](./README.md)
2
+ <p align="center">
3
+ <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/22989727/2cd19298-1c52-4d73-a0f7-dcdab6a8ec90" align="middle" width = "600" />
4
+ </p>
5
+
6
+ <p align="center">
7
+ <a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
8
+ <a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
9
+ <a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
10
+ <a href="https://github.com/PaddlePaddle/PaddleMIX/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleMIX?color=ccf"></a>
11
+ </p>
12
+
13
+ <h4 align="center">
14
+ <a href=#Features> Features </a> |
15
+ <a href=#quick-start> Quick Start </a>
16
+ </h4>
17
+
18
+ The **PaddleMIX** application examples are developed on top of paddlemix, ppdiffusers, and PaddleNLP, and are **simple**, **easy to use**, and **powerful**. They aggregate high-quality industry pre-trained models and provide an out-of-the-box development experience, covering cross-modal and multi-scenario model combinations to meet developers' needs for flexible customization.
19
+
20
+ <img src="https://github.com/user-attachments/assets/4c695140-bf4c-46db-bbb5-5dd8197be947" align="center" />
21
+
22
+
23
+ ## Quick Start
24
+ Please first confirm that [PaddleMIX](../README_EN.md/#installation) and [ppdiffusers](../README_EN.md/#installation) have been installed.
25
+
26
+ ### 1.requirements
27
+ ```shell
28
+ pip install -r paddlemix/appflow/requirements.txt
29
+ ```
30
+
31
+ ### 2.Appflow
32
+
33
+ PaddleMIX provides Appflow, which requires no training and produces results directly from the input data:
34
+
35
+ ```
36
+ >>> python
37
+ >>> from paddlemix.appflow import Appflow
38
+ >>> from ppdiffusers.utils import load_image
39
+
40
+ >>> task = Appflow(app="openset_det_sam",
41
+ models=["GroundingDino/groundingdino-swint-ogc","Sam/SamVitH-1024"],
42
+ static_mode=False) #如果开启静态图推理,设置为True,默认动态图
43
+ >>> url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
44
+ >>> image_pil = load_image(url)
45
+ >>> result = task(image=image_pil,prompt="dog")
46
+ ```
47
+
48
+ Parameter Description
49
+ | parameter | required| meaning |
50
+ |-------|-------|---------------------------------------------------------------------------------------------|
51
+ | --app | Yes| app name |
52
+ | --models | Yes | model list,can be a single model or multiple combinations |
53
+ | --static_mode | Option | static graph inference, default : False |
54
+ | --precision | Option | used when static_mode == True; default: fp32, options: trt_fp32, trt_fp16 |
55
+
56
+ ## Features
57
+
58
+ #### <a href=#out-of-box-toolset> Out-of-Box Toolset </a>
59
+
60
+ #### <a href=#multi-modal-and-scenario> Multi Modal And Scenario </a>
61
+
62
+
63
+
64
+ ### Out-of-Box Toolset
65
+
66
+ Appflow provides a rich set of out-of-the-box tools covering cross-modal and multi-scenario applications, delivering industry-grade results and excellent inference performance.
67
+ ![appflow](https://github.com/LokeZhou/PaddleMIX/assets/13300429/f80a7aa0-4cd5-4f86-90d6-2fc6da3eb42f)
68
+
69
+ ### Multi Modal And Scenario
70
+ | name | models | static mode |
71
+ | :--------------------------------- | -------------------------------- | ----------|
72
+ | [视觉语言对话(Vision-Language-Chat)](./VLChat/README.md) | `qwen-vl-chat-7b` | 🚧 |
73
+ | [开放世界检测分割(Openset-Det-Sam)](./CVinW/README.md/#开放世界检测分割grounded-sam-detect-and-segment-everything-with-text-prompt) | `grounded sam` | ✅ |
74
+ | [自动标注(AutoLabel)](./Automatic_label/README.md/#自动标注autolabel) | `blip2 grounded sam` | ✅ |
75
+ | [检测框引导的图像编辑(Det-Guided-Inpainting)](./Inpainting/README.md/#检测框引导的图像编辑det-guided-inpainting) | `chatglm-6b stable-diffusion-2-inpainting grounded sam` | ✅ |
76
+ | [文图生成(Text-to-Image Generation)](./text2image/README.md/#文图生成text-to-image-generation) | `runwayml/stable-diffusion-v1-5` | [fastdeploy](../ppdiffusers/deploy/README.md/#文图生成text-to-image-generation) |
77
+ | [文本引导的图像放大(Text-Guided Image Upscaling)](./image2image/README.md/#文本引导的图像放大text-guided-image-upscaling) | `ldm-super-resolution-4x-openimages`| ❌ |
78
+ | [文本引导的图像编辑(Text-Guided Image Inpainting)](./Inpainting/README.md/#文本引导的图像编辑text-guided-image-inpainting) | `stable-diffusion-2-inpainting` | [fastdeploy](../ppdiffusers/deploy/README.md/#文本引导的图像编辑text-guided-image-inpainting) |
79
+ | [文本引导的图像变换(Image-to-Image Text-Guided Generation)](./image2image/README.md/#文本引导的图像变换image-to-image-text-guided-generation) | `stable-diffusion-v1-5` | [fastdeploy](../ppdiffusers/deploy/README.md/#文本引导的图像变换image-to-image-text-guided-generation) |
80
+ | [文本条件的视频生成(Text-to-Video Generation)](./text2video/README.md/#文本条件的视频生成text-to-video-generation) | `text-to-video-ms-1.7b` | ❌ |
81
+
82
+
83
+ More applications under continuous development......
84
+
85
+ * ✅: Supported
86
+ * 🚧: In Progress
87
+ * ❌: Not Supported
PaddleMIX/applications/gradio_autolable.py ADDED
@@ -0,0 +1,187 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from paddlemix.appflow import Appflow
2
+ from ppdiffusers.utils import load_image
3
+ import paddle
4
+ import cv2
5
+
6
+ import os
7
+ import json
8
+ from zipfile import ZipFile
9
+ import zipfile
10
+ import numpy as np
11
+ from PIL import Image, ImageDraw
12
+ import gradio as gr
13
+ import traceback
14
+ import math
15
+ import tempfile
16
+
17
+
18
+ task = Appflow(app="auto_label",
19
+ models=["paddlemix/blip2-caption-opt2.7b","GroundingDino/groundingdino-swint-ogc","Sam/SamVitH-1024"])
20
+
21
+
22
def auto_label(img, prompt):
    """Run the shared auto-label Appflow pipeline on a single PIL image.

    ``prompt`` is forwarded as the BLIP-2 caption prompt; the returned dict
    is whatever the Appflow task produces (labels, boxes, seg_masks, image).
    """
    return task(image=img, blip2_prompt=prompt)
25
+
26
+
27
def result2json(result, filename):
    """Convert an auto-label result dict into a labelme-style annotation dict.

    Parameters
    ----------
    result : dict
        Output of the auto_label pipeline. Must contain 'image' (PIL image),
        'labels' (list of strings, each possibly ending in a "(confidence)"
        suffix), 'boxes' (array-like rows of [xmin, ymin, xmax, ymax]) and
        'seg_masks' (tensor-like boolean masks; ``m.numpy()[0]`` is 2-D).
        -- shapes assumed from the indexing below; confirm against the task.
    filename : str
        Value stored in the JSON 'imagePath' field.

    Returns
    -------
    dict
        labelme-compatible dict with one 'rectangle' and one 'polygon'
        shape appended per detected object.
    """
    label_data = {
        'version': '0.0.0',
        'flags': {},
        'shapes': [],
        'imagePath': filename,
        'imageHeight': result['image'].size[1],
        'imageWidth': result['image'].size[0],
    }

    for i in range(len(result['labels'])):
        # Strip the trailing "(confidence)" suffix from the label, if present.
        # rfind matches the original manual scan, which kept the LAST '('.
        label = result['labels'][i]
        spl_idx = label.rfind('(')
        if spl_idx != -1:
            label = label[:spl_idx]

        # Bounding box as a two-point rectangle shape.
        xmin, ymin, xmax, ymax = result['boxes'][i].tolist()
        label_data['shapes'].append(
            {'label': label,
             'points': [[xmin, ymin], [xmax, ymax]],
             'group_id': None,
             'shape_type': 'rectangle',
             'flags': {}
             }
        )

        # Trace the mask's outer contours and flatten them into a point list.
        seg_mask = result['seg_masks'][i].numpy()[0]
        mask_img = seg_mask.astype('uint8') * 255
        contours, _ = cv2.findContours(mask_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        points = [point[0].tolist() for contour in contours for point in contour]

        # Polygon shape for the same object.
        label_data['shapes'].append(
            {'label': label,
             'points': points,
             'group_id': None,
             'shape_type': 'polygon',
             'flags': {}
             }
        )

    return label_data
81
+
82
+
83
def generate_mask(img, result_masks):
    """Overlay each segmentation mask onto *img* as a flat color region.

    Colors cycle through the R/G/B channels; the channel intensity is stepped
    up once per group of three masks so successive masks stay distinguishable.
    Returns a new ``PIL.Image``; *img* itself is not modified.
    """
    # Intensity step per group of three masks (one group per R/G/B cycle).
    divide_part = int(255 / (math.ceil(len(result_masks) / 3) + 1))
    np_img = np.array(img)
    for i, mask in enumerate(result_masks):
        color = [0, 0, 0]
        channel = i % 3        # which of R/G/B this mask gets
        group = i // 3 + 1     # intensity multiplier for this group
        color[channel] = divide_part * group
        # mask.numpy()[0] is assumed to be a 2-D boolean array -- confirm.
        np_img[mask.numpy()[0]] = color
        # NOTE: removed leftover debug `print(color)` from the original.
    return Image.fromarray(np_img)
97
+
98
+
99
def al_fun(img, prompt):
    """Label one image; return (bbox-annotated image, mask image, JSON path).

    *img* arrives as a numpy array from gradio; the annotation JSON is
    written into the module-level ``tmpdir`` so gradio can serve it.
    """
    pil_img = Image.fromarray(img.astype('uint8')).convert('RGB')
    result = auto_label(pil_img, prompt)
    label_data = result2json(result, "tmpimg")

    # Draw detection boxes directly on the converted input image.
    painter = ImageDraw.Draw(pil_img)
    for box in result['boxes']:
        painter.rectangle(box.tolist(), width=10)

    # Colorized mask overlay rendered from the pipeline's own image.
    mask_img = generate_mask(result['image'], result['seg_masks'])

    # Persist the annotation JSON for download.
    labeled_file = os.path.join(tmpdir, 'labeled_date.json')
    with open(labeled_file, 'w') as f:
        json.dump(label_data, f, indent=4)
    return pil_img, mask_img, labeled_file
115
+
116
+
117
def al_file_fun(file_in, prompt):
    """Batch-label uploaded image files.

    Parameters
    ----------
    file_in : list
        Gradio file wrappers; each exposes a ``.name`` attribute holding the
        temp-file path of one uploaded image.
    prompt : str
        BLIP-2 caption prompt forwarded to the auto_label pipeline.

    Returns
    -------
    str
        Path to a zip archive (in ``tmpdir``) containing one labelme JSON
        per uploaded image.
    """
    out_zip_file = os.path.join(tmpdir, "labeled.zip")
    with ZipFile(out_zip_file, "w") as zip_obj:
        for img_file in file_in:
            image_pil = Image.open(img_file.name)
            result = auto_label(image_pil, prompt)
            basename = os.path.basename(img_file.name)
            label_data = result2json(result, basename)
            # BUG FIX: the original wrote '.josn' so downstream tools could
            # not find the annotation files; corrected to '.json'.
            labeled_file = os.path.join(tmpdir, basename + '.json')
            with open(labeled_file, 'w') as f:
                json.dump(label_data, f, indent=4)
            zip_obj.write(labeled_file)
    return out_zip_file
129
+
130
+
131
def al_zip_fun(zip_in, prompt):
    """Label every image found inside the uploaded zip archive(s).

    Each archive is extracted into ``tmpdir``; every image file found under
    it is run through the auto_label pipeline, its labelme JSON is added to
    ``labeled.zip``, and the extracted image is deleted afterwards.

    Returns the path to the resulting zip of JSON annotations.
    """
    # NOTE: removed leftover debug logging that appended file names to a
    # local 'test.txt' on every call in the original.
    for archive in zip_in:
        zipfile.ZipFile(archive.name).extractall(tmpdir)

    image_exts = {'jpg', 'png', 'jpeg', 'JPG', 'PNG', 'JPEG'}
    out_zip_file = os.path.join(tmpdir, "labeled.zip")
    with ZipFile(out_zip_file, "w") as zip_obj:
        for root, _, files in os.walk(tmpdir, topdown=False):
            for name in files:
                if name.split('.')[-1] not in image_exts:
                    continue
                img_path = os.path.join(root, name)
                json_path = os.path.join(root, name + '.json')

                image_pil = Image.open(img_path)
                result = auto_label(image_pil, prompt)
                label_data = result2json(result, img_path)
                with open(json_path, 'w') as f:
                    json.dump(label_data, f, indent=4)
                zip_obj.write(json_path)
                # Delete the extracted image so later walks don't relabel it.
                os.remove(img_path)
    return out_zip_file
155
+
156
+
157
# Gradio UI: three tabs all driving the same auto-label pipeline.
# NOTE(review): indentation was lost in the source paste; the Row/Tab
# nesting below is the conventional gradio layout -- confirm against the
# rendered app.
with gr.Blocks() as demo:
    gr.Markdown("# 自动标注(AutoLabel)")

    # Tab 1: single image -> bbox preview, mask preview, JSON download.
    with gr.Tab("单张图片标注"):
        with gr.Row():
            al_image_in = gr.Image(label="输入图片")
            al_image_out1 = gr.Image(label="BBox标注图片")
            al_image_out2 = gr.Image(label="Mask标注图片")
        al_text_in = gr.Text(label="Prompt", value="describe the image")
        al_file_out_ = gr.File(label="标注文件")
        al_button = gr.Button()
        al_button.click(
            fn=al_fun,
            inputs=[al_image_in, al_text_in],
            outputs=[al_image_out1, al_image_out2, al_file_out_],
        )

    # Tab 2: several images -> one zip of JSON annotations.
    with gr.Tab("上传多张图片批量标注"):
        with gr.Row():
            al_file_in = gr.Files(
                label="上传多张图片",
                file_types=['.jpg', '.png', '.jpeg', '.JPG', '.PNG', '.JPEG'],
            )
            al_file_out = gr.File(label="标注结果")
        al_file_text_in = gr.Text(label="Prompt", value="describe the image")
        al_file_button = gr.Button()
        al_file_button.click(
            fn=al_file_fun,
            inputs=[al_file_in, al_file_text_in],
            outputs=[al_file_out],
        )

    # Tab 3: zip archive(s) of images -> one zip of JSON annotations.
    with gr.Tab("上传压缩包批量标注"):
        with gr.Row():
            al_zip_in = gr.Files(label="上传压缩包", file_types=['.zip'])
            al_zip_out = gr.File(label="标注结果")
        al_zip_text_in = gr.Text(label="Prompt", value="describe the image")
        al_zip_button = gr.Button()
        al_zip_button.click(
            fn=al_zip_fun,
            inputs=[al_zip_in, al_zip_text_in],
            outputs=[al_zip_out],
        )


# Serve downloads out of a temp directory; ``tmpdir`` is read by the
# handler functions above. (``global`` at module scope is a no-op but is
# kept for clarity of intent.)
global tmpdir
with tempfile.TemporaryDirectory(dir='.') as tmpdir:
    demo.launch()
PaddleMIX/deploy/README.md ADDED
@@ -0,0 +1,110 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # PaddleMIX推理部署
2
+
3
+ [[English](README_en.md)]
4
+
5
+ PaddleMIX基于Paddle Inference,提供了python的部署方案。部署方式分为两种:
6
+ - 通过 **APPflow** ,设置static_mode = True 变量开启静态图推理,同时可配合trt加速推理;该方式部分模型不支持静态图以及trt,具体模型可参考[跨模态多场景应用](../applications/README.md/#跨模态多场景应用);
7
+
8
+ - 单模型部署
9
+
10
+
11
+ ## 1.APPflow部署
12
+
13
+ 在使用 PaddleMIX 一键预测 **APPflow** 时,可通过设置 static_mode = True 变量开启静态图推理,同时可配合trt加速推理。
14
+
15
+ ### 1.1 示例
16
+
17
+ ```python
18
+ >>> from paddlemix.appflow import Appflow
19
+ >>> from PIL import Image
20
+
21
+ >>> task = Appflow(app="openset_det_sam",
22
+ models=["GroundingDino/groundingdino-swint-ogc","Sam/SamVitH-1024"],
23
+ static_mode=True,
24
+ precision="fp32")
25
+ >>> image_pil = Image.open("beauty.png").convert("RGB")
26
+ >>> result = task(image=image_pil,prompt="women")
27
+ ```
28
+
29
+ ### 1.2 参数说明
30
+ | 参数 | 是否必须| 含义 |
31
+ |-------|-------|---------------------------------------------------------------------------------------------|
32
+ | --app | Yes| 应用名称 |
33
+ | --models | Yes | 需要使用的模型,可以是单个模型,也可以多个组合 |
34
+ | --static_mode | Option | 是否静态图推理,默认False |
35
+ | --precision | Option | 当 static_mode == True 时使用,默认fp32,可选择trt_fp32、trt_fp16 |
36
+
37
+ 说明:
38
+ - 部分模型不支持静态图以及trt,具体可参考[跨模态多场景应用](../applications/README.md)
39
+ - 生成的静态图将在模型名字对应的文件夹下 如:GroundingDino/groundingdino-swint-ogc/
40
+
41
+
42
+ ## 2. 单模型预测部署
43
+
44
+ Python端预测部署主要包含两个步骤:
45
+ - 导出预测模型
46
+ - 基于Python进行预测
47
+
48
+ 当前支持模型:
49
+ - [blip2](./blip2/README.md)
50
+ - [groundingdino](./groundingdino/README.md)
51
+ - [sam](./sam/README.md)
52
+ - [qwen_vl](./qwen_vl/README.md)
53
+
54
+ 以 groundingdino 为例子。
55
+
56
+ ### 2.1 导出预测模型
57
+
58
+ ```bash
59
+ cd deploy/groundingdino
60
+ # 导出groundingdino模型
61
+ python export.py \
62
+ --dino_type GroundingDino/groundingdino-swint-ogc
63
+ ```
64
+ 导出后目录下,包括 `model_state.pdiparams`, `model_state.pdiparams.info`, `model_state.pdmodel`等文件。
65
+
66
+ ### 2.2 基于python的预测
67
+
68
+ ```bash
69
+ python predict.py \
70
+ --text_encoder_type GroundingDino/groundingdino-swint-ogc \
71
+ --model_path output_groundingdino/GroundingDino/groundingdino-swint-ogc \
72
+ --input_image https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg \
73
+ --output_dir ./groundingdino_predict_output \
74
+ --prompt "bus"
75
+
76
+ ```
77
+
78
+ ## 3. 推理 BenchMark
79
+
80
+ > Note:
81
+ > 测试环境为:
82
+ Paddle 3.0,
83
+ PaddleMIX release/2.0
84
+ PaddleNLP2.7.2
85
+ A100 80G单卡。
86
+
87
+ ### 3.1 benchmark命令
88
+
89
+ 在 `deploy` 对应模型目录下的运行后加 --benchmark,
90
+ 如 GroundingDino 的benchmark命令为:
91
+
92
+ ```bash
93
+ cd deploy/groundingdino
94
+ python predict.py \
95
+ --text_encoder_type GroundingDino/groundingdino-swint-ogc \
96
+ --model_path output_groundingdino/GroundingDino/groundingdino-swint-ogc \
97
+ --input_image https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg \
98
+ --output_dir ./groundingdino_predict_output \
99
+ --prompt "bus" \
100
+ --benchmark True
101
+ ```
102
+
103
+ # A100性能数据
104
+ |模型|图片分辨率|数据类型 |Paddle Deploy |
105
+ |-|-|-|-|
106
+ |qwen-vl-7b|448*448|fp16|669.8 ms|
107
+ |llava-1.5-7b|336*336|fp16|981.2 ms|
108
+ |llava-1.6-7b|336*336|fp16|778.7 ms|
109
+ |groundingDino/groundingdino-swint-ogc|800*1193|fp32|100 ms|
110
+ |Sam/SamVitH-1024|1024*1024|fp32|121 ms|
PaddleMIX/docs/CHANGELOG.md ADDED
@@ -0,0 +1,44 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 版本更新信息
2
+
3
+ ## 最新版本信息
4
+
5
+ ### 2.0(07/26/2024)
6
+
7
+ #### 多模态理解
8
+
9
+ 1. 新增模型:LLaVA: v1.5-7b, v1.5-13b, v1.6-7b,CogAgent, CogVLM, Qwen-VL, InternLM-XComposer2
10
+ 2. 数据集增强:新增chatml_dataset图文对话数据读取方案,可自定义chat_template文件适配,支持混合数据集
11
+ 3. 工具链升级:新增Auto模块,统一SFT训练流程,兼容全参数、lora训练。新增mixtoken训练策略,SFT吞吐量提升5.6倍。支持Qwen-VL,LLaVA推理部署,较torch推理性能提升2.38倍
12
+
13
+ #### 多模态生成
14
+
15
+ 1. 视频生成能力:支持Sora相关技术,支持DiT、SiT、UViT训练推理,新增NaViT、MAGVIT-v2模型; 新增视频生成模型SVD、Open Sora,支持模型微调和推理; 新增姿态可控视频生成模型AnimateAnyone、即插即用视频生成模型AnimateDiff、GIF视频生成模型Hotshot-XL;
16
+ 2. 文生图模型库:新增高速推理文图生成模型LCM,适配SD/SDXL训练和推理;
17
+ 3. 工具链升级:发布ppdiffusers 0.24.1版本,新增peft,accelerate后端; 权重加载/保存全面升级,支持分布式、模型切片、safetensors等场景。
18
+ 4. 生态兼容:提供基于ppdiffusers开发的ComfyUI插件,支持了常见的模型加载转换、文生图、图生图、图像局部修改等任务。新增Stable Diffusion 1.5系列节点;新增Stable Diffusion XL系列节点。新增4个图像生成的workflow案例。
19
+
20
+ #### DataCopilot(多模态数据处理工具箱)
21
+
22
+ 1. 多模态数据集类型MMDataset,支持加载和导出Json、H5、Jsonl等多种数据存储格式,内置并发(map, filter)数据处理接口等
23
+ 2. 多模态数据格式工具,支持自定义数据结构,数据转换,离线格式检查
24
+ 3. 多模态数据分析工具,支持基本的统计信息,数据可视化功能,以及注册自定义功能
25
+
26
+ ### 1.0(11/15/2023)
27
+
28
+ #### 核心能力
29
+
30
+ 1. 大规模预训练: BLIP-2支持数据并行、sharding、模型并行,流水线并行训练;支持千亿参数规模训练; EVA-CLIP支持数据并行、sharding、模型并行训练; Stable Diffusion支持数据并行、sharding、BF16 O2训练; CLIP,Coca支持数据并行训练
31
+ 2. 有监督精调: Stable Diffusion,SDXL 支持LoRA精调
32
+ 3. 推理部署: 支持BLIP-2,miniGPT-4,Grounding DINO, SAM,Stable Diffusion动转静导出部署
33
+
34
+ #### 前沿模型
35
+ 1. 新增CLIP系列跨模态大模型:CLIP,EVA-CLIP,Coca
36
+ 2. 新增图生文跨模态大模型:BLIP-2,miniGPT-4,VisualGLM
37
+ 3. 新增跨模态视觉模型:Grounding DINO, SAM
38
+ 4. 新增融合更多模态大模型:ImageBind
39
+ 5. 新增文生图模型:SDXL,支持Text2Image、Img2Img、Inpainting、InstructPix2Pix等任务,支持DreamBooth Lora训练; 新增UniDiffuser,通过统一的多模态扩散过程支持文生图、图生文等任务; 新增文本条件视频生成模型LVDM,支持训练与推理; 新增文图生成模型Kandinsky 2.2,Consistency models; Controlnet升级,支持ControlNetImg2Img、ControlNetInpaint、 StableDiffusionXLControlNet等。
40
+
41
+ #### 特色应用
42
+ 1. 新增跨模态大模型应用流水线AppFlow
43
+ 2. 新增基于chat的图像编辑应用
44
+ 3. 新增自动标注应用
PaddleMIX/docs/FAQ.md ADDED
File without changes
PaddleMIX/paddlemix/__init__.py ADDED
@@ -0,0 +1,20 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # copyright (c) 2023 paddlepaddle authors. all rights reserved.
2
+ # copyright 2023 the salesforce team authors and the huggingface team. all rights reserved.
3
+ #
4
+ # licensed under the apache license, version 2.0 (the "license");
5
+ # you may not use this file except in compliance with the license.
6
+ # you may obtain a copy of the license at
7
+ #
8
+ # http://www.apache.org/licenses/license-2.0
9
+ #
10
+ # unless required by applicable law or agreed to in writing, software
11
+ # distributed under the license is distributed on an "as is" basis,
12
+ # without warranties or conditions of any kind, either express or implied.
13
+ # see the license for the specific language governing permissions and
14
+ # limitations under the license.
15
+
16
+ from .datasets import *
17
+ from .models import *
18
+ from .optimization import *
19
+ from .processors import *
20
+ from .triton_ops import *
PaddleMIX/ppdiffusers/README.md ADDED
@@ -0,0 +1,1278 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <div align="center">
2
+ <img src="https://user-images.githubusercontent.com/11793384/215372703-4385f66a-abe4-44c7-9626-96b7b65270c8.png" width="40%" height="40%" />
3
+ </div>
4
+
5
+ <p align="center">
6
+ <a href="https://pypi.org/project/ppdiffusers/"><img src="https://img.shields.io/pypi/pyversions/ppdiffusers"></a>
7
+ <a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg"></a>
8
+ <a href="https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
9
+ </p>
10
+
11
+ <h4 align="center">
12
+ <a href=#特性> 特性 </a> |
13
+ <a href=#安装> 安装 </a> |
14
+ <a href=#快速开始> 快速开始 </a> |
15
+ <a href=#模型部署> 模型部署</a>
16
+ </h4>
17
+
18
+ # PPDiffusers: Diffusers toolbox implemented based on PaddlePaddle
19
+
20
+ **PPDiffusers**是一款支持多种模态(如文本图像跨模态、图像、语音)扩散模型(Diffusion Model)训练和推理的国产化工具箱,依托于[**PaddlePaddle**](https://www.paddlepaddle.org.cn/)框架和[**PaddleNLP**](https://github.com/PaddlePaddle/PaddleNLP)自然语言处理开发库。
21
+
22
+ ## News 📢
23
+ * 🔥 **2024.10.18 发布 0.29.0 版本,新增图像生成模型[Stable Diffusion 3 (SD3)](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/examples/text_to_image/README_sd3.md),支持DreamBooth训练及高性能推理;SD3、SDXL适配昇腾910B,提供国产计算芯片上的训推能力;DIT支持[高性能推理](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/examples/class_conditional_image_generation/DiT/README.md#23-paddle-inference-%E9%AB%98%E6%80%A7%E8%83%BD%E6%8E%A8%E7%90%86);支持PaddleNLP 3.0 beta版本。**
24
+
25
+ * 🔥 **2024.07.15 发布 0.24.1 版本,新增[Open-Sora](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/Open-Sora),支持模型训练和推理;全面支持Paddle 3.0。**
26
+
27
+ * 🔥 **2024.04.17 发布 0.24.0 版本,支持[Sora相关技术](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/sora),支持[DiT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/class_conditional_image_generation/DiT)、[SiT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/class_conditional_image_generation/DiT#exploring-flow-and-diffusion-based-generative-models-with-scalable-interpolant-transformers-sit)、[UViT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/text_to_image_mscoco_uvit)训练推理,新增[NaViT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/navit)、[MAGVIT-v2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/video_tokenizer/magvit2)模型;
28
+ 视频生成能力全面升级;
29
+ 新增视频生成模型[SVD](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/stable_video_diffusion),支持模型微调和推理;
30
+ 新增姿态可控视频生成模型[AnimateAnyone](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/AnimateAnyone)、即插即用视频生成模型[AnimateDiff](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/examples/inference/text_to_video_generation_animediff.py)、GIF视频生成模型[Hotshot-XL](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community/Hotshot-XL);
31
+ 新增高速推理文图生成模型[LCM](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/consistency_distillation),支持SD/SDXL训练和推理;
32
+ [模型推理部署](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/deploy)全面升级;新增peft,accelerate后端;
33
+ 权重加载/保存全面升级,支持分布式、模型切片、safetensors等场景,相关能力已集成DiT、 [IP-Adapter](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/ip_adapter)、[PhotoMaker](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/PhotoMaker)、[InstantID](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/InstantID)等。**
34
+ * 🔥 **2023.12.12 发布 0.19.4 版本,修复已知的部分 BUG,修复 0D Tensor 的 Warning,新增 SDXL 的 FastdeployPipeline。**
35
+ * 🔥 **2023.09.27 发布 0.19.3 版本,新增[SDXL](#文本图像多模),支持Text2Image、Img2Img、Inpainting、InstructPix2Pix等任务,支持DreamBooth Lora训练;
36
+ 新增[UniDiffuser](#文本图像多模),通过统一的多模态扩散过程支持文生图、图生文等任务;
37
+ 新增文本条件视频生成模型[LVDM](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/text_to_video_lvdm),支持训练与推理;
38
+ 新增文图生成模型[Kandinsky 2.2](#文本图像多模),[Consistency models](#文本图像多模);
39
+ Stable Diffusion支持[BF16 O2训练](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/stable_diffusion),效果对齐FP32;
40
+ [LoRA加载升级](#加载HF-LoRA权重),支持加载SDXL的LoRA权重;
41
+ [Controlnet](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/ppdiffusers/pipelines/controlnet)升级,支持ControlNetImg2Img、ControlNetInpaint、StableDiffusionXLControlNet等。**
42
+
43
+
44
+
45
+
46
+ ## 特性
47
+ #### 📦 SOTA扩散模型Pipelines集合
48
+ 我们提供**SOTA(State-of-the-Art)** 的扩散模型Pipelines集合。
49
+ 目前**PPDiffusers**已经集成了**100+Pipelines**,支持文图生成(Text-to-Image Generation)、文本引导的图像编辑(Text-Guided Image Inpainting)、文本引导的图像变换(Image-to-Image Text-Guided Generation)、文本条件的视频生成(Text-to-Video Generation)、超分(Super Resolution)、文本条件的音频生成(Text-to-Audio Generation)在内的**10余项**任务,覆盖**文本、图像、视频、音频**等多种模态。
50
+ 如果想要了解当前支持的所有**Pipelines**以及对应的来源信息,可以阅读[🔥 PPDiffusers Pipelines](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/pipelines/README.md)文档。
51
+
52
+
53
+ #### 🔊 提供丰富的Noise Scheduler
54
+ 我们提供了丰富的**噪声调度器(Noise Scheduler)**,可以对**速度**与**质量**进行权衡,用户可在推理时根据需求快速切换使用。
55
+ 当前**PPDiffusers**已经集成了**14+Scheduler**,不仅支持 [DDPM](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/schedulers/scheduling_ddpm.py)、[DDIM](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/schedulers/scheduling_ddim.py) 和 [PNDM](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/schedulers/scheduling_pndm.py),还支持最新的 [🔥 DPMSolver](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/schedulers/scheduling_dpmsolver_multistep.py)!
56
+
57
+ #### 🎛️ 提供多种扩散模型组件
58
+ 我们提供了**多种扩散模型**组件,如[UNet1DModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/unet_1d.py)、[UNet2DModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/unet_2d.py)、[UNet2DConditionModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/unet_2d_condition.py)、[UNet3DConditionModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/unet_3d_condition.py)、[VQModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/vae.py)、[AutoencoderKL](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/vae.py)等。
59
+
60
+
61
+ #### 📖 提供丰富的训练和推理教程
62
+ 我们提供了丰富的训练教程,不仅支持扩散模型的二次开发微调,如基于[Textual Inversion](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/textual_inversion)和[DreamBooth](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/dreambooth)使用3-5张图定制化训练生成图像的风格或物体,还支持[🔥 Latent Diffusion Model](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/text_to_image_laion400m)、[🔥 ControlNet](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/controlnet)、[🔥 T2I-Adapter](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/t2i-adapter) 等扩散模型的训练!
63
+ 此外,我们还提供了丰富的[🔥 Pipelines推理样例](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/inference)。
64
+
65
+ #### 🚀 支持FastDeploy高性能部署
66
+ 我们提供基于[FastDeploy](https://github.com/PaddlePaddle/FastDeploy)的[🔥 高性能Stable Diffusion Pipeline](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/pipelines/stable_diffusion/pipeline_fastdeploy_stable_diffusion.py),更多有关FastDeploy进行多推理引擎后端高性能部署的信息请参考[🔥 高性能FastDeploy推理教程](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/deploy)。
67
+
68
+ ## 安装
69
+
70
+ ### 环境依赖
71
+ ```
72
+ pip install -r requirements.txt
73
+ ```
74
+ 关于PaddlePaddle安装的详细教程请查看[Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html)。
75
+
76
+ ### pip安装
77
+
78
+ ```shell
79
+ pip install --upgrade ppdiffusers
80
+ ```
81
+
82
+ ### 手动安装
83
+ ```shell
84
+ git clone https://github.com/PaddlePaddle/PaddleMIX
85
+ cd PaddleMIX/ppdiffusers
86
+ python setup.py install
87
+ ```
88
+ ### 设置代理
89
+ ```shell
90
+ export HF_HUB_ENABLE_HF_TRANSFER=1
91
+ export HF_ENDPOINT=https://hf-mirror.com
92
+ ```
93
+
94
+ ## 快速开始
95
+ 我们将以扩散模型的典型代表**Stable Diffusion**为例,带你快速了解PPDiffusers。
96
+
97
+ **Stable Diffusion**基于**潜在扩散模型(Latent Diffusion Models)**,专门用于**文图生成(Text-to-Image Generation)任务**。该模型是由来自 [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), [LAION](https://laion.ai/)以及[RunwayML](https://runwayml.com/)的工程师共同开发完成,目前发布了v1和v2两个版本。v1版本采用了LAION-5B数据集子集(分辨率为 512x512)进行训练,并具有以下架构设置:自动编码器下采样因子为8,UNet大小为860M,文本编码器为CLIP ViT-L/14。v2版本相较于v1版本在生成图像的质量和分辨率等进行了改善。
98
+
99
+ ### Stable Diffusion重点模型权重
100
+
101
+ <details><summary>&emsp; Stable Diffusion 模型支持的权重(英文) </summary>
102
+
103
+ **我们只需要将下面的"xxxx",替换成所需的权重名,即可快速使用!**
104
+ ```python
105
+ from ppdiffusers import *
106
+
107
+ pipe_text2img = StableDiffusionPipeline.from_pretrained("xxxx")
108
+ pipe_img2img = StableDiffusionImg2ImgPipeline.from_pretrained("xxxx")
109
+ pipe_inpaint_legacy = StableDiffusionInpaintPipelineLegacy.from_pretrained("xxxx")
110
+ pipe_mega = StableDiffusionMegaPipeline.from_pretrained("xxxx")
111
+
112
+ # pipe_mega.text2img() 等于 pipe_text2img()
113
+ # pipe_mega.img2img() 等于 pipe_img2img()
114
+ # pipe_mega.inpaint_legacy() 等于 pipe_inpaint_legacy()
115
+ ```
116
+
117
+ | PPDiffusers支持的模型名称 | 支持加载的Pipeline | 备注 | huggingface.co地址 |
118
+ | :-------------------------------------------: | :--------------------------------------------------------------------: | --- | :-----------------------------------------: |
119
+ | CompVis/stable-diffusion-v1-4 | StableDiffusionPipeline、StableDiffusionImg2ImgPipeline、StableDiffusionInpaintPipelineLegacy、StableDiffusionMegaPipeline、StableDiffusionPipelineAllinOne | Stable-Diffusion-v1-4 使用 Stable-Diffusion-v1-2 的权重进行初始化。随后在"laion-aesthetics v2 5+"数据集上以 **512x512** 分辨率微调了 **225k** 步数,对文本使用了 **10%** 的dropout(即:训练过程中文图对中的文本有 10% 的概率会变成空文本)。模型使用了[CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14)作为文本编码器。| [地址](https://huggingface.co/CompVis/stable-diffusion-v1-4) |
120
+ | CompVis/ldm-text2im-large-256 | LDMTextToImagePipeline | [LDM论文](https://arxiv.org/pdf/2112.10752.pdf) LDM-KL-8-G* 权重。| [地址](https://huggingface.co/CompVis/ldm-text2im-large-256) |
121
+ | CompVis/ldm-super-resolution-4x-openimages | LDMSuperResolutionPipeline | [LDM论文](https://arxiv.org/pdf/2112.10752.pdf) LDM-VQ-4 权重,[原始权重链接](https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip)。| [地址](https://huggingface.co/CompVis/ldm-super-resolution-4x-openimages) |
122
+ | runwayml/stable-diffusion-v1-5 | StableDiffusionPipeline、StableDiffusionImg2ImgPipeline、StableDiffusionInpaintPipelineLegacy、StableDiffusionMegaPipeline、StableDiffusionPipelineAllinOne | Stable-Diffusion-v1-5 使用 Stable-Diffusion-v1-2 的权重进行初始化。随后在"laion-aesthetics v2 5+"数据集上以 **512x512** 分辨率微调了 **595k** 步数,对文本使用了 **10%** 的dropout(即:训练过程中文图对中的文本有 10% 的概率会变成空文本)。模型同样也使用了[CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14)作为文本编码器。| [地址](https://huggingface.co/runwayml/stable-diffusion-v1-5) |
123
+ | runwayml/stable-diffusion-inpainting | StableDiffusionInpaintPipeline | Stable-Diffusion-Inpainting 使用 Stable-Diffusion-v1-2 的权重进行初始化。首先进行了 **595k** 步的常规训练(实际也就是 Stable-Diffusion-v1-5 的权重),然后进行了 **440k** 步的 inpainting 修复训练。对于 inpainting 修复训练,给 UNet 额外增加了 **5** 输入通道(其中 **4** 个用于被 Mask 遮盖住的图片,**1** 个用于 Mask 本身)。在训练期间,会随机生成 Mask,并有 **25%** 概率会将原始图片全部 Mask 掉。| [地址](https://huggingface.co/runwayml/stable-diffusion-inpainting) |
124
+ | stabilityai/stable-diffusion-2-base | StableDiffusionPipeline、StableDiffusionImg2ImgPipeline、StableDiffusionInpaintPipelineLegacy、StableDiffusionMegaPipeline、StableDiffusionPipelineAllinOne | 该模型首先在 [LAION-5B 256x256 子集上](https://laion.ai/blog/laion-5b/) (过滤条件:[punsafe = 0.1 的 LAION-NSFW 分类器](https://github.com/LAION-AI/CLIP-based-NSFW-Detector) 和 审美分数大于等于 4.5 )从头开始训练 **550k** 步,然后又在分辨率 **>= 512x512** 的同一数据集上进一步训练 **850k** 步。| [地址](https://huggingface.co/stabilityai/stable-diffusion-2-base) |
125
+ | stabilityai/stable-diffusion-2 | StableDiffusionPipeline、StableDiffusionImg2ImgPipeline、StableDiffusionInpaintPipelineLegacy、StableDiffusionMegaPipeline、StableDiffusionPipelineAllinOne | stable-diffusion-2 使用 stable-diffusion-2-base 权重进行初始化,首先在同一数据集上(**512x512** 分辨率)使用 [v-objective](https://arxiv.org/abs/2202.00512) 训练了 **150k** 步。然后又在 **768x768** 分辨率上使用 [v-objective](https://arxiv.org/abs/2202.00512) 继续训练了 **140k** 步。| [地址](https://huggingface.co/stabilityai/stable-diffusion-2) |
126
+ | stabilityai/stable-diffusion-2-inpainting | StableDiffusionInpaintPipeline |stable-diffusion-2-inpainting 使用 stable-diffusion-2-base 权重初始化,并且额外训练了 **200k** 步。训练过程使用了 [LAMA](https://github.com/saic-mdal/lama) 中提出的 Mask 生成策略,并且使用 Mask 图片的 Latent 表示(经过 VAE 编码)作为附加条件。| [地址](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) |
127
+ | stabilityai/stable-diffusion-x4-upscaler | StableDiffusionUpscalePipeline | 该模型在**LAION 10M** 子集上(>2048x2048)训练了 1.25M 步。该模型还在分辨率为 **512x512** 的图像上使用 [Text-guided Latent Upscaling Diffusion Model](https://arxiv.org/abs/2112.10752) 进行了训练。除了**文本输入**之外,它还接收 **noise_level** 作为输入参数,因此我们可以使用 [预定义的 Scheduler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/blob/main/low_res_scheduler/scheduler_config.json) 向低分辨率的输入图片添加噪声。| [地址](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) |
128
+ | hakurei/waifu-diffusion | StableDiffusionPipeline、StableDiffusionImg2ImgPipeline、StableDiffusionInpaintPipelineLegacy、StableDiffusionMegaPipeline、StableDiffusionPipelineAllinOne | waifu-diffusion-v1-2 使用 stable-diffusion-v1-4 权重初始化,并且在**高质量动漫**图像数据集上进行微调后得到的模型。用于微调的数据是 **680k** 文本图像样本,这些样本是通过 **booru 网站** 下载的。| [地址](https://huggingface.co/hakurei/waifu-diffusion) |
129
+ | hakurei/waifu-diffusion-v1-3 | StableDiffusionPipeline、StableDiffusionImg2ImgPipeline、StableDiffusionInpaintPipelineLegacy、StableDiffusionMegaPipeline、StableDiffusionPipelineAllinOne | waifu-diffusion-v1-3 是 waifu-diffusion-v1-2 基础上进一步训练得到的。他们对数据集进行了额外操作:(1)删除下划线;(2)删除括号;(3)用逗号分隔每个booru 标签;(4)随机化标签顺序。| [地址](https://huggingface.co/hakurei/waifu-diffusion) |
130
+ | naclbit/trinart_stable_diffusion_v2_60k | StableDiffusionPipeline、StableDiffusionImg2ImgPipeline、StableDiffusionInpaintPipelineLegacy、StableDiffusionMegaPipeline、StableDiffusionPipelineAllinOne | trinart_stable_diffusion 使用 stable-diffusion-v1-4 权重初始化,在 40k **高分辨率漫画/动漫风格**的图片数据集上微调了 8 个 epoch。V2 版模型使用 **dropouts**、**10k+ 图像**和**新的标记策略**训练了**更长时间**。| [地址](https://huggingface.co/naclbit/trinart_stable_diffusion_v2) |
131
+ | naclbit/trinart_stable_diffusion_v2_95k | StableDiffusionPipeline、StableDiffusionImg2ImgPipeline、StableDiffusionInpaintPipelineLegacy、StableDiffusionMegaPipeline、StableDiffusionPipelineAllinOne | **95k** 步数的结果,其他同上。| [地址](https://huggingface.co/naclbit/trinart_stable_diffusion_v2) |
132
+ | naclbit/trinart_stable_diffusion_v2_115k | StableDiffusionPipeline、StableDiffusionImg2ImgPipeline、StableDiffusionInpaintPipelineLegacy、StableDiffusionMegaPipeline、StableDiffusionPipelineAllinOne | **115k** 步数的结果,其他同上。| [地址](https://huggingface.co/naclbit/trinart_stable_diffusion_v2) |
133
+ | Deltaadams/Hentai-Diffusion | StableDiffusionPipeline、StableDiffusionImg2ImgPipeline、StableDiffusionInpaintPipelineLegacy、StableDiffusionMegaPipeline、StableDiffusionPipelineAllinOne | None| [地址](https://huggingface.co/Deltaadams/Hentai-Diffusion) |
134
+ | ringhyacinth/nail-set-diffuser | StableDiffusionPipeline、StableDiffusionImg2ImgPipeline、StableDiffusionInpaintPipelineLegacy、StableDiffusionMegaPipeline、StableDiffusionPipelineAllinOne | 美甲领域的扩散模型,训练数据使用了 [Weekend](https://weibo.com/u/5982308498)| [地址](https://huggingface.co/ringhyacinth/nail-set-diffuser) |
135
+ | Linaqruf/anything-v3.0 | StableDiffusionPipeline、StableDiffusionImg2ImgPipeline、StableDiffusionInpaintPipelineLegacy、StableDiffusionMegaPipeline、StableDiffusionPipelineAllinOne | 该模型可通过输入几个文本提示词就能生成**高质量、高度详细的动漫风格图片**,该模型支持使用 **danbooru 标签文本** 生成图像。| [地址](https://huggingface.co/Linaqruf/anything-v3.0) |
136
+
137
+ </details>
138
+ <details><summary>&emsp; Stable Diffusion 模型支持的权重(中文和多语言) </summary>
139
+
140
+
141
+ | PPDiffusers支持的模型名称 | 支持加载的Pipeline | 备注 | huggingface.co地址 |
142
+ | :-------------------------------------------: | :--------------------------------------------------------------------: | --- | :-----------------------------------------: |
143
+ | BAAI/AltDiffusion | AltDiffusionPipeline、AltDiffusionImg2ImgPipeline | 该模型使用 [AltCLIP](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP/README.md) 作为文本编码器,在 Stable Diffusion 基础上训练了**双语Diffusion模型**,其中训练数据来自 [WuDao数据集](https://data.baai.ac.cn/details/WuDaoCorporaText) 和 [LAION](https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6plus) 。| [地址](https://huggingface.co/BAAI/AltDiffusion) |
144
+ | BAAI/AltDiffusion-m9 | AltDiffusionPipeline、AltDiffusionImg2ImgPipeline |该模型使用9种语言的 [AltCLIP-m9](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP/README.md) 作为文本编码器,其他同上。| [地址](https://huggingface.co/BAAI/AltDiffusion-m9) |
145
+ | IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1 | StableDiffusionPipeline、StableDiffusionImg2ImgPipeline、StableDiffusionInpaintPipelineLegacy、StableDiffusionMegaPipeline、StableDiffusionPipelineAllinOne | 他们将 [Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/) 数据集 (100M) 和 [Zero](https://zero.so.com/) 数据集 (23M) 用作预训练的数据集,先用 [IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese](https://huggingface.co/IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese) 对这两个数据集的图文对相似性进行打分,取 CLIP Score 大于 0.2 的图文对作为训练集。 他们使用 [IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese](https://huggingface.co/IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese) 作为初始化的text encoder,冻住 [stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) ([论文](https://arxiv.org/abs/2112.10752)) 模型的其他部分,只训练 text encoder,以便保留原始模型的生成能力且实现中文概念的对齐。该模型目前在0.2亿图文对上训练了一个 epoch。 在 32 x A100 上训练了大约100小时,该版本只是一个初步的版本。| [地址](https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1) |
146
+ | IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1 | StableDiffusionPipeline、StableDiffusionImg2ImgPipeline、StableDiffusionInpaintPipelineLegacy、StableDiffusionMegaPipeline、StableDiffusionPipelineAllinOne | 他们将 [Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/) 数据集 (100M) 和 [Zero](https://zero.so.com/) 数据集 (23M) 用作预训练的数据集,先用 [IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese](https://huggingface.co/IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese) 对这两个数据集的图文对相似性进行打分,取 CLIP Score 大于 0.2 的图文对作为训练集。 他们使用 [stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) ([论文](https://arxiv.org/abs/2112.10752)) 模型进行继续训练,其中训练分为**两个stage**。**第一个stage** 中冻住模型的其他部分,只训练 text encoder ,以便保留原始模型的生成能力且实现中文概念的对齐。**第二个stage** 中将全部模型解冻,一起训练 text encoder 和 diffusion model ,以便 diffusion model 更好的适配中文引导。第一个 stage 他们训练了 80 小时,第二个 stage 训练了 100 小时,两个stage都是用了8 x A100,该版本是一个初步的版本。| [地址](https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1) |
147
+ </details>
148
+
149
+
150
+ ### 加载HF Diffusers权重
151
+ ```python
152
+ from ppdiffusers import StableDiffusionPipeline
153
+ # 设置from_hf_hub为True,表示从huggingface hub下载,from_diffusers为True表示加载的是diffusers版Pytorch权重
154
+ pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2", from_hf_hub=True, from_diffusers=True)
155
+ ```
156
+
157
+ ### 加载原库的Lightning权重
158
+ ```python
159
+ from ppdiffusers import StableDiffusionPipeline
160
+ # 可输入网址 或 本地ckpt、safetensors文件
161
+ pipe = StableDiffusionPipeline.from_single_file("https://paddlenlp.bj.bcebos.com/models/community/junnyu/develop/ppdiffusers/chilloutmix_NiPrunedFp32Fix.safetensors")
162
+ ```
163
+
164
+ ### 加载HF LoRA权重
165
+ ```python
166
+ from ppdiffusers import DiffusionPipeline
167
+
168
+ pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", paddle_dtype=paddle.float16)
169
+
170
+ pipe.load_lora_weights("stabilityai/stable-diffusion-xl-base-1.0",
171
+ weight_name="sd_xl_offset_example-lora_1.0.safetensors",
172
+ from_diffusers=True)
173
+ ```
174
+
175
+ ### 加载Civitai社区的LoRA权重
176
+ ```python
177
+ from ppdiffusers import StableDiffusionPipeline
178
+ pipe = StableDiffusionPipeline.from_pretrained("TASUKU2023/Chilloutmix")
179
+ # 加载lora权重
180
+ pipe.load_lora_weights("./",
181
+ weight_name="Moxin_10.safetensors",
182
+ from_diffusers=True)
183
+ pipe.fuse_lora()
184
+ ```
185
+
186
+ ### XFormers加速
187
+ 为了使用**XFormers加速**,我们需要安装`develop`版本的`paddle`,Linux系统的安装命令如下:
188
+ ```sh
189
+ python -m pip install paddlepaddle-gpu==0.0.0.post117 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
190
+ ```
191
+
192
+ ```python
193
+ import paddle
194
+ from ppdiffusers import StableDiffusionPipeline
195
+ pipe = StableDiffusionPipeline.from_pretrained("TASUKU2023/Chilloutmix", paddle_dtype=paddle.float16)
196
+ # 开启xformers加速 默认选择"cutlass"加速
197
+ pipe.enable_xformers_memory_efficient_attention()
198
+ # flash 需要使用 A100、A10、3060、3070、3080、3090 等以上显卡。
199
+ # pipe.enable_xformers_memory_efficient_attention("flash")
200
+ ```
201
+
202
+ ### ToME + ControlNet
203
+ ```python
204
+ # 安装develop的ppdiffusers
205
+ # pip install "ppdiffusers>=0.24.0"
206
+ import paddle
207
+ from ppdiffusers import ControlNetModel, StableDiffusionControlNetPipeline
208
+ from ppdiffusers.utils import load_image
209
+
210
+ controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
211
+ pipe = StableDiffusionControlNetPipeline.from_pretrained(
212
+ "runwayml/stable-diffusion-v1-5", safety_checker=None, controlnet=controlnet, paddle_dtype=paddle.float16
213
+ )
214
+
215
+ # Apply ToMe with a 50% merging ratio
216
+ pipe.apply_tome(ratio=0.5) # Can also use pipe.unet in place of pipe here
217
+
218
+ # 我们可以开启 xformers
219
+ # pipe.enable_xformers_memory_efficient_attention()
220
+ generator = paddle.Generator().manual_seed(0)
221
+ prompt = "bird"
222
+ image = load_image(
223
+ "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png"
224
+ )
225
+
226
+ image = pipe(prompt, image, generator=generator).images[0]
227
+
228
+ image.save("bird.png")
229
+ ```
230
+
231
+ ### 文图生成 (Text-to-Image Generation)
232
+
233
+ ```python
234
+ import paddle
235
+ from ppdiffusers import StableDiffusionPipeline
236
+
237
+ pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2")
238
+
239
+ # 设置随机种子,我们可以复现下面的结果!
240
+ paddle.seed(5232132133)
241
+ prompt = "a portrait of shiba inu with a red cap growing on its head. intricate. lifelike. soft light. sony a 7 r iv 5 5 mm. cinematic post - processing "
242
+ image = pipe(prompt, guidance_scale=7.5, height=768, width=768).images[0]
243
+
244
+ image.save("shiba_dog_with_a_red_cap.png")
245
+ ```
246
+ <div align="center">
247
+ <img width="500" alt="image" src="https://user-images.githubusercontent.com/50394665/204796701-d7911f76-8670-47d5-8d1b-8368b046c5e4.png">
248
+ </div>
249
+
250
+ ### 文本引导的图像变换(Image-to-Image Text-Guided Generation)
251
+
252
+ <details><summary>&emsp;Image-to-Image Text-Guided Generation Demo </summary>
253
+
254
+ ```python
255
+ import paddle
256
+ from ppdiffusers import StableDiffusionImg2ImgPipeline
257
+ from ppdiffusers.utils import load_image
258
+
259
+ pipe = StableDiffusionImg2ImgPipeline.from_pretrained("Linaqruf/anything-v3.0", safety_checker=None)
260
+
261
+ url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/image_Kurisu.png"
262
+ image = load_image(url).resize((512, 768))
263
+
264
+ # 设置随机种子,我们可以复现下面的结果!
265
+ paddle.seed(42)
266
+ prompt = "Kurisu Makise, looking at viewer, long hair, standing, 1girl, hair ornament, hair flower, cute, jacket, white flower, white dress"
267
+ negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"
268
+
269
+ image = pipe(prompt=prompt, negative_prompt=negative_prompt, image=image, strength=0.75, guidance_scale=7.5).images[0]
270
+ image.save("image_Kurisu_img2img.png")
271
+ ```
272
+ <div align="center">
273
+ <img width="500" alt="image" src="https://user-images.githubusercontent.com/50394665/204799529-cd89dcdb-eb1d-4247-91ac-b0f7bad777f8.png">
274
+ </div>
275
+ </details>
276
+
277
+ ### 文本引导的图像编辑(Text-Guided Image Inpainting)
278
+
279
+ 注意!当前有两种版本的图像编辑代码,一个是Legacy版本,一个是正式版本,下面将分别介绍两种代码如何使用!
280
+
281
+ <details><summary>&emsp;Legacy版本代码</summary>
282
+
283
+ ```python
284
+ import paddle
285
+ from ppdiffusers import StableDiffusionInpaintPipelineLegacy
286
+ from ppdiffusers.utils import load_image
287
+
288
+ # 可选模型权重
289
+ # CompVis/stable-diffusion-v1-4
290
+ # runwayml/stable-diffusion-v1-5
291
+ # stabilityai/stable-diffusion-2-base (原始策略 512x512)
292
+ # stabilityai/stable-diffusion-2 (v-objective 768x768)
293
+ # Linaqruf/anything-v3.0
294
+ # ......
295
+ img_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
296
+ mask_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations-mask.png"
297
+
298
+ image = load_image(img_url).resize((512, 512))
299
+ mask_image = load_image(mask_url).resize((512, 512))
300
+
301
+ pipe = StableDiffusionInpaintPipelineLegacy.from_pretrained("stabilityai/stable-diffusion-2-base", safety_checker=None)
302
+
303
+ # 设置随机种子,我们可以复现下面的结果!
304
+ paddle.seed(10245)
305
+ prompt = "a red cat sitting on a bench"
306
+ image = pipe(prompt=prompt, image=image, mask_image=mask_image, strength=0.75).images[0]
307
+
308
+ image.save("a_red_cat_legacy.png")
309
+ ```
310
+ <div align="center">
311
+ <img width="900" alt="image" src="https://user-images.githubusercontent.com/50394665/204802186-5a6d302b-83aa-4247-a5bb-ebabfcc3abc4.png">
312
+ </div>
313
+
314
+ </details>
315
+
316
+ <details><summary>&emsp;正式版本代码</summary>
317
+
318
+ Tips: 下面的使用方法是新版本的代码,也是官方推荐的代码,注意必须配合 **runwayml/stable-diffusion-inpainting** 和 **stabilityai/stable-diffusion-2-inpainting** 才可正常使用。
319
+ ```python
320
+ import paddle
321
+ from ppdiffusers import StableDiffusionInpaintPipeline
322
+ from ppdiffusers.utils import load_image
323
+
324
+ # 可选模型权重
325
+ # runwayml/stable-diffusion-inpainting
326
+ # stabilityai/stable-diffusion-2-inpainting
327
+ img_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
328
+ mask_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations-mask.png"
329
+
330
+ image = load_image(img_url).resize((512, 512))
331
+ mask_image = load_image(mask_url).resize((512, 512))
332
+
333
+ pipe = StableDiffusionInpaintPipeline.from_pretrained("stabilityai/stable-diffusion-2-inpainting")
334
+
335
+ # 设置随机种子,我们可以复现下面的结果!
336
+ paddle.seed(1024)
337
+ prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
338
+ image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
339
+
340
+ image.save("a_yellow_cat.png")
341
+ ```
342
+ <div align="center">
343
+ <img width="900" alt="image" src="https://user-images.githubusercontent.com/50394665/204801946-6cd043bc-f3db-42cf-82cd-6a6171484523.png">
344
+ </div>
345
+ </details>
346
+
347
+ ### 文本引导的图像放大 & 超分(Text-Guided Image Upscaling & Super-Resolution)
348
+
349
+ <details><summary>&emsp;Text-Guided Image Upscaling Demo</summary>
350
+
351
+ ```python
352
+ import paddle
353
+ from ppdiffusers import StableDiffusionUpscalePipeline
354
+ from ppdiffusers.utils import load_image
355
+
356
+ pipe = StableDiffusionUpscalePipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler")
357
+
358
+ url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/low_res_cat.png"
359
+ # 我们人工将原始图片缩小成 128x128 分辨率,最终保存的图片会放大4倍!
360
+ low_res_img = load_image(url).resize((128, 128))
361
+
362
+ prompt = "a white cat"
363
+ image = pipe(prompt=prompt, image=low_res_img).images[0]
364
+
365
+ image.save("upscaled_white_cat.png")
366
+ ```
367
+ <div align="center">
368
+ <img width="200" alt="image" src="https://user-images.githubusercontent.com/50394665/204806180-b7f1b9cf-8a62-4577-b5c4-91adda08a13b.png">
369
+ <img width="400" alt="image" src="https://user-images.githubusercontent.com/50394665/204806202-8c110be3-5f48-4946-95ea-21ad5a9a2340.png">
370
+ </div>
371
+ </details>
372
+
373
+ <details><summary>&emsp;Super-Resolution Demo</summary>
374
+
375
+ ```python
376
+ import paddle
377
+ from ppdiffusers import LDMSuperResolutionPipeline
378
+ from ppdiffusers.utils import load_image
379
+
380
+ pipe = LDMSuperResolutionPipeline.from_pretrained("CompVis/ldm-super-resolution-4x-openimages")
381
+
382
+ url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
383
+
384
+ # 我们人工将原始图片缩小成 128x128 分辨率,最终保存的图片会放大4倍!
385
+ low_res_img = load_image(url).resize((128, 128))
386
+
387
+ image = pipe(image=low_res_img, num_inference_steps=100).images[0]
388
+
389
+ image.save("ldm-super-resolution-image.png")
390
+ ```
391
+ <div align="center">
392
+ <img width="200" alt="image" src="https://user-images.githubusercontent.com/50394665/204804426-5e28b571-aa41-4f56-ba26-68cca75fdaae.png">
393
+ <img width="400" alt="image" src="https://user-images.githubusercontent.com/50394665/204804148-fe7c293b-6cd7-4942-ae9c-446369fe8410.png">
394
+ </div>
395
+
396
+ </details>
397
+
398
+ ## 模型推理部署
399
+ 除了**Paddle动态图**运行之外,很多模型还支持将模型导出并使用推理引擎运行。我们提供基于[FastDeploy](https://github.com/PaddlePaddle/FastDeploy)上的**StableDiffusion**模型部署示例,涵盖文生图、图生图、图像编辑等任务,用户可以按照我们提供[StableDiffusion模型导出教程](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/deploy/export.md)将模型导出,然后使用`FastDeployStableDiffusionMegaPipeline`进行高性能推理部署!
400
+
401
+ <details><summary>&emsp; 已预先导出的FastDeploy版Stable Diffusion权重 </summary>
402
+
403
+ **注意:当前导出的vae encoder带有随机因素!**
404
+
405
+ - CompVis/stable-diffusion-v1-4@fastdeploy
406
+ - runwayml/stable-diffusion-v1-5@fastdeploy
407
+ - runwayml/stable-diffusion-inpainting@fastdeploy
408
+ - stabilityai/stable-diffusion-2-base@fastdeploy
409
+ - stabilityai/stable-diffusion-2@fastdeploy
410
+ - stabilityai/stable-diffusion-2-inpainting@fastdeploy
411
+ - Linaqruf/anything-v3.0@fastdeploy
412
+ - hakurei/waifu-diffusion-v1-3@fastdeploy
413
+
414
+ </details>
415
+
416
+ <details><summary>&emsp; FastDeploy Demo </summary>
417
+
418
+ ```python
419
+ import paddle
420
+ import fastdeploy as fd
421
+ from ppdiffusers import FastDeployStableDiffusionMegaPipeline
422
+ from ppdiffusers.utils import load_image
423
+
424
+ def create_runtime_option(device_id=0, backend="paddle", use_cuda_stream=True):
425
+ option = fd.RuntimeOption()
426
+ if backend == "paddle":
427
+ option.use_paddle_backend()
428
+ else:
429
+ option.use_ort_backend()
430
+ if device_id == -1:
431
+ option.use_cpu()
432
+ else:
433
+ option.use_gpu(device_id)
434
+ if use_cuda_stream:
435
+ paddle_stream = paddle.device.cuda.current_stream(device_id).cuda_stream
436
+ option.set_external_raw_stream(paddle_stream)
437
+ return option
438
+
439
+ runtime_options = {
440
+ "text_encoder": create_runtime_option(0, "paddle"), # use gpu:0
441
+ "vae_encoder": create_runtime_option(0, "paddle"), # use gpu:0
442
+ "vae_decoder": create_runtime_option(0, "paddle"), # use gpu:0
443
+ "unet": create_runtime_option(0, "paddle"), # use gpu:0
444
+ }
445
+
446
+ fd_pipe = FastDeployStableDiffusionMegaPipeline.from_pretrained(
447
+ "Linaqruf/anything-v3.0@fastdeploy", runtime_options=runtime_options
448
+ )
449
+
450
+ # text2img
451
+ prompt = "a portrait of shiba inu with a red cap growing on its head. intricate. lifelike. soft light. sony a 7 r iv 5 5 mm. cinematic post - processing "
452
+ image_text2img = fd_pipe.text2img(prompt=prompt, num_inference_steps=50).images[0]
453
+ image_text2img.save("image_text2img.png")
454
+
455
+ # img2img
456
+ url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/image_Kurisu.png"
457
+ image = load_image(url).resize((512, 512))
458
+ prompt = "Kurisu Makise, looking at viewer, long hair, standing, 1girl, hair ornament, hair flower, cute, jacket, white flower, white dress"
459
+ negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"
460
+
461
+ image_img2img = fd_pipe.img2img(
462
+ prompt=prompt, negative_prompt=negative_prompt, image=image, strength=0.75, guidance_scale=7.5
463
+ ).images[0]
464
+ image_img2img.save("image_img2img.png")
465
+
466
+ # inpaint_legacy
467
+ img_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
468
+ mask_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations-mask.png"
469
+ image = load_image(img_url).resize((512, 512))
470
+ mask_image = load_image(mask_url).resize((512, 512))
471
+ prompt = "a red cat sitting on a bench"
472
+
473
+ image_inpaint_legacy = fd_pipe.inpaint_legacy(
474
+ prompt=prompt, image=image, mask_image=mask_image, strength=0.75, num_inference_steps=50
475
+ ).images[0]
476
+ image_inpaint_legacy.save("image_inpaint_legacy.png")
477
+ ```
478
+ </details>
479
+ <div align="center">
480
+ <img width="900" alt="image" src="https://user-images.githubusercontent.com/50394665/205297240-46b80992-34af-40cd-91a6-ae76589d0e21.png">
481
+ </div>
482
+
483
+
484
+ ## 更多任务分类展示
485
+ ### 文本图像多模
486
+
487
+ <details open>
488
+ <summary>&emsp;文图生成(Text-to-Image Generation)</summary>
489
+
490
+ #### text_to_image_generation-stable_diffusion
491
+
492
+ ```python
493
+ from ppdiffusers import StableDiffusionPipeline
494
+
495
+ # 加载模型和scheduler
496
+ pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
497
+
498
+ # 执行pipeline进行推理
499
+ prompt = "a photo of an astronaut riding a horse on mars"
500
+ image = pipe(prompt).images[0]
501
+
502
+ # 保存图片
503
+ image.save("astronaut_rides_horse_sd.png")
504
+ ```
505
+ <div align="center">
506
+ <img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209322401-6ecfeaaa-6878-4302-b592-07a31de4e590.png">
507
+ </div>
508
+
509
+ #### text_to_image_generation-stable_diffusion_xl
510
+
511
+ ```python
512
+ import paddle
513
+ from ppdiffusers import StableDiffusionXLPipeline
514
+
515
+ pipe = StableDiffusionXLPipeline.from_pretrained(
516
+ "stabilityai/stable-diffusion-xl-base-1.0",
517
+ paddle_dtype=paddle.float16,
518
+ variant="fp16"
519
+ )
520
+ prompt = "a photo of an astronaut riding a horse on mars"
521
+ generator = paddle.Generator().manual_seed(42)
522
+ image = pipe(prompt=prompt, generator=generator, num_inference_steps=50).images[0]
523
+ image.save('sdxl_text2image.png')
524
+ ```
525
+ <div align="center">
526
+ <img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/d72729f9-8685-48f9-a238-e4ddf6d264f3">
527
+ </div>
528
+
529
+ #### text_to_image_generation-sdxl_base_with_refiner
530
+
531
+ ```python
532
+ from ppdiffusers import DiffusionPipeline
533
+ import paddle
534
+
535
+ # load both base & refiner
536
+ base = DiffusionPipeline.from_pretrained(
537
+ "stabilityai/stable-diffusion-xl-base-1.0",
538
+ paddle_dtype=paddle.float16,
539
+ )
540
+ refiner = DiffusionPipeline.from_pretrained(
541
+ "stabilityai/stable-diffusion-xl-refiner-1.0",
542
+ text_encoder_2=base.text_encoder_2,
543
+ vae=base.vae,
544
+ paddle_dtype=paddle.float16,
545
+ variant="fp16",
546
+ )
547
+
548
+ # Define how many steps and what % of steps to be run on each experts (80/20) here
549
+ n_steps = 40
550
+ high_noise_frac = 0.8
551
+
552
+ prompt = "A majestic lion jumping from a big stone at night"
553
+ prompt = "a photo of an astronaut riding a horse on mars"
554
+ generator = paddle.Generator().manual_seed(42)
555
+
556
+ # run both experts
557
+ image = base(
558
+ prompt=prompt,
559
+ output_type="latent",
560
+ generator=generator,
561
+ ).images
562
+
563
+ image = refiner(
564
+ prompt=prompt,
565
+ image=image,
566
+ generator=generator,
567
+ ).images[0]
568
+ image.save('text_to_image_generation-sdxl-base-with-refiner-result.png')
569
+ ```
570
+ <div align="center">
571
+ <img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/8ef36826-ed94-4856-a356-af1677f60d1b">
572
+ </div>
573
+
574
+ #### text_to_image_generation-kandinsky2_2
575
+ ```python
576
+ from ppdiffusers import KandinskyV22Pipeline, KandinskyV22PriorPipeline
577
+
578
+ pipe_prior = KandinskyV22PriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior")
579
+ prompt = "red cat, 4k photo"
580
+ out = pipe_prior(prompt)
581
+ image_emb = out.image_embeds
582
+ zero_image_emb = out.negative_image_embeds
583
+ pipe = KandinskyV22Pipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder")
584
+ image = pipe(
585
+ image_embeds=image_emb,
586
+ negative_image_embeds=zero_image_emb,
587
+ height=768,
588
+ width=768,
589
+ num_inference_steps=50,
590
+ ).images
591
+ image[0].save("text_to_image_generation-kandinsky2_2-result-cat.png")
592
+ ```
593
+ <div align="center">
594
+ <img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/188f76dd-4bd7-4a33-8f30-b893c7a9e249">
595
+ </div>
596
+
597
+ #### text_to_image_generation-unidiffuser
598
+ ```python
599
+ import paddle
600
+ from paddlenlp.trainer import set_seed
601
+
602
+ from ppdiffusers import UniDiffuserPipeline
603
+
604
+ model_id_or_path = "thu-ml/unidiffuser-v1"
605
+ pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path, paddle_dtype=paddle.float16)
606
+ set_seed(42)
607
+
608
+ # Text variation can be performed with a text-to-image generation followed by a image-to-text generation:
609
+ # 1. Text-to-image generation
610
+ prompt = "an elephant under the sea"
611
+ sample = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0)
612
+ t2i_image = sample.images[0]
613
+ t2i_image.save("t2i_image.png")
614
+ ```
615
+ <div align="center">
616
+ <img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/a6eb11d2-ad27-4263-8cb4-b0d8dd42b36c">
617
+ </div>
618
+
619
+ #### text_to_image_generation-deepfloyd_if
620
+
621
+ ```python
622
+ import paddle
623
+
624
+ from ppdiffusers import DiffusionPipeline, IFPipeline, IFSuperResolutionPipeline
625
+ from ppdiffusers.utils import pd_to_pil
626
+
627
+ # Stage 1: generate images
628
+ pipe = IFPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", paddle_dtype=paddle.float16)
629
+ pipe.enable_xformers_memory_efficient_attention()
630
+ prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'
631
+ prompt_embeds, negative_embeds = pipe.encode_prompt(prompt)
632
+ image = pipe(
633
+ prompt_embeds=prompt_embeds,
634
+ negative_prompt_embeds=negative_embeds,
635
+ output_type="pd",
636
+ ).images
637
+
638
+ # save intermediate image
639
+ pil_image = pd_to_pil(image)
640
+ pil_image[0].save("text_to_image_generation-deepfloyd_if-result-if_stage_I.png")
641
+ # save gpu memory
642
+ pipe.to(paddle_device="cpu")
643
+
644
+ # Stage 2: super resolution stage1
645
+ super_res_1_pipe = IFSuperResolutionPipeline.from_pretrained(
646
+ "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", paddle_dtype=paddle.float16
647
+ )
648
+ super_res_1_pipe.enable_xformers_memory_efficient_attention()
649
+
650
+ image = super_res_1_pipe(
651
+ image=image,
652
+ prompt_embeds=prompt_embeds,
653
+ negative_prompt_embeds=negative_embeds,
654
+ output_type="pd",
655
+ ).images
656
+ # save intermediate image
657
+ pil_image = pd_to_pil(image)
658
+ pil_image[0].save("text_to_image_generation-deepfloyd_if-result-if_stage_II.png")
659
+ # save gpu memory
660
+ super_res_1_pipe.to(paddle_device="cpu")
661
+ ```
662
+ <div align="center">
663
+ <img alt="image" src="https://user-images.githubusercontent.com/20476674/246785766-700dfad9-159d-4bfb-bfc7-c18df938a052.png">
664
+ </div>
665
+ <div align="center">
666
+ <center>if_stage_I</center>
667
+ </div>
668
+ <div align="center">
669
+ <img alt="image" src="https://user-images.githubusercontent.com/20476674/246785773-3359ca5f-dadf-4cc8-b318-ff1f9d4a2d35.png">
670
+ </div>
671
+ <div align="center">
672
+ <center>if_stage_II</center>
673
+ <!-- <img alt="image" src="https://user-images.githubusercontent.com/20476674/246785774-8870829a-354b-4a87-9d67-93af315f51e6.png">
674
+ <center>if_stage_III</center> -->
675
+ </div>
676
+ </details>
677
+
678
+
679
+ <details><summary>&emsp;文本引导的图像放大(Text-Guided Image Upscaling)</summary>
680
+
681
+ #### text_guided_image_upscaling-stable_diffusion_2
682
+
683
+ ```python
684
+ from ppdiffusers import StableDiffusionUpscalePipeline
685
+ from ppdiffusers.utils import load_image
686
+
687
+ pipe = StableDiffusionUpscalePipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler")
688
+
689
+ url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/low_res_cat.png"
690
+ low_res_img = load_image(url).resize((128, 128))
691
+
692
+ prompt = "a white cat"
693
+ upscaled_image = pipe(prompt=prompt, image=low_res_img).images[0]
694
+ upscaled_image.save("upsampled_cat_sd2.png")
695
+ ```
696
+ <div align="center">
697
+ <img alt="image" src="https://user-images.githubusercontent.com/20476674/209324085-0d058b70-89b0-43c2-affe-534eedf116cf.png">
698
+ <center>原图像</center>
699
+ <img alt="image" src="https://user-images.githubusercontent.com/20476674/209323862-ce2d8658-a52b-4f35-90cb-aa7d310022e7.png">
700
+ <center>生成图像</center>
701
+ </div>
702
+ </details>
703
+
704
+ <details><summary>&emsp;文本引导的图像编辑(Text-Guided Image Inpainting)</summary>
705
+
706
+ #### text_guided_image_inpainting-stable_diffusion_2
707
+
708
+ ```python
709
+ import paddle
710
+
711
+ from ppdiffusers import PaintByExamplePipeline
712
+ from ppdiffusers.utils import load_image
713
+
714
+ img_url = "https://paddlenlp.bj.bcebos.com/models/community/Fantasy-Studio/data/image_example_1.png"
715
+ mask_url = "https://paddlenlp.bj.bcebos.com/models/community/Fantasy-Studio/data/mask_example_1.png"
716
+ example_url = "https://paddlenlp.bj.bcebos.com/models/community/Fantasy-Studio/data/reference_example_1.jpeg"
717
+
718
+ init_image = load_image(img_url).resize((512, 512))
719
+ mask_image = load_image(mask_url).resize((512, 512))
720
+ example_image = load_image(example_url).resize((512, 512))
721
+
722
+ pipe = PaintByExamplePipeline.from_pretrained("Fantasy-Studio/Paint-by-Example")
723
+
724
+ # 使用fp16加快生成速度
725
+ with paddle.amp.auto_cast(True):
726
+ image = pipe(image=init_image, mask_image=mask_image, example_image=example_image).images[0]
727
+ image.save("image_guided_image_inpainting-paint_by_example-result.png")
728
+ ```
729
+ <div align="center">
730
+ <img alt="image" src="https://user-images.githubusercontent.com/20476674/247118364-5d91f433-f9ac-4514-b5f0-cb4599905847.png" width=300>
731
+ <center>原图像</center>
732
+ <div align="center">
733
+ <img alt="image" src="https://user-images.githubusercontent.com/20476674/247118361-0f78d6db-6896-4f8d-b1bd-8350192f7a4e.png" width=300>
734
+ <center>掩码图像</center>
735
+ <div align="center">
736
+ <img alt="image" src="https://user-images.githubusercontent.com/20476674/247118368-305a048d-ddc3-4a5f-8915-58591ef680f0.jpeg" width=300>
737
+ <center>参考图像</center>
738
+ <img alt="image" src="https://user-images.githubusercontent.com/20476674/247117963-e5b9b754-39a3-480b-a557-46a2f9310e79.png" width=300>
739
+ <center>生成图像</center>
740
+ </div>
741
+ </details>
742
+
743
+
744
+ <details><summary>&emsp;文本引导的图像变换(Image-to-Image Text-Guided Generation)</summary>
745
+
746
+ #### text_guided_image_inpainting-kandinsky2_2
747
+ ```python
748
+ import numpy as np
749
+ import paddle
750
+
751
+ from ppdiffusers import KandinskyV22InpaintPipeline, KandinskyV22PriorPipeline
752
+ from ppdiffusers.utils import load_image
753
+
754
+ pipe_prior = KandinskyV22PriorPipeline.from_pretrained(
755
+ "kandinsky-community/kandinsky-2-2-prior", paddle_dtype=paddle.float16
756
+ )
757
+ prompt = "a hat"
758
+ image_emb, zero_image_emb = pipe_prior(prompt, return_dict=False)
759
+ pipe = KandinskyV22InpaintPipeline.from_pretrained(
760
+ "kandinsky-community/kandinsky-2-2-decoder-inpaint", paddle_dtype=paddle.float16
761
+ )
762
+ init_image = load_image(
763
+ "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png"
764
+ )
765
+ mask = np.zeros((768, 768), dtype=np.float32)
766
+ mask[:250, 250:-250] = 1
767
+ out = pipe(
768
+ image=init_image,
769
+ mask_image=mask,
770
+ image_embeds=image_emb,
771
+ negative_image_embeds=zero_image_emb,
772
+ height=768,
773
+ width=768,
774
+ num_inference_steps=50,
775
+ )
776
+ image = out.images[0]
777
+ image.save("text_guided_image_inpainting-kandinsky2_2-result-cat_with_hat.png")
778
+ ```
779
+ <div align="center">
780
+ <img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/64a943d5-167b-4433-91c3-3cf9279714db">
781
+ <center>原图像</center>
782
+ <img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/f469c127-52f4-4173-a693-c06b92a052aa">
783
+ <center>生成图像</center>
784
+ </div>
785
+
786
+ #### image_to_image_text_guided_generation-stable_diffusion
787
+ ```python
788
+ import paddle
789
+
790
+ from ppdiffusers import StableDiffusionImg2ImgPipeline
791
+ from ppdiffusers.utils import load_image
792
+
793
+ # 加载pipeline
794
+ pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
795
+
796
+ # 下载初始图片
797
+ url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/sketch-mountains-input.png"
798
+
799
+ init_image = load_image(url).resize((768, 512))
800
+
801
+ prompt = "A fantasy landscape, trending on artstation"
802
+ # 使用fp16加快生成速度
803
+ with paddle.amp.auto_cast(True):
804
+ image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
805
+
806
+ image.save("fantasy_landscape.png")
807
+ ```
808
+ <div align="center">
809
+ <img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209327142-d8e1d0c7-3bf8-4a08-a0e8-b11451fc84d8.png">
810
+ <center>原图像</center>
811
+ <img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209325799-d9ff279b-0d57-435f-bda7-763e3323be23.png">
812
+ <center>生成图像</center>
813
+ </div>
814
+
815
+ #### image_to_image_text_guided_generation-stable_diffusion_xl
816
+ ```python
817
+ import paddle
818
+ from ppdiffusers import StableDiffusionXLImg2ImgPipeline
819
+ from ppdiffusers.utils import load_image
820
+
821
+ pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
822
+ "stabilityai/stable-diffusion-xl-refiner-1.0",
823
+ paddle_dtype=paddle.float16,
824
+ # from_hf_hub=True,
825
+ # from_diffusers=True,
826
+ variant="fp16"
827
+ )
828
+ url = "https://paddlenlp.bj.bcebos.com/models/community/westfish/develop-0-19-3/000000009.png"
829
+ init_image = load_image(url).convert("RGB")
830
+ prompt = "a photo of an astronaut riding a horse on mars"
831
+ image = pipe(prompt, image=init_image).images[0]
832
+ image.save('sdxl_image2image.png')
833
+ ```
834
+ <div align="center">
835
+ <img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/41bd9381-2799-4bed-a5e2-ba312a2f8da9">
836
+ <center>原图像</center>
837
+ <img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/db672d03-2e3a-46ac-97fd-d80cca18dbbe">
838
+ <center>生成图像</center>
839
+ </div>
840
+
841
+ #### image_to_image_text_guided_generation-kandinsky2_2
842
+ ```python
843
+ import paddle
844
+
845
+ from ppdiffusers import KandinskyV22Img2ImgPipeline, KandinskyV22PriorPipeline
846
+ from ppdiffusers.utils import load_image
847
+
848
+ pipe_prior = KandinskyV22PriorPipeline.from_pretrained(
849
+ "kandinsky-community/kandinsky-2-2-prior", paddle_dtype=paddle.float16
850
+ )
851
+ prompt = "A red cartoon frog, 4k"
852
+ image_emb, zero_image_emb = pipe_prior(prompt, return_dict=False)
853
+ pipe = KandinskyV22Img2ImgPipeline.from_pretrained(
854
+ "kandinsky-community/kandinsky-2-2-decoder", paddle_dtype=paddle.float16
855
+ )
856
+
857
+ init_image = load_image(
858
+ "https://hf-mirror.com/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/frog.png"
859
+ )
860
+ image = pipe(
861
+ image=init_image,
862
+ image_embeds=image_emb,
863
+ negative_image_embeds=zero_image_emb,
864
+ height=768,
865
+ width=768,
866
+ num_inference_steps=100,
867
+ strength=0.2,
868
+ ).images
869
+ image[0].save("image_to_image_text_guided_generation-kandinsky2_2-result-red_frog.png")
870
+ ```
871
+ <div align="center">
872
+ <img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/aae57109-94ad-408e-ae75-8cce650cebe5">
873
+ <center>原图像</center>
874
+ <img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/23cf2c4e-416f-4f21-82a6-e57de11b5e83">
875
+ <center>生成图像</center>
876
+ </div>
877
+
878
+ </details>
879
+ </details>
880
+
881
+ <details><summary>&emsp;文本图像双引导图像生成(Dual Text and Image Guided Generation)</summary>
882
+
883
+ #### dual_text_and_image_guided_generation-versatile_diffusion
884
+ ```python
885
+ from ppdiffusers import VersatileDiffusionDualGuidedPipeline
886
+ from ppdiffusers.utils import load_image
887
+
888
+ url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/benz.jpg"
889
+ image = load_image(url)
890
+ text = "a red car in the sun"
891
+
892
+ pipe = VersatileDiffusionDualGuidedPipeline.from_pretrained("shi-labs/versatile-diffusion")
893
+ pipe.remove_unused_weights()
894
+
895
+ text_to_image_strength = 0.75
896
+ image = pipe(prompt=text, image=image, text_to_image_strength=text_to_image_strength).images[0]
897
+ image.save("versatile-diffusion-red_car.png")
898
+ ```
899
+ <div align="center">
900
+ <img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209325965-2475e9c4-a524-4970-8498-dfe10ff9cf24.jpg" >
901
+ <center>原图像</center>
902
+ <img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209325293-049098d0-d591-4abc-b151-9291ac2636da.png">
903
+ <center>生成图像</center>
904
+ </div>
905
+ </details>
906
+
907
+ ### 文本视频多模
908
+
909
+ <details open>
910
+ <summary>&emsp;文本条件的视频生成(Text-to-Video Generation)</summary>
911
+
912
+ #### text_to_video_generation-lvdm
913
+
914
+ ```python
915
+ import paddle
916
+
917
+ from ppdiffusers import LVDMTextToVideoPipeline
918
+
919
+ # 加载模型和scheduler
920
+ pipe = LVDMTextToVideoPipeline.from_pretrained("westfish/lvdm_text2video_orig_webvid_2m")
921
+
922
+ # 执行pipeline进行推理
923
+ seed = 2013
924
+ generator = paddle.Generator().manual_seed(seed)
925
+ samples = pipe(
926
+ prompt="cutting in kitchen",
927
+ num_frames=16,
928
+ height=256,
929
+ width=256,
930
+ num_inference_steps=50,
931
+ generator=generator,
932
+ guidance_scale=15,
933
+ eta=1,
934
+ save_dir=".",
935
+ save_name="text_to_video_generation-lvdm-result-ddim_lvdm_text_to_video_ucf",
936
+ encoder_type="2d",
937
+ scale_factor=0.18215,
938
+ shift_factor=0,
939
+ )
940
+ ```
941
+ <div align="center">
942
+ <img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/270906907-2b9d53c1-0272-4c7a-81b2-cd962d23bbee.gif">
943
+ </div>
944
+
945
+ #### text_to_video_generation-synth
946
+
947
+ ```python
948
+ import imageio
949
+
950
+ from ppdiffusers import DPMSolverMultistepScheduler, TextToVideoSDPipeline
951
+
952
+ pipe = TextToVideoSDPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b")
953
+ pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
954
+
955
+ prompt = "An astronaut riding a horse."
956
+ video_frames = pipe(prompt, num_inference_steps=25).frames
957
+ imageio.mimsave("text_to_video_generation-synth-result-astronaut_riding_a_horse.mp4", video_frames, fps=8)
958
+ ```
959
+ <div align="center">
960
+ <img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/281259277-0ebe29a3-4eba-48ee-a98b-292e60de3c98.gif">
961
+ </div>
962
+
963
+
964
+ #### text_to_video_generation-synth with zeroscope_v2_XL
965
+
966
+ ```python
967
+ import imageio
968
+
969
+ from ppdiffusers import DPMSolverMultistepScheduler, TextToVideoSDPipeline
970
+
971
+ # from ppdiffusers.utils import export_to_video
972
+
973
+ pipe = TextToVideoSDPipeline.from_pretrained("cerspense/zeroscope_v2_XL")
974
+ pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
975
+
976
+ prompt = "An astronaut riding a horse."
977
+ video_frames = pipe(prompt, num_inference_steps=50, height=320, width=576, num_frames=24).frames
978
+ imageio.mimsave("text_to_video_generation-synth-result-astronaut_riding_a_horse.mp4", video_frames, fps=8)
979
+ ```
980
+ <div align="center">
981
+ <img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/43ebbca0-9f07-458b-809a-acf296a2539b">
982
+ </div>
983
+
984
+ #### text_to_video_generation-zero
985
+
986
+ ```python
987
+ import imageio
988
+
989
+ # pip install imageio[ffmpeg]
990
+ import paddle
991
+
992
+ from ppdiffusers import TextToVideoZeroPipeline
993
+
994
+ model_id = "runwayml/stable-diffusion-v1-5"
995
+ pipe = TextToVideoZeroPipeline.from_pretrained(model_id, paddle_dtype=paddle.float16)
996
+
997
+ prompt = "A panda is playing guitar on times square"
998
+ result = pipe(prompt=prompt).images
999
+ result = [(r * 255).astype("uint8") for r in result]
1000
+ imageio.mimsave("text_to_video_generation-zero-result-panda.mp4", result, fps=4)
1001
+ ```
1002
+ <div align="center">
1003
+ <img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/246779321-c2b0c2b4-e383-40c7-a4d8-f417e8062b35.gif">
1004
+ </div>
1005
+
1006
+ </details>
1007
+
1008
+ ### 文本音频多模
1009
+ <details>
1010
+ <summary>&emsp;文本条件的音频生成(Text-to-Audio Generation)</summary>
1011
+
1012
+ #### text_to_audio_generation-audio_ldm
1013
+
1014
+ ```python
1015
+ import paddle
1016
+ import scipy
1017
+
1018
+ from ppdiffusers import AudioLDM2Pipeline
1019
+
1020
+ pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2", paddle_dtype=paddle.float16)
1021
+
1022
+ prompt = "Musical constellations twinkling in the night sky, forming a cosmic melody."
1023
+ negative_prompt = "Low quality."
1024
+ audio = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=200, audio_length_in_s=10).audios[0]
1025
+
1026
+ output_path = f"{prompt}.wav"
1027
+ # save the audio sample as a .wav file
1028
+ scipy.io.wavfile.write(output_path, rate=16000, data=audio)
1029
+ ```
1030
+ <div align = "center">
1031
+ <thead>
1032
+ </thead>
1033
+ <tbody>
1034
+ <tr>
1035
+ <td align = "center">
1036
+ <a href="https://paddlenlp.bj.bcebos.com/models/community/paddlemix/ppdiffusers/AudioLDM2-Music.wav" rel="nofollow">
1037
+ <img align="center" src="https://user-images.githubusercontent.com/20476674/209344877-edbf1c24-f08d-4e3b-88a4-a27e1fd0a858.png" width="200" style="max-width: 100%;"></a><br>
1038
+ </td>
1039
+ </tr>
1040
+ </tbody>
1041
+ </div>
1042
+ </details>
1043
+
1044
+ 可以使用以下代码转换[huggingface](https://huggingface.co/docs/diffusers/api/pipelines/audioldm2)的模型,一键在paddle中使用
1045
+ ```python
1046
+ pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2-music", from_hf_hub=True, from_diffusers=True).save_pretrained("cvssp/audioldm2-music")
1047
+ ```
1048
+ ### 图像
1049
+
1050
+ <details><summary>&emsp;无条件图像生成(Unconditional Image Generation)</summary>
1051
+
1052
+ #### unconditional_image_generation-latent_diffusion_uncond
1053
+
1054
+ ```python
1055
+ from ppdiffusers import LDMPipeline
1056
+
1057
+ # 加载模型和scheduler
1058
+ pipe = LDMPipeline.from_pretrained("CompVis/ldm-celebahq-256")
1059
+
1060
+ # 执行pipeline进行推理
1061
+ image = pipe(num_inference_steps=200).images[0]
1062
+
1063
+ # 保存图片
1064
+ image.save("ldm_generated_image.png")
1065
+ ```
1066
+ <div align="center">
1067
+ <img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209327936-7fe914e0-0ea0-4e21-a433-24eaed6ee94c.png">
1068
+ </div>
1069
+ </details>
1070
+
1071
+ <details><summary>&emsp;超分(Super Resolution)</summary>
1072
+
1073
+ #### super_resolution-latent_diffusion
1074
+ ```python
1075
+ import paddle
1076
+
1077
+ from ppdiffusers import LDMSuperResolutionPipeline
1078
+ from ppdiffusers.utils import load_image
1079
+
1080
+ # 加载pipeline
1081
+ pipe = LDMSuperResolutionPipeline.from_pretrained("CompVis/ldm-super-resolution-4x-openimages")
1082
+
1083
+ # 下载初始图片
1084
+ url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
1085
+
1086
+ init_image = load_image(url).resize((128, 128))
1087
+ init_image.save("original-image.png")
1088
+
1089
+ # 使用fp16加快生成速度
1090
+ with paddle.amp.auto_cast(True):
1091
+ image = pipe(init_image, num_inference_steps=100, eta=1).images[0]
1092
+
1093
+ image.save("super-resolution-image.png")
1094
+ ```
1095
+ <div align="center">
1096
+ <img alt="image" src="https://user-images.githubusercontent.com/20476674/209328660-9700fdc3-72b3-43bd-9a00-23b370ba030b.png">
1097
+ <center>原图像</center>
1098
+ <img alt="image" src="https://user-images.githubusercontent.com/20476674/209328479-4eaea5d8-aa4a-4f31-aa2a-b47e3c730f15.png">
1099
+ <center>生成图像</center>
1100
+ </div>
1101
+ </details>
1102
+
1103
+
1104
+ <details><summary>&emsp;图像编辑(Image Inpainting)</summary>
1105
+
1106
+ #### image_inpainting-repaint
1107
+ ```python
1108
+ from ppdiffusers import RePaintPipeline, RePaintScheduler
1109
+ from ppdiffusers.utils import load_image
1110
+
1111
+ img_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/celeba_hq_256.png"
1112
+ mask_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/mask_256.png"
1113
+
1114
+ # Load the original image and the mask as PIL images
1115
+ original_image = load_image(img_url).resize((256, 256))
1116
+ mask_image = load_image(mask_url).resize((256, 256))
1117
+
1118
+ scheduler = RePaintScheduler.from_pretrained("google/ddpm-ema-celebahq-256", subfolder="scheduler")
1119
+ pipe = RePaintPipeline.from_pretrained("google/ddpm-ema-celebahq-256", scheduler=scheduler)
1120
+
1121
+ output = pipe(
1122
+ original_image=original_image,
1123
+ mask_image=mask_image,
1124
+ num_inference_steps=250,
1125
+ eta=0.0,
1126
+ jump_length=10,
1127
+ jump_n_sample=10,
1128
+ )
1129
+ inpainted_image = output.images[0]
1130
+
1131
+ inpainted_image.save("repaint-image.png")
1132
+ ```
1133
+ <div align="center">
1134
+ <img alt="image" src="https://user-images.githubusercontent.com/20476674/209329052-b6fc2aaf-1a59-49a3-92ef-60180fdffd81.png">
1135
+ <center>原图像</center>
1136
+ <img alt="image" src="https://user-images.githubusercontent.com/20476674/209329048-4fe12176-32a0-4800-98f2-49bd8d593799.png">
1137
+ <center>mask图像</center>
1138
+ <img alt="image" src="https://user-images.githubusercontent.com/20476674/209329241-b7e4d99e-468a-4b95-8829-d77ee14bfe98.png">
1139
+ <center>生成图像</center>
1140
+ </div>
1141
+ </details>
1142
+
1143
+
1144
+
1145
+ <details><summary>&emsp;图像变化(Image Variation)</summary>
1146
+
1147
+ #### image_variation-versatile_diffusion
1148
+ ```python
1149
+ from ppdiffusers import VersatileDiffusionImageVariationPipeline
1150
+ from ppdiffusers.utils import load_image
1151
+
1152
+ url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/benz.jpg"
1153
+ image = load_image(url)
1154
+
1155
+ pipe = VersatileDiffusionImageVariationPipeline.from_pretrained("shi-labs/versatile-diffusion")
1156
+
1157
+ image = pipe(image).images[0]
1158
+ image.save("versatile-diffusion-car_variation.png")
1159
+ ```
1160
+ <div align="center">
1161
+ <img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209331434-51f6cdbd-b8e4-4faa-8e49-1cc852e35603.jpg">
1162
+ <center>原图像</center>
1163
+ <img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209331591-f6cc4cd8-8430-4627-8d22-bf404fb2bfdd.png">
1164
+ <center>生成图像</center>
1165
+ </div>
1166
+ </details>
1167
+
1168
+
1169
+
1170
+
1171
+
1172
+ ### 音频
1173
+ <details>
1174
+ <summary>&emsp;无条件音频生成(Unconditional Audio Generation)</summary>
1175
+
1176
+ #### unconditional_audio_generation-audio_diffusion
1177
+
1178
+ ```python
1179
+ from scipy.io.wavfile import write
1180
+ from ppdiffusers import AudioDiffusionPipeline
1181
+ import paddle
1182
+
1183
+ # 加载模型和scheduler
1184
+ pipe = AudioDiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256")
1185
+ pipe.set_progress_bar_config(disable=None)
1186
+ generator = paddle.Generator().manual_seed(42)
1187
+
1188
+ output = pipe(generator=generator)
1189
+ audio = output.audios[0]
1190
+ image = output.images[0]
1191
+
1192
+ # 保存音频到本地
1193
+ for i, audio in enumerate(audio):
1194
+ write(f"audio_diffusion_test{i}.wav", pipe.mel.config.sample_rate, audio.transpose())
1195
+
1196
+ # 保存图片
1197
+ image.save("audio_diffusion_test.png")
1198
+ ```
1199
+ <div align = "center">
1200
+ <thead>
1201
+ </thead>
1202
+ <tbody>
1203
+ <tr>
1204
+ <td align = "center">
1205
+ <a href="https://paddlenlp.bj.bcebos.com/models/community/teticio/data/audio_diffusion_test0.wav" rel="nofollow">
1206
+ <img align="center" src="https://user-images.githubusercontent.com/20476674/209344877-edbf1c24-f08d-4e3b-88a4-a27e1fd0a858.png" width="200" style="max-width: 100%;"></a><br>
1207
+ </td>
1208
+ </tr>
1209
+ </tbody>
1210
+ </div>
1211
+
1212
+ <div align="center">
1213
+ <img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209342125-93e8715e-895b-4115-9e1e-e65c6c2cd95a.png">
1214
+ </div>
1215
+
1216
+
1217
+ #### unconditional_audio_generation-spectrogram_diffusion
1218
+
1219
+ ```python
1220
+ import paddle
1221
+ import scipy
1222
+
1223
+ from ppdiffusers import MidiProcessor, SpectrogramDiffusionPipeline
1224
+ from ppdiffusers.utils.download_utils import ppdiffusers_url_download
1225
+
1226
+ # Download MIDI from: wget https://paddlenlp.bj.bcebos.com/models/community/junnyu/develop/beethoven_hammerklavier_2.mid
1227
+ mid_file_path = ppdiffusers_url_download(
1228
+ "https://paddlenlp.bj.bcebos.com/models/community/junnyu/develop/beethoven_hammerklavier_2.mid", cache_dir="."
1229
+ )
1230
+ pipe = SpectrogramDiffusionPipeline.from_pretrained("google/music-spectrogram-diffusion", paddle_dtype=paddle.float16)
1231
+ processor = MidiProcessor()
1232
+ output = pipe(processor(mid_file_path))
1233
+ audio = output.audios[0]
1234
+
1235
+ output_path = "unconditional_audio_generation-spectrogram_diffusion-result-beethoven_hammerklavier_2.wav"
1236
+ # save the audio sample as a .wav file
1237
+ scipy.io.wavfile.write(output_path, rate=16000, data=audio)
1238
+ ```
1239
+ <div align = "center">
1240
+ <thead>
1241
+ </thead>
1242
+ <tbody>
1243
+ <tr>
1244
+ <td align = "center">
1245
+ <a href="https://paddlenlp.bj.bcebos.com/models/community/westfish/develop_ppdiffusers_data/beethoven_hammerklavier_2.wav" rel="nofollow">
1246
+ <img align="center" src="https://user-images.githubusercontent.com/20476674/209344877-edbf1c24-f08d-4e3b-88a4-a27e1fd0a858.png" width="200" style="max-width: 100%;"></a><br>
1247
+ </td>
1248
+ </tr>
1249
+ </tbody>
1250
+ </div>
1251
+ </details>
1252
+
1253
+
1254
+
1255
+ ## License
1256
+ PPDiffusers 遵循 [Apache-2.0开源协议](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/LICENSE)。
1257
+
1258
+ Stable Diffusion 遵循 [The CreativeML OpenRAIL M 开源协议](https://huggingface.co/spaces/CompVis/stable-diffusion-license)。
1259
+ > The CreativeML OpenRAIL M is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that [BigScience](https://bigscience.huggingface.co/) and [the RAIL Initiative](https://www.licenses.ai/) are jointly carrying in the area of responsible AI licensing. See also [the article about the BLOOM Open RAIL license](https://bigscience.huggingface.co/blog/the-bigscience-rail-license) on which this license is based.
1260
+
1261
+ Stable Diffusion 3遵循 [Stability Community 开源协议](https://stability.ai/license)。
1262
+ > Community License: Free for research, non-commercial, and commercial use for organisations or individuals with less than $1M annual revenue. You only need a paid Enterprise license if your yearly revenues exceed USD$1M and you use Stability AI models in commercial products or services. Read more: https://stability.ai/license
1263
+
1264
+ ## Acknowledge
1265
+ 我们借鉴了🤗 Hugging Face的[Diffusers](https://github.com/huggingface/diffusers)关于预训练扩散模型使用的优秀设计,在此对Hugging Face作者及其开源社区表示感谢。
1266
+
1267
+ ## Citation
1268
+
1269
+ ```bibtex
1270
+ @misc{ppdiffusers,
1271
+ author = {PaddlePaddle Authors},
1272
+ title = {PPDiffusers: State-of-the-art diffusion model toolkit based on PaddlePaddle},
1273
+ year = {2022},
1274
+ publisher = {GitHub},
1275
+ journal = {GitHub repository},
1276
+ howpublished = {\url{https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers}}
1277
+ }
1278
+ ```
PaddleMIX/ppdiffusers/VERSION ADDED
@@ -0,0 +1 @@
 
 
1
+ 0.29.0
PaddleMIX/ppdiffusers/requirements.txt ADDED
@@ -0,0 +1,18 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ paddlenlp>=3.0.0b2
2
+ safetensors>=0.3.1
3
+ ftfy
4
+ regex
5
+ Pillow
6
+ opencv-python
7
+ av
8
+ # for test
9
+ parameterized
10
+ requests_mock
11
+ omegaconf
12
+ note_seq
13
+ urllib3<=2.0.0
14
+ einops>=0.6.1
15
+ paddlesde
16
+ ligo-segments
17
+ huggingface_hub==0.23.0
18
+ hf_transfer
PaddleMIX/ppdiffusers/setup.py ADDED
@@ -0,0 +1,71 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Packaging script for the ppdiffusers wheel."""
import os

from setuptools import find_packages, setup

description = "PPDiffusers: Diffusers toolbox implemented based on PaddlePaddle"


def read(file: str) -> str:
    """Return the stripped text of *file*, resolved relative to this script.

    Resolving against ``__file__`` (instead of the current working
    directory) keeps the build working when setup.py is invoked from
    another directory, e.g. by pip.
    """
    current_dir = os.path.dirname(__file__)
    path = os.path.join(current_dir, file)
    with open(path, "r", encoding="utf-8") as f:
        content = f.read().strip()
    return content


def read_version() -> str:
    """Read the version of ppdiffusers from the VERSION file."""
    return read("VERSION")


def read_readme() -> str:
    """Return the long description (README.md) shown on PyPI."""
    return read("README.md")


def read_requirements() -> list:
    """Return install requirements as a list of requirement strings.

    Blank lines and comment lines (e.g. the ``# for test`` annotation in
    requirements.txt) are dropped so they do not leak into the package
    metadata.
    """
    content = read("requirements.txt")
    packages = [
        line.strip()
        for line in content.split("\n")
        if line.strip() and not line.strip().startswith("#")
    ]
    return packages


setup(
    name="ppdiffusers",
    packages=find_packages(),
    version=read_version(),
    author="PaddleMIX Team",
    author_email="paddlemix@baidu.com",
    description=description,
    long_description=read_readme(),
    long_description_content_type="text/markdown",
    # was "https://github.com/PaddlePaddle/PaddleMIX/ppdiffusers", which is
    # not a valid GitHub path; use the tree URL the README/citation points at.
    url="https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers",
    keywords=["ppdiffusers", "paddle", "paddlemix"],
    # Previously a raw newline-separated string read relative to the CWD at
    # import time; use the __file__-relative helper and a proper list.
    install_requires=read_requirements(),
    python_requires=">=3.6",
    entry_points={"console_scripts": ["ppdiffusers-cli=ppdiffusers.commands.ppdiffusers_cli:main"]},
    classifiers=[
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "License :: OSI Approved :: Apache Software License",
        "Operating System :: OS Independent",
    ],
    license="Apache 2.0",
)
PaddleMIX/scripts/build_wheel.sh ADDED
@@ -0,0 +1,136 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/usr/bin/env bash

# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Build, install and sanity-check the paddlemix wheel.

#=================================================
# Utils
#=================================================


# directory config
DIST_DIR="dist"
BUILD_DIR="build"
EGG_DIR="paddlemix.egg-info"

# command line log config
RED='\033[0;31m'
BLUE='\033[0;34m'
GREEN='\033[1;32m'
BOLD='\033[1m'
NONE='\033[0m'

# Abort unless the `python` on PATH is Python >= 3.5.
function python_version_check() {
    PY_MAIN_VERSION=`python -V 2>&1 | awk '{print $2}' | awk -F '.' '{print $1}'`
    PY_SUB_VERSION=`python -V 2>&1 | awk '{print $2}' | awk -F '.' '{print $2}'`
    echo -e "find python version ${PY_MAIN_VERSION}.${PY_SUB_VERSION}"
    if [ "$PY_MAIN_VERSION" -ne "3" -o "$PY_SUB_VERSION" -lt "5" ]; then
        echo -e "${RED}FAIL:${NONE} please use Python >= 3.5 !"
        exit 1
    fi
}

# Remove stale build artifacts and any previously installed paddlemix.
function init() {
    echo -e "${BLUE}[init]${NONE} removing building directory..."
    rm -rf $DIST_DIR $BUILD_DIR $EGG_DIR
    if [ `pip list | grep paddlemix | wc -l` -gt 0 ]; then
        echo -e "${BLUE}[init]${NONE} uninstalling paddlemix..."
        pip uninstall -y paddlemix
    fi
    echo -e "${BLUE}[init]${NONE} ${GREEN}init success\n"
}

# Build the sdist/wheel (with ppdiffusers pinned as a dependency) and
# pip-install the resulting wheel. requirements.txt is restored afterwards.
function build_and_install() {
    echo -e "${BLUE}[build]${NONE} building paddlemix wheel..."
    # add ppdiffusers as dependency to paddlemix
    cp requirements.txt requirements.bak
    echo 'ppdiffusers==0.19.3' >> requirements.txt
    python setup.py sdist bdist_wheel
    if [ $? -ne 0 ]; then
        echo -e "${RED}[FAIL]${NONE} build paddlemix wheel failed !"
        exit 1
    fi
    # was: "build paddldet wheel success" (typo, copied from PaddleDetection)
    echo -e "${BLUE}[build]${NONE} ${GREEN}build paddlemix wheel success\n"
    mv requirements.bak requirements.txt

    echo -e "${BLUE}[install]${NONE} installing paddlemix..."
    cd $DIST_DIR

    find . -name "paddlemix*.whl" | xargs pip install
    if [ $? -ne 0 ]; then
        cd ..
        echo -e "${RED}[FAIL]${NONE} install paddlemix wheel failed !"
        exit 1
    fi
    echo -e "${BLUE}[install]${NONE} ${GREEN}paddlemix install success\n"
    cd ..
}

# Run the test suite against the *installed* wheel.
function unittest() {
    echo -e "${BLUE}[unittest]${NONE} run unittests..."

    # NOTE: perform unittests make sure installed paddlemix is used
    python -m unittest discover -v

    echo -e "${BLUE}[unittest]${NONE} ${GREEN}unittests success\n${NONE}"
}

# Remove intermediate build artifacts and uninstall the wheel.
function cleanup() {
    rm -rf $BUILD_DIR $EGG_DIR
    pip uninstall -y paddlemix
}

# EXIT-trap error handler: report failure, cd back out of a build directory
# if we are inside one, and remove every artifact.
function abort() {
    echo -e "${RED}[FAIL]${NONE} build wheel and unittest failed !
please check your code" 1>&2

    # was: basename "$pwd" (lowercase $pwd is normally unset; the shell
    # builtin is $PWD) and [ cur_dir==$TEST_DIR -o cur_dir==$DIST_DIR ]
    # (unexpanded literal strings — the test was always true).
    cur_dir=`basename "$PWD"`
    if [ "$cur_dir" = "$TEST_DIR" -o "$cur_dir" = "$DIST_DIR" ]; then
        cd ..
    fi

    # NOTE(review): $TEST_DIR is never assigned in this script — presumably a
    # leftover from the PaddleDetection build script; harmless when unset.
    rm -rf $BUILD_DIR $EGG_DIR $DIST_DIR $TEST_DIR
    pip uninstall -y paddlemix
}

python_version_check

trap 'abort' 0
set -e

init
build_and_install
# unittest
cleanup

# get Paddle version
PADDLE_VERSION=`python -c "import paddle; print(paddle.version.full_version)"`
PADDLE_COMMIT=`python -c "import paddle; print(paddle.version.commit)"`
PADDLE_COMMIT=`git rev-parse --short $PADDLE_COMMIT`

# get PaddleMIX branch
PPDET_BRANCH=`git rev-parse --abbrev-ref HEAD`
PPDET_COMMIT=`git rev-parse --short HEAD`

# get Python version
PYTHON_VERSION=`python -c "import platform; print(platform.python_version())"`

echo -e "\n${GREEN}paddlemix wheel compiled and checked success !${NONE}
${BLUE}Python version:${NONE} $PYTHON_VERSION
${BLUE}Paddle version:${NONE} $PADDLE_VERSION ($PADDLE_COMMIT)
${BLUE}paddlemix branch:${NONE} $PPDET_BRANCH ($PPDET_COMMIT)\n"

echo -e "${GREEN}wheel saved under${NONE} ${RED}${BOLD}./dist"

trap : 0
a_main_folder/lavis_examples/albef_feature_extraction.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
a_main_folder/lavis_examples/albef_vqa.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
a_main_folder/lavis_examples/albef_zero_shot_classification.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
a_main_folder/lavis_examples/blip2_feature_extraction.ipynb ADDED
@@ -0,0 +1,145 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": null,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "import torch\n",
10
+ "from PIL import Image\n",
11
+ "\n",
12
+ "from lavis.models import load_model_and_preprocess"
13
+ ]
14
+ },
15
+ {
16
+ "cell_type": "markdown",
17
+ "metadata": {},
18
+ "source": [
19
+ "#### Load an example image"
20
+ ]
21
+ },
22
+ {
23
+ "cell_type": "code",
24
+ "execution_count": null,
25
+ "metadata": {},
26
+ "outputs": [],
27
+ "source": [
28
+ "raw_image = Image.open(\"../docs/_static/merlion.png\").convert(\"RGB\")\n",
29
+ "caption = \"a large fountain spewing water into the air\"\n",
30
+ "\n",
31
+ "display(raw_image.resize((596, 437)))"
32
+ ]
33
+ },
34
+ {
35
+ "cell_type": "code",
36
+ "execution_count": null,
37
+ "metadata": {},
38
+ "outputs": [],
39
+ "source": [
40
+ "# setup device to use\n",
41
+ "device = torch.device(\"cuda\") if torch.cuda.is_available() else \"cpu\""
42
+ ]
43
+ },
44
+ {
45
+ "cell_type": "code",
46
+ "execution_count": null,
47
+ "metadata": {},
48
+ "outputs": [],
49
+ "source": [
50
+ "model, vis_processors, txt_processors = load_model_and_preprocess(name=\"blip2_feature_extractor\", model_type=\"pretrain\", is_eval=True, device=device)\n",
51
+ "image = vis_processors[\"eval\"](raw_image).unsqueeze(0).to(device)\n",
52
+ "text_input = txt_processors[\"eval\"](caption)\n",
53
+ "sample = {\"image\": image, \"text_input\": [text_input]}"
54
+ ]
55
+ },
56
+ {
57
+ "cell_type": "markdown",
58
+ "metadata": {},
59
+ "source": [
60
+ "#### Multimodal features"
61
+ ]
62
+ },
63
+ {
64
+ "cell_type": "code",
65
+ "execution_count": null,
66
+ "metadata": {},
67
+ "outputs": [],
68
+ "source": [
69
+ "features_multimodal = model.extract_features(sample)\n",
70
+ "print(features_multimodal.multimodal_embeds.shape)\n",
71
+ "# torch.Size([1, 32, 768]), 32 is the number of queries"
72
+ ]
73
+ },
74
+ {
75
+ "cell_type": "markdown",
76
+ "metadata": {},
77
+ "source": [
78
+ "#### Unimodal features"
79
+ ]
80
+ },
81
+ {
82
+ "cell_type": "code",
83
+ "execution_count": null,
84
+ "metadata": {},
85
+ "outputs": [],
86
+ "source": [
87
+ "features_image = model.extract_features(sample, mode=\"image\")\n",
88
+ "features_text = model.extract_features(sample, mode=\"text\")\n",
89
+ "print(features_image.image_embeds.shape)\n",
90
+ "# torch.Size([1, 32, 768])\n",
91
+ "print(features_text.text_embeds.shape)\n",
92
+ "# torch.Size([1, 12, 768])"
93
+ ]
94
+ },
95
+ {
96
+ "cell_type": "markdown",
97
+ "metadata": {},
98
+ "source": [
99
+ "#### Normalized low-dimensional unimodal features"
100
+ ]
101
+ },
102
+ {
103
+ "cell_type": "code",
104
+ "execution_count": null,
105
+ "metadata": {},
106
+ "outputs": [],
107
+ "source": [
108
+ "# low-dimensional projected features\n",
109
+ "print(features_image.image_embeds_proj.shape)\n",
110
+ "# torch.Size([1, 32, 256])\n",
111
+ "print(features_text.text_embeds_proj.shape)\n",
112
+ "# torch.Size([1, 12, 256])\n",
113
+ "similarity = (features_image.image_embeds_proj @ features_text.text_embeds_proj[:,0,:].t()).max()\n",
114
+ "print(similarity)\n",
115
+ "# tensor([[0.3642]])"
116
+ ]
117
+ }
118
+ ],
119
+ "metadata": {
120
+ "kernelspec": {
121
+ "display_name": "Python 3 (ipykernel)",
122
+ "language": "python",
123
+ "name": "python3"
124
+ },
125
+ "language_info": {
126
+ "codemirror_mode": {
127
+ "name": "ipython",
128
+ "version": 3
129
+ },
130
+ "file_extension": ".py",
131
+ "mimetype": "text/x-python",
132
+ "name": "python",
133
+ "nbconvert_exporter": "python",
134
+ "pygments_lexer": "ipython3",
135
+ "version": "3.8.13"
136
+ },
137
+ "vscode": {
138
+ "interpreter": {
139
+ "hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe"
140
+ }
141
+ }
142
+ },
143
+ "nbformat": 4,
144
+ "nbformat_minor": 2
145
+ }
a_main_folder/lavis_examples/blip2_image_text_matching.ipynb ADDED
@@ -0,0 +1,141 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": null,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "import torch\n",
10
+ "from PIL import Image\n",
11
+ "\n",
12
+ "from lavis.models import load_model_and_preprocess\n",
13
+ "from lavis.processors import load_processor"
14
+ ]
15
+ },
16
+ {
17
+ "cell_type": "markdown",
18
+ "metadata": {},
19
+ "source": [
20
+ "#### Load an example image and text"
21
+ ]
22
+ },
23
+ {
24
+ "cell_type": "code",
25
+ "execution_count": null,
26
+ "metadata": {},
27
+ "outputs": [],
28
+ "source": [
29
+ "raw_image = Image.open(\"../docs/_static/merlion.png\").convert(\"RGB\")\n",
30
+ "display(raw_image.resize((596, 437)))"
31
+ ]
32
+ },
33
+ {
34
+ "cell_type": "code",
35
+ "execution_count": null,
36
+ "metadata": {},
37
+ "outputs": [],
38
+ "source": [
39
+ "# setup device to use\n",
40
+ "device = torch.device(\"cuda\") if torch.cuda.is_available() else \"cpu\""
41
+ ]
42
+ },
43
+ {
44
+ "cell_type": "code",
45
+ "execution_count": null,
46
+ "metadata": {},
47
+ "outputs": [],
48
+ "source": [
49
+ "caption = \"merlion in Singapore\""
50
+ ]
51
+ },
52
+ {
53
+ "cell_type": "markdown",
54
+ "metadata": {},
55
+ "source": [
56
+ "#### Load model and preprocessors"
57
+ ]
58
+ },
59
+ {
60
+ "cell_type": "code",
61
+ "execution_count": null,
62
+ "metadata": {},
63
+ "outputs": [],
64
+ "source": [
65
+ "model, vis_processors, text_processors = load_model_and_preprocess(\"blip2_image_text_matching\", \"pretrain\", device=device, is_eval=True)\n",
66
+ "# model, vis_processors, text_processors = load_model_and_preprocess(\"blip2_image_text_matching\", \"coco\", device=device, is_eval=True)"
67
+ ]
68
+ },
69
+ {
70
+ "cell_type": "markdown",
71
+ "metadata": {},
72
+ "source": [
73
+ "#### Preprocess image and text inputs"
74
+ ]
75
+ },
76
+ {
77
+ "cell_type": "code",
78
+ "execution_count": null,
79
+ "metadata": {},
80
+ "outputs": [],
81
+ "source": [
82
+ "img = vis_processors[\"eval\"](raw_image).unsqueeze(0).to(device)\n",
83
+ "txt = text_processors[\"eval\"](caption)"
84
+ ]
85
+ },
86
+ {
87
+ "cell_type": "markdown",
88
+ "metadata": {},
89
+ "source": [
90
+ "#### Compute image-text matching (ITM) score"
91
+ ]
92
+ },
93
+ {
94
+ "cell_type": "code",
95
+ "execution_count": null,
96
+ "metadata": {},
97
+ "outputs": [],
98
+ "source": [
99
+ "itm_output = model({\"image\": img, \"text_input\": txt}, match_head=\"itm\")\n",
100
+ "itm_scores = torch.nn.functional.softmax(itm_output, dim=1)\n",
101
+ "print(f'The image and text are matched with a probability of {itm_scores[:, 1].item():.3%}')"
102
+ ]
103
+ },
104
+ {
105
+ "cell_type": "code",
106
+ "execution_count": null,
107
+ "metadata": {},
108
+ "outputs": [],
109
+ "source": [
110
+ "itc_score = model({\"image\": img, \"text_input\": txt}, match_head='itc')\n",
111
+ "print('The image feature and text feature has a cosine similarity of %.4f'%itc_score)"
112
+ ]
113
+ }
114
+ ],
115
+ "metadata": {
116
+ "kernelspec": {
117
+ "display_name": "Python 3 (ipykernel)",
118
+ "language": "python",
119
+ "name": "python3"
120
+ },
121
+ "language_info": {
122
+ "codemirror_mode": {
123
+ "name": "ipython",
124
+ "version": 3
125
+ },
126
+ "file_extension": ".py",
127
+ "mimetype": "text/x-python",
128
+ "name": "python",
129
+ "nbconvert_exporter": "python",
130
+ "pygments_lexer": "ipython3",
131
+ "version": "3.8.13"
132
+ },
133
+ "vscode": {
134
+ "interpreter": {
135
+ "hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe"
136
+ }
137
+ }
138
+ },
139
+ "nbformat": 4,
140
+ "nbformat_minor": 2
141
+ }
a_main_folder/lavis_examples/blip2_instructed_generation.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
a_main_folder/lavis_examples/blip_feature_extraction.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
a_main_folder/lavis_examples/blip_image_captioning.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
a_main_folder/lavis_examples/blip_image_text_matching.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
a_main_folder/lavis_examples/blip_text_localization.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
a_main_folder/lavis_examples/blip_vqa.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
a_main_folder/lavis_examples/blip_zero_shot_classification.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
a_main_folder/lavis_examples/clip_feature_extraction.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
a_main_folder/lavis_examples/clip_zero_shot_classification.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
a_main_folder/litserve/.lightning_studio/.studiorc ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ # This script is only for your user and runs in every shell you open.
2
+ # Use it to personalize your shell.
3
+ #
4
+ # Example: export MY_KEY=abcd-1234
a_main_folder/litserve/.lightning_studio/on_start.sh ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+
3
+ # This script runs every time your Studio starts, from your home directory.
4
+
5
+ # List files under fast_load that need to load quickly on start (e.g. model checkpoints).
6
+ #
7
+ # ! fast_load
8
+ # <your file here>
9
+
10
+ # Add your startup commands below.
11
+ #
12
+ # Example: streamlit run my_app.py
13
+ # Example: gradio my_app.py
a_main_folder/litserve/.lightning_studio/on_stop.sh ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+
3
+ # This script runs every time your Studio sleeps, from your home directory.
4
+
5
+ # Add your shutdown commands below.
6
+ #
7
+ # Example: docker down my-container
8
+ # Example: sudo service mysql stop
a_main_folder/litserve/aurasr.ipynb ADDED
@@ -0,0 +1,215 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {},
7
+ "outputs": [
8
+ {
9
+ "name": "stderr",
10
+ "output_type": "stream",
11
+ "text": [
12
+ "INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)\n",
13
+ "INFO: Started server process [1819673]\n",
14
+ "INFO: Started server process [1819665]\n",
15
+ "INFO: Waiting for application startup.\n",
16
+ "INFO: Started server process [1819688]\n",
17
+ "INFO: Waiting for application startup.\n",
18
+ "INFO: Application startup complete.\n",
19
+ "INFO: Waiting for application startup.\n",
20
+ "INFO: Application startup complete.\n",
21
+ "INFO: Application startup complete.\n",
22
+ "INFO: Started server process [1819696]\n",
23
+ "INFO: Waiting for application startup.\n",
24
+ "INFO: Application startup complete.\n"
25
+ ]
26
+ },
27
+ {
28
+ "name": "stdout",
29
+ "output_type": "stream",
30
+ "text": [
31
+ "Swagger UI is available at http://0.0.0.0:8000/docs\n"
32
+ ]
33
+ },
34
+ {
35
+ "name": "stderr",
36
+ "output_type": "stream",
37
+ "text": [
38
+ "Traceback (most recent call last):\n",
39
+ " File \"<string>\", line 1, in <module>\n",
40
+ " File \"/dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/multiprocessing/spawn.py\", line 116, in spawn_main\n",
41
+ " exitcode = _main(fd, parent_sentinel)\n",
42
+ " File \"/dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/multiprocessing/spawn.py\", line 126, in _main\n",
43
+ " self = reduction.pickle.load(from_parent)\n",
44
+ "AttributeError: Can't get attribute 'AuraSRLitAPI' on <module '__main__' (built-in)>\n",
45
+ "Traceback (most recent call last):\n",
46
+ " File \"<string>\", line 1, in <module>\n",
47
+ " File \"/dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/multiprocessing/spawn.py\", line 116, in spawn_main\n",
48
+ " exitcode = _main(fd, parent_sentinel)\n",
49
+ " File \"/dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/multiprocessing/spawn.py\", line 126, in _main\n",
50
+ " self = reduction.pickle.load(from_parent)\n",
51
+ "AttributeError: Can't get attribute 'AuraSRLitAPI' on <module '__main__' (built-in)>\n",
52
+ "Traceback (most recent call last):\n",
53
+ " File \"<string>\", line 1, in <module>\n",
54
+ " File \"/dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/multiprocessing/spawn.py\", line 116, in spawn_main\n",
55
+ " exitcode = _main(fd, parent_sentinel)\n",
56
+ " File \"/dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/multiprocessing/spawn.py\", line 126, in _main\n",
57
+ " self = reduction.pickle.load(from_parent)\n",
58
+ "AttributeError: Can't get attribute 'AuraSRLitAPI' on <module '__main__' (built-in)>\n",
59
+ "Traceback (most recent call last):\n",
60
+ " File \"<string>\", line 1, in <module>\n",
61
+ " File \"/dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/multiprocessing/spawn.py\", line 116, in spawn_main\n",
62
+ " exitcode = _main(fd, parent_sentinel)\n",
63
+ " File \"/dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/multiprocessing/spawn.py\", line 126, in _main\n",
64
+ " self = reduction.pickle.load(from_parent)\n",
65
+ "AttributeError: Can't get attribute 'AuraSRLitAPI' on <module '__main__' (built-in)>\n"
66
+ ]
67
+ },
68
+ {
69
+ "name": "stdout",
70
+ "output_type": "stream",
71
+ "text": [
72
+ "Shutting down LitServe\n"
73
+ ]
74
+ },
75
+ {
76
+ "ename": "KeyboardInterrupt",
77
+ "evalue": "",
78
+ "output_type": "error",
79
+ "traceback": [
80
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
81
+ "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
82
+ "Cell \u001b[0;32mIn[1], line 37\u001b[0m\n\u001b[1;32m 35\u001b[0m api \u001b[38;5;241m=\u001b[39m AuraSRLitAPI()\n\u001b[1;32m 36\u001b[0m server \u001b[38;5;241m=\u001b[39m ls\u001b[38;5;241m.\u001b[39mLitServer(api, timeout\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mFalse\u001b[39;00m)\n\u001b[0;32m---> 37\u001b[0m \u001b[43mserver\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[43mport\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;241;43m8000\u001b[39;49m\u001b[43m)\u001b[49m\n",
83
+ "File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages/litserve/server.py:488\u001b[0m, in \u001b[0;36mLitServer.run\u001b[0;34m(self, host, port, num_api_servers, log_level, generate_client_file, api_server_worker_type, **kwargs)\u001b[0m\n\u001b[1;32m 486\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mSwagger UI is available at http://0.0.0.0:\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mport\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m/docs\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 487\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m s \u001b[38;5;129;01min\u001b[39;00m servers:\n\u001b[0;32m--> 488\u001b[0m \u001b[43ms\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mjoin\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 489\u001b[0m \u001b[38;5;28;01mfinally\u001b[39;00m:\n\u001b[1;32m 490\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mShutting down LitServe\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
84
+ "File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/multiprocessing/process.py:149\u001b[0m, in \u001b[0;36mBaseProcess.join\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m 147\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_parent_pid \u001b[38;5;241m==\u001b[39m os\u001b[38;5;241m.\u001b[39mgetpid(), \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mcan only join a child process\u001b[39m\u001b[38;5;124m'\u001b[39m\n\u001b[1;32m 148\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_popen \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mcan only join a started process\u001b[39m\u001b[38;5;124m'\u001b[39m\n\u001b[0;32m--> 149\u001b[0m res \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_popen\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mwait\u001b[49m\u001b[43m(\u001b[49m\u001b[43mtimeout\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 150\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m res \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 151\u001b[0m _children\u001b[38;5;241m.\u001b[39mdiscard(\u001b[38;5;28mself\u001b[39m)\n",
85
+ "File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/multiprocessing/popen_fork.py:43\u001b[0m, in \u001b[0;36mPopen.wait\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m 41\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 42\u001b[0m \u001b[38;5;66;03m# This shouldn't block if wait() returned successfully.\u001b[39;00m\n\u001b[0;32m---> 43\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mpoll\u001b[49m\u001b[43m(\u001b[49m\u001b[43mos\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mWNOHANG\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mif\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mtimeout\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m==\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;241;43m0.0\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01melse\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[1;32m 44\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mreturncode\n",
86
+ "File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/multiprocessing/popen_fork.py:27\u001b[0m, in \u001b[0;36mPopen.poll\u001b[0;34m(self, flag)\u001b[0m\n\u001b[1;32m 25\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mreturncode \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 26\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m---> 27\u001b[0m pid, sts \u001b[38;5;241m=\u001b[39m \u001b[43mos\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mwaitpid\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mpid\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mflag\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 28\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mOSError\u001b[39;00m:\n\u001b[1;32m 29\u001b[0m \u001b[38;5;66;03m# Child process not yet created. See #1731717\u001b[39;00m\n\u001b[1;32m 30\u001b[0m \u001b[38;5;66;03m# e.errno == errno.ECHILD == 10\u001b[39;00m\n\u001b[1;32m 31\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n",
87
+ "\u001b[0;31mKeyboardInterrupt\u001b[0m: "
88
+ ]
89
+ }
90
+ ],
91
+ "source": [
92
+ "# !pip install aura_sr\n",
93
+ "\n",
94
+ "import litserve as ls\n",
95
+ "import torch\n",
96
+ "from io import BytesIO\n",
97
+ "\n",
98
+ "\n",
99
+ "from PIL import Image\n",
100
+ "from fastapi import Response\n",
101
+ "from aura_sr import AuraSR\n",
102
+ "\n",
103
+ "class AuraSRLitAPI(ls.LitAPI):\n",
104
+ " def setup(self, device):\n",
105
+ " # Load the model\n",
106
+ " self.aura_sr = AuraSR.from_pretrained(\"fal-ai/AuraSR\")\n",
107
+ "\n",
108
+ " def decode_request(self, request):\n",
109
+ " # Extract file from request\n",
110
+ " return request[\"content\"].file\n",
111
+ "\n",
112
+ " def predict(self, image_data):\n",
113
+ " # Generate the upscaled image\n",
114
+ " image = Image.open(image_data)\n",
115
+ " upscaled_image = self.aura_sr.upscale_4x(image)\n",
116
+ " \n",
117
+ " return upscaled_image\n",
118
+ "\n",
119
+ " def encode_response(self, image):\n",
120
+ " buffered = BytesIO()\n",
121
+ " image.save(buffered, format=\"PNG\")\n",
122
+ " return Response(content=buffered.getvalue(), headers={\"Content-Type\": \"image/png\"})\n",
123
+ "\n",
124
+ "# Starting the server\n",
125
+ "if __name__ == \"__main__\":\n",
126
+ " api = AuraSRLitAPI()\n",
127
+ " server = ls.LitServer(api, timeout=False)\n",
128
+ " server.run(port=8000)"
129
+ ]
130
+ },
131
+ {
132
+ "cell_type": "code",
133
+ "execution_count": 2,
134
+ "metadata": {},
135
+ "outputs": [
136
+ {
137
+ "name": "stdout",
138
+ "output_type": "stream",
139
+ "text": [
140
+ "Requirement already satisfied: aura_sr in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (0.0.4)\n",
141
+ "Requirement already satisfied: torch>=2.0 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from aura_sr) (2.5.1)\n",
142
+ "Requirement already satisfied: torchvision in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from aura_sr) (0.20.1)\n",
143
+ "Requirement already satisfied: numpy in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from aura_sr) (1.26.4)\n",
144
+ "Requirement already satisfied: einops in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from aura_sr) (0.8.0)\n",
145
+ "Requirement already satisfied: huggingface-hub in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from aura_sr) (0.27.0)\n",
146
+ "Requirement already satisfied: safetensors in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from aura_sr) (0.4.5)\n",
147
+ "Requirement already satisfied: filelock in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (3.16.1)\n",
148
+ "Requirement already satisfied: typing-extensions>=4.8.0 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (4.12.2)\n",
149
+ "Requirement already satisfied: networkx in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (3.4.2)\n",
150
+ "Requirement already satisfied: jinja2 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (3.1.5)\n",
151
+ "Requirement already satisfied: fsspec in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (2024.9.0)\n",
152
+ "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.4.127 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (12.4.127)\n",
153
+ "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.4.127 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (12.4.127)\n",
154
+ "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.4.127 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (12.4.127)\n",
155
+ "Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (9.1.0.70)\n",
156
+ "Requirement already satisfied: nvidia-cublas-cu12==12.4.5.8 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (12.4.5.8)\n",
157
+ "Requirement already satisfied: nvidia-cufft-cu12==11.2.1.3 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (11.2.1.3)\n",
158
+ "Requirement already satisfied: nvidia-curand-cu12==10.3.5.147 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (10.3.5.147)\n",
159
+ "Requirement already satisfied: nvidia-cusolver-cu12==11.6.1.9 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (11.6.1.9)\n",
160
+ "Requirement already satisfied: nvidia-cusparse-cu12==12.3.1.170 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (12.3.1.170)\n",
161
+ "Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (2.21.5)\n",
162
+ "Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (12.4.127)\n",
163
+ "Requirement already satisfied: nvidia-nvjitlink-cu12==12.4.127 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (12.4.127)\n",
164
+ "Requirement already satisfied: triton==3.1.0 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (3.1.0)\n",
165
+ "Requirement already satisfied: sympy==1.13.1 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torch>=2.0->aura_sr) (1.13.1)\n",
166
+ "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from sympy==1.13.1->torch>=2.0->aura_sr) (1.3.0)\n",
167
+ "Requirement already satisfied: packaging>=20.9 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from huggingface-hub->aura_sr) (24.2)\n",
168
+ "Requirement already satisfied: pyyaml>=5.1 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from huggingface-hub->aura_sr) (6.0.2)\n",
169
+ "Requirement already satisfied: requests in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from huggingface-hub->aura_sr) (2.32.3)\n",
170
+ "Requirement already satisfied: tqdm>=4.42.1 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from huggingface-hub->aura_sr) (4.67.1)\n",
171
+ "Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from torchvision->aura_sr) (11.0.0)\n",
172
+ "Requirement already satisfied: MarkupSafe>=2.0 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from jinja2->torch>=2.0->aura_sr) (3.0.2)\n",
173
+ "Requirement already satisfied: charset-normalizer<4,>=2 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from requests->huggingface-hub->aura_sr) (3.4.1)\n",
174
+ "Requirement already satisfied: idna<4,>=2.5 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from requests->huggingface-hub->aura_sr) (3.10)\n",
175
+ "Requirement already satisfied: urllib3<3,>=1.21.1 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from requests->huggingface-hub->aura_sr) (2.3.0)\n",
176
+ "Requirement already satisfied: certifi>=2017.4.17 in /dscilab_dungvo/workspace/bin/envs/litserve/lib/python3.10/site-packages (from requests->huggingface-hub->aura_sr) (2024.12.14)\n",
177
+ "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.\u001b[0m\u001b[33m\n",
178
+ "\u001b[0m"
179
+ ]
180
+ }
181
+ ],
182
+ "source": [
183
+ "!pip install --upgrade aura_sr"
184
+ ]
185
+ },
186
+ {
187
+ "cell_type": "code",
188
+ "execution_count": null,
189
+ "metadata": {},
190
+ "outputs": [],
191
+ "source": []
192
+ }
193
+ ],
194
+ "metadata": {
195
+ "kernelspec": {
196
+ "display_name": "litserve",
197
+ "language": "python",
198
+ "name": "python3"
199
+ },
200
+ "language_info": {
201
+ "codemirror_mode": {
202
+ "name": "ipython",
203
+ "version": 3
204
+ },
205
+ "file_extension": ".py",
206
+ "mimetype": "text/x-python",
207
+ "name": "python",
208
+ "nbconvert_exporter": "python",
209
+ "pygments_lexer": "ipython3",
210
+ "version": "3.10.16"
211
+ }
212
+ },
213
+ "nbformat": 4,
214
+ "nbformat_minor": 2
215
+ }
a_main_folder/litserve/aurasr/.lightning_studio/.studiorc ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ # This script is only for your user and runs in every shell you open.
2
+ # Use it to personalize your shell.
3
+ #
4
+ # Example: export MY_KEY=abcd-1234
a_main_folder/litserve/aurasr/.lightning_studio/on_start.sh ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+
3
+ # This script runs every time your Studio starts, from your home directory.
4
+
5
+ # List files under fast_load that need to load quickly on start (e.g. model checkpoints).
6
+ #
7
+ # ! fast_load
8
+ # <your file here>
9
+
10
+ # Add your startup commands below.
11
+ #
12
+ # Example: streamlit run my_app.py
13
+ # Example: gradio my_app.py
a_main_folder/litserve/aurasr/.lightning_studio/on_stop.sh ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+
3
+ # This script runs every time your Studio sleeps, from your home directory.
4
+
5
+ # Add your shutdown commands below.
6
+ #
7
+ # Example: docker down my-container
8
+ # Example: sudo service mysql stop
a_main_folder/litserve/aurasr/client.py ADDED
@@ -0,0 +1,30 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import argparse
2
+ import requests
3
+ from datetime import datetime
4
+
5
+ # Update this URL to your server's URL if hosted remotely
6
+ API_URL = "http://127.0.0.1:8000/predict"
7
+
8
+ def send_generate_request(path):
9
+ inputFile = open(path, 'rb')
10
+ inputData = inputFile.read()
11
+ inputFile.close()
12
+
13
+ response = requests.post(API_URL, files={"content": inputData})
14
+ if response.status_code == 200:
15
+ timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S").lower()
16
+ filename = f"output-{timestamp}.png"
17
+
18
+ with open(filename, "wb") as output_file:
19
+ output_file.write(response.content)
20
+
21
+ print(f"Image saved to {filename}")
22
+ else:
23
+ print(f"Error: Response with status code {response.status_code} - {response.text}")
24
+
25
+ if __name__ == "__main__":
26
+ parser = argparse.ArgumentParser(description="Send an image to the AuraSR server and receive the upscaled image.")
27
+ parser.add_argument("--path", required=True, help="Path to the input image file")
28
+ args = parser.parse_args()
29
+
30
+ send_generate_request(args.path)
a_main_folder/litserve/aurasr/input.jpg ADDED
a_main_folder/litserve/aurasr/server.py ADDED
@@ -0,0 +1,33 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import litserve as ls
2
+ import torch
3
+ from io import BytesIO
4
+ from PIL import Image
5
+ from fastapi import Response
6
+ from aura_sr import AuraSR
7
+
8
+ class AuraSRLitAPI(ls.LitAPI):
9
+ def setup(self, device):
10
+ # Load the model
11
+ self.aura_sr = AuraSR.from_pretrained("fal-ai/AuraSR")
12
+
13
+ def decode_request(self, request):
14
+ # Extract file from request
15
+ return request["content"].file
16
+
17
+ def predict(self, image_data):
18
+ # Generate the upscaled image
19
+ image = Image.open(image_data)
20
+ upscaled_image = self.aura_sr.upscale_4x(image)
21
+
22
+ return upscaled_image
23
+
24
+ def encode_response(self, image):
25
+ buffered = BytesIO()
26
+ image.save(buffered, format="PNG")
27
+ return Response(content=buffered.getvalue(), headers={"Content-Type": "image/png"})
28
+
29
+ # Starting the server
30
+ if __name__ == "__main__":
31
+ api = AuraSRLitAPI()
32
+ server = ls.LitServer(api, timeout=False)
33
+ server.run(port=8000)
a_main_folder/llm2vec/test.ipynb ADDED
@@ -0,0 +1,80 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 2,
6
+ "metadata": {},
7
+ "outputs": [
8
+ {
9
+ "name": "stderr",
10
+ "output_type": "stream",
11
+ "text": [
12
+ "/dscilab_dungvo/workspace/bin/envs/llm2vec/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
13
+ " from .autonotebook import tqdm as notebook_tqdm\n"
14
+ ]
15
+ }
16
+ ],
17
+ "source": [
18
+ "import os\n",
19
+ "import llm2vec"
20
+ ]
21
+ },
22
+ {
23
+ "cell_type": "code",
24
+ "execution_count": 3,
25
+ "metadata": {},
26
+ "outputs": [
27
+ {
28
+ "name": "stderr",
29
+ "output_type": "stream",
30
+ "text": [
31
+ "/dscilab_dungvo/workspace/bin/envs/llm2vec/lib/python3.10/site-packages/huggingface_hub/file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
32
+ " warnings.warn(\n",
33
+ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
34
+ "Downloading shards: 100%|██████████████████████| 4/4 [28:51<00:00, 432.87s/it]\n",
35
+ "Loading checkpoint shards: 100%|████████████████| 4/4 [00:06<00:00, 1.65s/it]\n"
36
+ ]
37
+ }
38
+ ],
39
+ "source": [
40
+ "import torch\n",
41
+ "from llm2vec import LLM2Vec\n",
42
+ "\n",
43
+ "l2v = LLM2Vec.from_pretrained(\n",
44
+ " \"McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp\",\n",
45
+ " peft_model_name_or_path=\"McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-unsup-simcse\",\n",
46
+ " device_map=\"cuda\" if torch.cuda.is_available() else \"cpu\",\n",
47
+ " torch_dtype=torch.bfloat16,\n",
48
+ ")"
49
+ ]
50
+ },
51
+ {
52
+ "cell_type": "code",
53
+ "execution_count": null,
54
+ "metadata": {},
55
+ "outputs": [],
56
+ "source": []
57
+ }
58
+ ],
59
+ "metadata": {
60
+ "kernelspec": {
61
+ "display_name": "llm2vec",
62
+ "language": "python",
63
+ "name": "python3"
64
+ },
65
+ "language_info": {
66
+ "codemirror_mode": {
67
+ "name": "ipython",
68
+ "version": 3
69
+ },
70
+ "file_extension": ".py",
71
+ "mimetype": "text/x-python",
72
+ "name": "python",
73
+ "nbconvert_exporter": "python",
74
+ "pygments_lexer": "ipython3",
75
+ "version": "3.10.16"
76
+ }
77
+ },
78
+ "nbformat": 4,
79
+ "nbformat_minor": 2
80
+ }
a_main_folder/ultralytics/input.jpg ADDED
a_main_folder/ultralytics/test.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
open_clip/src/open_clip/hf_configs.py ADDED
@@ -0,0 +1,67 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # HF architecture dict:
2
+ arch_dict = {
3
+ # https://huggingface.co/docs/transformers/model_doc/roberta#roberta
4
+ "roberta": {
5
+ "config_names": {
6
+ "context_length": "max_position_embeddings",
7
+ "vocab_size": "vocab_size",
8
+ "width": "hidden_size",
9
+ "heads": "num_attention_heads",
10
+ "layers": "num_hidden_layers",
11
+ "layer_attr": "layer",
12
+ "token_embeddings_attr": "embeddings"
13
+ },
14
+ "pooler": "mean_pooler",
15
+ },
16
+ # https://huggingface.co/docs/transformers/model_doc/xlm-roberta#transformers.XLMRobertaConfig
17
+ "xlm-roberta": {
18
+ "config_names": {
19
+ "context_length": "max_position_embeddings",
20
+ "vocab_size": "vocab_size",
21
+ "width": "hidden_size",
22
+ "heads": "num_attention_heads",
23
+ "layers": "num_hidden_layers",
24
+ "layer_attr": "layer",
25
+ "token_embeddings_attr": "embeddings"
26
+ },
27
+ "pooler": "mean_pooler",
28
+ },
29
+ # https://huggingface.co/docs/transformers/model_doc/mt5#mt5
30
+ "mt5": {
31
+ "config_names": {
32
+ # unlimited seqlen
33
+ # https://github.com/google-research/text-to-text-transfer-transformer/issues/273
34
+ # https://github.com/huggingface/transformers/blob/v4.24.0/src/transformers/models/t5/modeling_t5.py#L374
35
+ "context_length": "",
36
+ "vocab_size": "vocab_size",
37
+ "width": "d_model",
38
+ "heads": "num_heads",
39
+ "layers": "num_layers",
40
+ "layer_attr": "block",
41
+ "token_embeddings_attr": "embed_tokens"
42
+ },
43
+ "pooler": "mean_pooler",
44
+ },
45
+ # https://huggingface.co/docs/transformers/model_doc/bert
46
+ "bert": {
47
+ "config_names": {
48
+ "context_length": "max_position_embeddings",
49
+ "vocab_size": "vocab_size",
50
+ "width": "hidden_size",
51
+ "heads": "num_attention_heads",
52
+ "layers": "num_hidden_layers",
53
+ },
54
+ "pooler": "cls_pooler",
55
+ },
56
+ # https://huggingface.co/docs/transformers/model_doc/m2m_100
57
+ "m2m_100": {
58
+ "config_names": {
59
+ "context_length": "max_position_embeddings",
60
+ "vocab_size": "vocab_size",
61
+ "width": "d_model",
62
+ "heads": "encoder_attention_heads",
63
+ "layers": "encoder_layers",
64
+ },
65
+ "pooler": "cls_pooler",
66
+ },
67
+ }
open_clip/src/open_clip/model_configs/RN101.json ADDED
@@ -0,0 +1,21 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "embed_dim": 512,
3
+ "vision_cfg": {
4
+ "image_size": 224,
5
+ "layers": [
6
+ 3,
7
+ 4,
8
+ 23,
9
+ 3
10
+ ],
11
+ "width": 64,
12
+ "patch_size": null
13
+ },
14
+ "text_cfg": {
15
+ "context_length": 77,
16
+ "vocab_size": 49408,
17
+ "width": 512,
18
+ "heads": 8,
19
+ "layers": 12
20
+ }
21
+ }
open_clip/src/open_clip/model_configs/RN50x16.json ADDED
@@ -0,0 +1,21 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "embed_dim": 768,
3
+ "vision_cfg": {
4
+ "image_size": 384,
5
+ "layers": [
6
+ 6,
7
+ 8,
8
+ 18,
9
+ 8
10
+ ],
11
+ "width": 96,
12
+ "patch_size": null
13
+ },
14
+ "text_cfg": {
15
+ "context_length": 77,
16
+ "vocab_size": 49408,
17
+ "width": 768,
18
+ "heads": 12,
19
+ "layers": 12
20
+ }
21
+ }
open_clip/src/open_clip/model_configs/ViT-B-16-SigLIP-i18n-256.json ADDED
@@ -0,0 +1,29 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "embed_dim": 768,
3
+ "init_logit_bias": -10,
4
+ "custom_text": true,
5
+ "vision_cfg": {
6
+ "image_size": 256,
7
+ "timm_model_name": "vit_base_patch16_siglip_256",
8
+ "timm_model_pretrained": false,
9
+ "timm_pool": "map",
10
+ "timm_proj": "none"
11
+ },
12
+ "text_cfg": {
13
+ "context_length": 64,
14
+ "vocab_size": 250000,
15
+ "hf_tokenizer_name": "timm/ViT-B-16-SigLIP-i18n-256",
16
+ "tokenizer_kwargs": {
17
+ "clean": "canonicalize"
18
+ },
19
+ "width": 768,
20
+ "heads": 12,
21
+ "layers": 12,
22
+ "no_causal_mask": true,
23
+ "proj_bias": true,
24
+ "pool_type": "last",
25
+ "norm_kwargs":{
26
+ "eps": 1e-6
27
+ }
28
+ }
29
+ }
open_clip/src/open_clip/model_configs/ViT-B-16-quickgelu.json ADDED
@@ -0,0 +1,17 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "embed_dim": 512,
3
+ "quick_gelu": true,
4
+ "vision_cfg": {
5
+ "image_size": 224,
6
+ "layers": 12,
7
+ "width": 768,
8
+ "patch_size": 16
9
+ },
10
+ "text_cfg": {
11
+ "context_length": 77,
12
+ "vocab_size": 49408,
13
+ "width": 512,
14
+ "heads": 8,
15
+ "layers": 12
16
+ }
17
+ }
open_clip/src/open_clip/model_configs/ViT-B-16.json ADDED
@@ -0,0 +1,16 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "embed_dim": 512,
3
+ "vision_cfg": {
4
+ "image_size": 224,
5
+ "layers": 12,
6
+ "width": 768,
7
+ "patch_size": 16
8
+ },
9
+ "text_cfg": {
10
+ "context_length": 77,
11
+ "vocab_size": 49408,
12
+ "width": 512,
13
+ "heads": 8,
14
+ "layers": 12
15
+ }
16
+ }
open_clip/src/open_clip/model_configs/ViT-B-32-plus-256.json ADDED
@@ -0,0 +1,16 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "embed_dim": 640,
3
+ "vision_cfg": {
4
+ "image_size": 256,
5
+ "layers": 12,
6
+ "width": 896,
7
+ "patch_size": 32
8
+ },
9
+ "text_cfg": {
10
+ "context_length": 77,
11
+ "vocab_size": 49408,
12
+ "width": 640,
13
+ "heads": 10,
14
+ "layers": 12
15
+ }
16
+ }
open_clip/src/open_clip/model_configs/ViT-B-32-quickgelu.json ADDED
@@ -0,0 +1,17 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "embed_dim": 512,
3
+ "quick_gelu": true,
4
+ "vision_cfg": {
5
+ "image_size": 224,
6
+ "layers": 12,
7
+ "width": 768,
8
+ "patch_size": 32
9
+ },
10
+ "text_cfg": {
11
+ "context_length": 77,
12
+ "vocab_size": 49408,
13
+ "width": 512,
14
+ "heads": 8,
15
+ "layers": 12
16
+ }
17
+ }
open_clip/src/open_clip/model_configs/ViT-B-32.json ADDED
@@ -0,0 +1,16 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "embed_dim": 512,
3
+ "vision_cfg": {
4
+ "image_size": 224,
5
+ "layers": 12,
6
+ "width": 768,
7
+ "patch_size": 32
8
+ },
9
+ "text_cfg": {
10
+ "context_length": 77,
11
+ "vocab_size": 49408,
12
+ "width": 512,
13
+ "heads": 8,
14
+ "layers": 12
15
+ }
16
+ }
open_clip/src/open_clip/model_configs/ViT-H-14-378-quickgelu.json ADDED
@@ -0,0 +1,18 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "embed_dim": 1024,
3
+ "quick_gelu": true,
4
+ "vision_cfg": {
5
+ "image_size": 378,
6
+ "layers": 32,
7
+ "width": 1280,
8
+ "head_width": 80,
9
+ "patch_size": 14
10
+ },
11
+ "text_cfg": {
12
+ "context_length": 77,
13
+ "vocab_size": 49408,
14
+ "width": 1024,
15
+ "heads": 16,
16
+ "layers": 24
17
+ }
18
+ }
open_clip/src/open_clip/model_configs/ViT-H-14-CLIPA.json ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "embed_dim": 1024,
3
+ "vision_cfg": {
4
+ "image_size": 224,
5
+ "layers": 32,
6
+ "width": 1280,
7
+ "head_width": 80,
8
+ "patch_size": 14,
9
+ "no_ln_pre": true,
10
+ "pool_type": "avg",
11
+ "final_ln_after_pool": true
12
+ },
13
+ "text_cfg": {
14
+ "context_length": 32,
15
+ "vocab_size": 32000,
16
+ "hf_tokenizer_name": "bert-base-uncased",
17
+ "tokenizer_kwargs": {
18
+ "strip_sep_token": true
19
+ },
20
+ "width": 1024,
21
+ "heads": 16,
22
+ "layers": 24,
23
+ "pool_type": "last",
24
+ "no_causal_mask": true
25
+ }
26
+ }