nielsr (HF Staff) committed
Commit b89a7fb · verified · 1 Parent(s): fcaa496

Add comprehensive model card for PlanGen with Diffusers integration

This PR adds a comprehensive model card for PlanGen: Image Generation as a Visual Planner for Robotic Manipulation.

It includes:
- A link to the paper: [Image Generation as a Visual Planner for Robotic Manipulation](https://huggingface.co/papers/2512.00532).
- A link to the GitHub repository: https://github.com/pangye202264690373/Image-Generation-as-a-Visual-Planner-for-Robotic-Manipulation.
- **Metadata**: `license: apache-2.0`, `pipeline_tag: image-to-video`, and `library_name: diffusers` to improve discoverability and enable the automated usage widget.
- A concise summary of the model's purpose and methodology.
- The main teaser image and a results image from the GitHub repository.
- A "Quick Start" section with environment setup, requirements installation, and a sample usage code snippet from the official GitHub repository, demonstrating integration with `diffusers`.
- Details on available weights and citation information.

Please review and merge this PR if it looks good!
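
The metadata listed above sits in the YAML front matter at the top of `README.md`. Below is a minimal sketch for checking it before merging, assuming the README text is available as a string. `parse_front_matter` and `missing_metadata` are hypothetical helpers, not part of any library, and the flat `key: value` parsing only covers simple front matter like the one in this PR (nested values would need a real YAML parser).

```python
# Expected values are taken from the PR description above.
EXPECTED = {
    "license": "apache-2.0",
    "pipeline_tag": "image-to-video",
    "library_name": "diffusers",
}


def parse_front_matter(readme_text: str) -> dict:
    """Return the flat key/value pairs between the leading '---' delimiters."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at all
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return {}  # opening delimiter without a closing one


def missing_metadata(readme_text: str, expected: dict = EXPECTED) -> list:
    """List the expected keys that are absent or carry a different value."""
    found = parse_front_matter(readme_text)
    return [key for key, value in expected.items() if found.get(key) != value]


readme = (
    "---\n"
    "license: apache-2.0\n"
    "pipeline_tag: image-to-video\n"
    "library_name: diffusers\n"
    "---\n\n"
    "# PlanGen\n"
)
print(missing_metadata(readme))  # prints []
```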

Files changed (1)
  1. README.md +105 -0
README.md ADDED
@@ -0,0 +1,105 @@
+ ---
+ license: apache-2.0
+ pipeline_tag: image-to-video
+ library_name: diffusers
+ ---
+
+ # PlanGen: Image Generation as a Visual Planner for Robotic Manipulation
+
+ This repository contains the official model for the paper:
+ **[Image Generation as a Visual Planner for Robotic Manipulation](https://huggingface.co/papers/2512.00532)**
+
+ PlanGen explores whether pretrained image generation models, lightly adapted with LoRA finetuning, can serve as visual planners for robotic manipulation. The framework supports both text-conditioned and trajectory-conditioned generation and produces smooth, coherent robot videos aligned with the respective conditions. This work indicates that pretrained image generators encode transferable temporal priors and can function as video-like robotic planners under minimal supervision.
+
+ For more details, please refer to the [official GitHub repository](https://github.com/pangye202264690373/Image-Generation-as-a-Visual-Planner-for-Robotic-Manipulation).
+
+ <div align="center">
+ <img src='https://github.com/pangye202264690373/Image-Generation-as-a-Visual-Planner-for-Robotic-Manipulation/raw/main/assets/Teaser.png' width='100%' />
+ </div>
+
+ ## Quick Start
+
+ ### Configuration
+ #### 1. **Environment setup**
+ ```bash
+ git clone https://github.com/pangye202264690373/Image-Generation-as-a-Visual-Planner-for-Robotic-Manipulation.git
+ cd Image-Generation-as-a-Visual-Planner-for-Robotic-Manipulation
+
+ conda create -n PlanGen python=3.11.10
+ conda activate PlanGen
+ ```
+ #### 2. **Requirements installation**
+ ```bash
+ pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
+ pip install --upgrade -r requirements.txt
+ ```
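
After installation, it can be useful to confirm that the pinned versions from the command above are the ones actually installed. The following is a minimal sketch using only the standard library; `find_version_mismatches` is a hypothetical helper, and the assumption is that wheels from the cu124 index report a local suffix such as `2.5.1+cu124`, which is stripped before comparing.

```python
from importlib import metadata

# Pins copied from the pip command above.
PINNED = {"torch": "2.5.1", "torchvision": "0.20.1", "torchaudio": "2.5.1"}


def find_version_mismatches(pinned=PINNED, get_version=metadata.version):
    """Return {package: installed_version_or_None} for every pin that is not met."""
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Drop a local version suffix such as "+cu124" before comparing.
        public = installed.split("+", 1)[0] if installed else None
        if public != expected:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    problems = find_version_mismatches()
    print(problems if problems else "all pinned versions match")
```

The `get_version` parameter exists only so the check can be exercised without the packages installed; in normal use the default lookup is sufficient.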
+
+ ### Inference
+ We provide a diffusers pipeline integration for our model and have uploaded the model weights to Hugging Face. The model can be used as shown in the example below:
+
+ ```python
+ import torch
+ from PIL import Image
+
+ # Custom FLUX pipeline from the cloned repository (run from the repository root)
+ from src.pipeline_pe_clone import FluxPipeline
+
+ pretrained_model_name_or_path = "black-forest-labs/FLUX.1-dev"
+ pipeline = FluxPipeline.from_pretrained(
+     pretrained_model_name_or_path,
+     torch_dtype=torch.bfloat16,
+ ).to("cuda")
+
+ # Load and fuse the base LoRA first, then unload it so a task LoRA can be loaded on top
+ pipeline.load_lora_weights("yio-ye2004/lora_collection", weight_name="pretrain.safetensors")
+ pipeline.fuse_lora()
+ pipeline.unload_lora_weights()
+
+ # Load the task-specific LoRA (here: bridge_clean)
+ pipeline.load_lora_weights("yio-ye2004/lora_collection", weight_name="bridge_clean_pytorch_lora_weights.safetensors")
+
+ height = 768
+ width = 512
+
+ validation_image = "assets/1.png"
+ validation_prompt = "add a halo and wings for the cat by sksmagiceffects"
+ # PIL's resize expects (width, height), in that order
+ condition_image = Image.open(validation_image).resize((width, height)).convert("RGB")
+
+ result = pipeline(
+     prompt=validation_prompt,
+     condition_image=condition_image,
+     height=height,
+     width=width,
+     guidance_scale=3.5,
+     num_inference_steps=20,
+     max_sequence_length=512,
+ ).images[0]
+
+ result.save("output.png")
+ ```
+
+ ## Weights
+ You can download the trained PlanGen checkpoints for inference. The available models are listed below; the checkpoint names also serve as trigger words.
+
+ You need to load and fuse the `pretrained` checkpoint before loading any of the other models.
+
+ | **Model** | **Description** | **Resolution** |
+ | :-------: | :-------------: | :------------: |
+ | [pretrained](https://huggingface.co/yio-ye2004/lora_collection/blob/main/pretrained.safetensors) | Base LoRA for PlanGen | |
+ | [bridge_clean](https://huggingface.co/yio-ye2004/lora_collection/blob/main/bridge_clean_pytorch_lora_weights.safetensors) | PlanGen LoRA trained on `bridge_clean` | |
+ | [bridge_traj](https://huggingface.co/yio-ye2004/lora_collection/blob/main/bridge_traj_pytorch_lora_weights.safetensors) | PlanGen LoRA trained on `bridge_traj` | |
+ | [jocoplay_clean](https://huggingface.co/yio-ye2004/lora_collection/blob/main/jocoplay_clean_pytorch_lora_weights.safetensors) | PlanGen LoRA trained on `jocoplay_clean` | |
+ | [jocoplay_traj](https://huggingface.co/yio-ye2004/lora_collection/blob/main/jocoplay_traj_pytorch_lora_weights.safetensors) | PlanGen LoRA trained on `jocoplay_traj` | |
+ | [rt1_clean](https://huggingface.co/yio-ye2004/lora_collection/blob/main/rt1_clean_pytorch_lora_weights.safetensors) | PlanGen LoRA trained on `rt1_clean` | |
+ | [rt1_traj](https://huggingface.co/yio-ye2004/lora_collection/blob/main/rt1_traj_pytorch_lora_weights.safetensors) | PlanGen LoRA trained on `rt1_traj` | |
+
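
The load order required by the Weights section (fuse the base LoRA, then load a task LoRA) and the file-name pattern visible in the table can be sketched as a small helper. This is an illustration under stated assumptions, not code from the repository: `lora_weight_name`, `load_plangen_lora`, and the `variant` argument are hypothetical names. The base file name defaults to the one used in the Inference snippet (`pretrain.safetensors`), whereas the table links `pretrained.safetensors`, so pass whichever file actually exists in the weights repository.

```python
REPO_ID = "yio-ye2004/lora_collection"  # weights repository named in the Inference snippet

DATASETS = ("bridge", "jocoplay", "rt1")  # dataset prefixes from the Weights table
VARIANTS = ("clean", "traj")              # suffixes from the Weights table


def lora_weight_name(dataset: str, variant: str) -> str:
    """Build a task LoRA file name following the pattern in the Weights table."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"{dataset}_{variant}_pytorch_lora_weights.safetensors"


def load_plangen_lora(pipeline, dataset, variant, base_weight_name="pretrain.safetensors"):
    """Fuse the base LoRA into the pipeline, then load the chosen task LoRA."""
    # Step 1: load the base LoRA, merge it into the model weights, and unload it
    # (the same three calls as in the Inference snippet).
    pipeline.load_lora_weights(REPO_ID, weight_name=base_weight_name)
    pipeline.fuse_lora()
    pipeline.unload_lora_weights()
    # Step 2: load the task-specific LoRA on top of the fused base.
    task_weight_name = lora_weight_name(dataset, variant)
    pipeline.load_lora_weights(REPO_ID, weight_name=task_weight_name)
    return task_weight_name
```

With the pipeline from the Inference section, a call such as `load_plangen_lora(pipeline, "rt1", "traj")` would replace the four LoRA-related lines of that snippet.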
+ ## Results
+ <div align="center">
+ <img src='https://github.com/pangye202264690373/Image-Generation-as-a-Visual-Planner-for-Robotic-Manipulation/raw/main/assets/Visual_Comparisons.png'/>
+ </div>
+
+ ## Citation
+ If you find our work useful, please cite the paper:
+
+ ```bibtex
+ @article{ye2025image,
+   title={Image Generation as a Visual Planner for Robotic Manipulation},
+   author={Ye, Pang},
+   journal={arXiv preprint arXiv:2512.00532},
+   year={2025}
+ }
+ ```