---
license: apache-2.0
datasets:
- timbrooks/instructpix2pix-clip-filtered
language:
- en
---

# Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation

### CVPR 2025 (Highlight)
[Project Page](https://bolinlai.github.io/projects/InstaManip/) | [Paper](https://openaccess.thecvf.com/content/CVPR2025/papers/Lai_Unleashing_In-context_Learning_of_Autoregressive_Models_for_Few-shot_Image_Manipulation_CVPR_2025_paper.pdf) | [Code](https://github.com/BolinLai/InstaManip)

[Bolin Lai](https://bolinlai.github.io/), [Felix Juefei-Xu](https://xujuefei.com/), [Miao Liu](https://aptx4869lm.github.io/), [Xiaoliang Dai](https://sites.google.com/view/xiaoliangdai/), [Nikhil Mehta](https://hockeybro12.github.io/), [Chenguang Zhu](https://cs.stanford.edu/~cgzhu/), [Zeyi Huang](https://oodbag.github.io/), [James M. Rehg](https://rehg.org/), [Sangmin Lee](https://sites.google.com/view/sangmin-lee), [Ning Zhang](https://n-zhang.github.io/), [Tong Xiao](http://xiaotong.me/)


<img src="https://bolinlai.github.io/projects/InstaManip/figures/teaser.png"/>

This repo hosts the model weights for our paper "Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation".

Four models are released in this repo:

- InstaManip-17B-1shot: model trained specifically for 1-shot image manipulation.

- InstaManip-17B-2shot: model trained specifically for 2-shot image manipulation.

- InstaManip-17B-3shot: model trained specifically for 3-shot image manipulation.

- InstaManip-17B-dynamic: model trained for an arbitrary number of exemplar image pairs.
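
As a rough illustration of how the four checkpoints relate to the number of exemplar pairs, here is a minimal Python sketch. The helper `pick_checkpoint` is hypothetical and not part of the released code, and the selection rule (use the fixed-shot model when the count is 1, 2, or 3, otherwise fall back to the dynamic model) is an assumption for illustration only; the GitHub instructions remain the authoritative guide.

```python
# Checkpoint names as listed in this README.
FIXED_SHOT_CHECKPOINTS = {
    1: "InstaManip-17B-1shot",
    2: "InstaManip-17B-2shot",
    3: "InstaManip-17B-3shot",
}
DYNAMIC_CHECKPOINT = "InstaManip-17B-dynamic"


def pick_checkpoint(num_exemplar_pairs: int) -> str:
    """Hypothetical helper: map an exemplar-pair count to a checkpoint name.

    Assumption (not stated by the authors): prefer the model trained for
    exactly that many shots, and use the dynamic model for any other count.
    """
    if num_exemplar_pairs < 1:
        raise ValueError("at least one exemplar image pair is required")
    return FIXED_SHOT_CHECKPOINTS.get(num_exemplar_pairs, DYNAMIC_CHECKPOINT)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, pick_checkpoint(n))
```

Keeping the names in a lookup table with a single fallback mirrors the list above: three specialized checkpoints and one general one.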

Please refer to the code on [GitHub](https://github.com/BolinLai/InstaManip) for detailed instructions on how to use these models.

If you find our paper helpful to your work, please cite it using this BibTeX entry.

```BibTex
@inproceedings{lai2025unleashing,
  title={Unleashing in-context learning of autoregressive models for few-shot image manipulation},
  author={Lai, Bolin and Juefei-Xu, Felix and Liu, Miao and Dai, Xiaoliang and Mehta, Nikhil and Zhu, Chenguang and Huang, Zeyi and Rehg, James M and Lee, Sangmin and Zhang, Ning and Xiao, Tong},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={18346--18357},
  year={2025}
}
```