---
license: apache-2.0
pipeline_tag: image-to-image
---
# RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
RefineAnything targets region-specific image refinement: given an input image and a user-specified region (e.g., scribble mask or bounding box), it restores fine-grained details—text, logos, thin structures—while keeping all non-edited pixels unchanged. It supports both reference-based and reference-free refinement.
- Paper: [arXiv:2604.06870](https://arxiv.org/abs/2604.06870)
- Project Page: https://limuloo.github.io/RefineAnything/
- GitHub: https://github.com/limuloo/RefineAnything
- Demo: Hugging Face Space
## Highlights
- Region-accurate refinement — Explicit region cues (scribbles or boxes) steer edits to the target area.
- Reference-based and reference-free — Optional reference image for guided local detail recovery.
- Strict background preservation — Edits stay inside the target region; training emphasizes seamless boundaries.
## Usage
To use RefineAnything, you need an input image, a binary mask (where white indicates the region to refine), and a text prompt.
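If you start from a bounding box rather than a hand-drawn mask, you can build the expected white-on-black mask yourself. A minimal sketch with NumPy (the function name and coordinates are illustrative, not part of this repo):

```python
import numpy as np

def bbox_to_mask(height, width, bbox):
    """Build a binary refine mask: 255 (white) inside the box, 0 elsewhere.

    bbox: (left, top, right, bottom) in pixels, right/bottom exclusive.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    left, top, right, bottom = bbox
    mask[top:bottom, left:right] = 255
    return mask

# Example: a 512x512 mask marking a box around a blurry logo
# (coordinates are illustrative, not tied to the demo images).
mask = bbox_to_mask(512, 512, (200, 300, 300, 350))
# Save it for the CLI, e.g. with Pillow:
#   PIL.Image.fromarray(mask, mode="L").save("mask.png")
```

Pass the saved file via `--mask` exactly like the shipped demo masks.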
### Installation

```bash
pip install -r requirement.txt
```
### Reference-based Logo Refinement
Refine a blurry logo on a pillow using a reference image:
```bash
python scripts/fast_inference.py \
  --input src/input1.png \
  --mask src/mask1.png \
  --prompt "Refine the LOGO." \
  --ref src/ref1.png \
  --output output/demo1.png
```
### Reference-free Text Refinement
Refine blurry text on a building sign without a reference image:
```bash
python scripts/fast_inference.py \
  --input src/input2.png \
  --mask src/mask2.png \
  --prompt "refine the text '鼎好商城'" \
  --output output/demo2.png
```
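Since the model is trained to keep all non-edited pixels unchanged, you can sanity-check an output against its input. A minimal sketch (not part of the repo) that verifies pixels outside the mask are untouched, demonstrated on synthetic arrays:

```python
import numpy as np

def background_unchanged(inp, out, mask, tol=0):
    """Return True if every pixel outside the mask differs by at most tol.

    inp, out: (H, W, C) uint8 arrays for the input and refined images.
    mask: (H, W) array where nonzero marks the refined region.
    """
    outside = mask == 0
    diff = np.abs(inp.astype(np.int16) - out.astype(np.int16))
    return bool((diff[outside] <= tol).all())

# Synthetic demo: the edit is confined to the masked square.
inp = np.zeros((64, 64, 3), dtype=np.uint8)
out = inp.copy()
mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:40, 20:40] = 255
out[25:35, 25:35] = 200          # change lies entirely inside the mask
print(background_unchanged(inp, out, mask))  # True
```

For real outputs, load the three PNGs into arrays (e.g. with Pillow) and call the same check; a small `tol` allows for compression round-off.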
## Citation
```bibtex
@article{zhou2026refineanything,
  title={RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details},
  author={Zhou, Dewei and Li, You and Yang, Zongxin and Yang, Yi},
  journal={arXiv preprint arXiv:2604.06870},
  year={2026}
}
```