## [Enhanced Generative Structure Prior for Chinese Text Image Super-Resolution](https://arxiv.org/pdf/2508.07537)
[Xiaoming Li](https://csxmli2016.github.io/), [Wangmeng Zuo](https://scholar.google.com/citations?hl=en&user=rUOpCEYAAAAJ&view_op=list_works), [Chen Change Loy](https://www.mmlab-ntu.com/person/ccloy/)
S-Lab, Nanyang Technological University
### The whole framework:
### Character Structure Prior Pretraining:
## MARCONet vs. MARCONet++
> - MARCONet is designed for **regular character layout** only. See details of [MARCONet](https://github.com/csxmli2016/MARCONet).
> - MARCONet++ has more accurate alignment between character structural prior (green structure) and the degraded image.
## TODO
- [x] Release the inference code and model.
- [ ] Release the training code (no plans to release for now).
## Getting Started
```
git clone https://github.com/csxmli2016/MARCONetPlusPlus
cd MARCONetPlusPlus
conda create -n mplus python=3.8 -y
conda activate mplus
pip install -r requirements.txt
```
## Inference
Download the pre-trained models:
```
python utils/download_github.py
```
then run it to restore **text lines**:
```
CUDA_VISIBLE_DEVICES=0 python test_marconetplus.py -i ./Testsets/LR_TextLines -a -s
```
or to restore **the whole text image**:
```
CUDA_VISIBLE_DEVICES=0 python test_marconetplus.py -i ./Testsets/LR_Whole -b -s -f 2
```
```
# Parameters:
-i: --input_path, default: ./Testsets/LR_TextLines or ./Testsets/LR_Whole
-o: --output_path, default: None; the output dir is created automatically in the format '[LR path]_TIME_MARCONetPlus'
-a: --aligned, use -a when the inputs are cropped text lines; omit it for whole text images, which need text line detection first
-b: --bg_sr, when restoring whole text images, use -b to restore the background region with BSRGAN; without -b, the background is kept the same as the input
-f: --factor_scale, default: 2; when restoring whole text images, use -f to set the scale factor of the output
-s: --save_text, use -s to save the details of the prior alignment, predicted characters, and locations
```
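For reference, the default output-directory naming described for `-o` can be sketched as follows. This is an illustrative reimplementation, not the script's actual code, and the timestamp format is an assumption:

```python
from datetime import datetime
from pathlib import Path


def default_output_dir(input_path: str) -> str:
    """Mimic the default '[LR path]_TIME_MARCONetPlus' naming.

    The timestamp format below is a guess for illustration; the script
    may format TIME differently.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    p = Path(input_path)
    return str(p.parent / f"{p.name}_{stamp}_MARCONetPlus")


print(default_output_dir("./Testsets/LR_TextLines"))
# e.g. Testsets/LR_TextLines_20250101-120000_MARCONetPlus
```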
## Restoring Real-world Chinese Text Images
> - We use [BSRGAN](https://github.com/cszn/BSRGAN) to restore the background region.
> - The parameters are tested on an NVIDIA A100 GPU (40G).
> - ⚠️ Slow inference is usually caused by a large input text image or a large `factor_scale`. You can resize the input based on your needs.
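To shrink an oversized input before inference, here is a minimal Pillow sketch (the 1600 px cap and the `cap_long_side` helper are arbitrary illustrative choices, not part of this repo):

```python
from PIL import Image


def cap_long_side(img: Image.Image, max_side: int = 1600) -> Image.Image:
    """Downscale so the longer side is at most max_side, keeping aspect ratio."""
    w, h = img.size
    scale = max_side / max(w, h)
    if scale >= 1.0:
        return img  # already small enough
    return img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)


# Example: cap a 3200x800 image to 1600x400 before feeding it to the script
small = cap_long_side(Image.new("RGB", (3200, 800)), max_side=1600)
```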
- [Before/after comparison 1](https://imgsli.com/NDA2MDUw)
- [Before/after comparison 2](https://imgsli.com/NDA2MDYw)
- [Before/after comparison 3](https://imgsli.com/NDA2MTE0)
- [Before/after comparison 4](https://imgsli.com/NDA2MDYy)
## Restoring Detected Text Lines
Interpolating the style representation **w** across three characters with different styles:
## ‼️ Failure Cases
Despite its high-fidelity performance, MARCONet++ still struggles in some real-world scenarios because it relies heavily on:
- Accurate character **recognition** in complex, degraded text images
- Accurate character **detection** in complex, degraded text images
- Text line detection and segmentation
- Bridging the domain gap between our synthetic and real-world text images
> Restoring complex characters with high fidelity under such conditions remains highly challenging.
We have also explored various alternatives, such as training OCR models with Transformers and using YOLO or Transformer-based methods for character detection, but they generally run into the same issues.
We encourage any potential collaborations to jointly tackle this challenge and advance robust, high-fidelity text restoration.
## RealCE-1K Benchmark
To quantitatively evaluate real-world Chinese text line super-resolution, we curate RealCE-1K by filtering the [RealCE](https://github.com/mjq11302010044/Real-CE) test set to exclude images containing multiple text lines or inaccurate annotations (see Section IV.B of our paper). You can download the RealCE-1K benchmark from [here](https://github.com/csxmli2016/MARCONetPlusPlus/releases/download/v1/RealCE-1K.zip).
## Acknowledgement
This project is built on the excellent [KAIR](https://github.com/cszn/KAIR) and [RealCE](https://github.com/mjq11302010044/Real-CE).
## ©️ License
This project is licensed under NTU S-Lab License 1.0. Redistribution and use should follow this license.
## Citation
```
@article{li2025marconetplus,
  author  = {Li, Xiaoming and Zuo, Wangmeng and Loy, Chen Change},
  title   = {Enhanced Generative Structure Prior for Chinese Text Image Super-Resolution},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2025}
}
@InProceedings{li2023marconet,
  author    = {Li, Xiaoming and Zuo, Wangmeng and Loy, Chen Change},
  title     = {Learning Generative Structure Prior for Blind Text Image Super-resolution},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2023}
}
```