## [Enhanced Generative Structure Prior for Chinese Text Image Super-Resolution](https://arxiv.org/pdf/2508.07537)

[Xiaoming Li](https://csxmli2016.github.io/), [Wangmeng Zuo](https://scholar.google.com/citations?hl=en&user=rUOpCEYAAAAJ&view_op=list_works), [Chen Change Loy](https://www.mmlab-ntu.com/person/ccloy/)

S-Lab, Nanyang Technological University
### 👷 The whole framework:
### 👷 Character Structure Prior Pretraining:
## 🔔 MARCONet 🆚 MARCONet++

> - MARCONet is designed for **regular character layout** only. See details of [MARCONet](https://github.com/csxmli2016/MARCONet).
> - MARCONet++ achieves more accurate alignment between the character structure prior (green structure) and the degraded image.
## 📋 TODO

- [x] Release the inference code and model.
- [ ] Release the training code (no plans to release for now).

## 🚶 Getting Started

```
git clone https://github.com/csxmli2016/MARCONetPlusPlus
cd MARCONetPlusPlus
conda create -n mplus python=3.8 -y
conda activate mplus
pip install -r requirements.txt
```

## 🚶 Inference

Download the pre-trained models:

```
python utils/download_github.py
```

Then run it to restore **text lines**:

```
CUDA_VISIBLE_DEVICES=0 python test_marconetplus.py -i ./Testsets/LR_TextLines -a -s
```

or run it to restore **the whole text image**:

```
CUDA_VISIBLE_DEVICES=0 python test_marconetplus.py -i ./Testsets/LR_Whole -b -s -f 2
```

```
# Parameters:
-i: --input_path, default: ./Testsets/LR_TextLines or ./Testsets/LR_TextWhole
-o: --output_path, default: None. The saving directory is created automatically with the format '[LR path]_TIME_MARCONetPlus'
-a: --aligned. Use -a if the inputs are text lines; omit it if the input is a whole text image that needs text line detection
-b: --bg_sr. When restoring whole text images, use -b to restore the background region with BSRGAN; without -b, the background stays the same as the input
-f: --factor_scale, default: 2. When restoring whole text images, use -f to set the scale factor of the output
-s: --save_text. Use -s to save the details of prior alignment, predicted characters, and locations
```

## 🏃 Restoring Real-world Chinese Text Images

> - We use [BSRGAN](https://github.com/cszn/BSRGAN) to restore the background region.
> - The parameters are tested on an NVIDIA A100 GPU (40 GB).
> - ⚠️ If inference is slow, the cause is usually a large input text image or a large factor_scale. You can resize the input based on your needs.

[Comparison 1](https://imgsli.com/NDA2MDUw) | [Comparison 2](https://imgsli.com/NDA2MDYw) | [Comparison 3](https://imgsli.com/NDA2MTE0) | [Comparison 4](https://imgsli.com/NDA2MDYy)

## 🏃 Restoring detected text lines

## 🏃 Style w interpolation from three characters with different styles

## â€ŧī¸ Failure Case Despite its high-fidelity performance, MARCONet++ still struggles in some real-world scenarios as it highly relies on: - Real world character **Recognition** on complex degraded text images - Real world character **Detection** on complex degraded text images - Text line detection and segmentation - Domain gap between our synthetic and real-world text images > 🍒 Restoring complex character with high fidelity under such conditions has significant challenges. We have also explored various approaches, such as training OCR models with Transformers and using YOLO or Transformer-based methods for character detection, but these methods generally encounter the same issues. We encourage any potential collaborations to jointly tackle this challenge and advance robust, high-fidelity text restoration. ## 📎 RealCE-1K benchmark To quantitatively evaluate on real-world Chinese text line images, we curate a benchmark by filtering the [RealCE](https://github.com/mjq11302010044/Real-CE) test set to exclude images containing multiple text lines or inaccurate annotations, thereby constructing a Chinese text SR benchmark (see Section IV.B of our paper). You can download the RealCE-1K benchmark from [here](https://github.com/csxmli2016/MARCONetPlusPlus/releases/download/v1/RealCE-1K.zip). ## đŸē Acknowledgement This project is built based on the excellent [KAIR](https://github.com/cszn/KAIR) and [RealCE](https://github.com/mjq11302010044/Real-CE). ## ÂŠī¸ License This project is licensed under NTU S-Lab License 1.0. Redistribution and use should follow this license. 
## đŸģ Citation ``` @article{li2025marconetplus, author = {Li, Xiaoming and Zuo, Wangmeng and Loy, Chen Change}, title = {Enhanced Generative Structure Prior for Chinese Text Image Super-Resolution}, journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence}, year = {2025} } @InProceedings{li2023marconet, author = {Li, Xiaoming and Zuo, Wangmeng and Loy, Chen Change}, title = {Learning Generative Structure Prior for Blind Text Image Super-resolution}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, year = {2023} } ```