---
base_model:
- Qwen/Qwen3-VL-4B-Instruct
language:
- en
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
---

# HieroSA (Chinese)

[Paper](https://arxiv.org/abs/2601.05508) | [GitHub](https://github.com/THUNLP-MT/HieroSA)

We propose **HieroSA (Hieroglyph Stroke Analyzer)** 🏺, a framework for capturing stroke-level structural representations of hieroglyphic and logographic scripts. It automatically converts characters into normalized stroke-segment representations ✍️, without relying on handcrafted rules or script-specific priors.

HieroSA supports both modern logographic scripts and ancient hieroglyphs 🌍, enabling cross-lingual structural generalization. Experimental results demonstrate that it effectively captures character-level structure and semantics 🧩, providing a solid foundation for downstream analysis and understanding of hieroglyphic writing systems.
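To make the idea of a "normalized stroke-segment representation" more concrete, here is a minimal sketch in Python. It is an illustration only and not HieroSA's actual output format or method: the `Segment` class, the `normalize_segments` function, and the choice of a unit-square normalization are all assumptions made for this example. See the GitHub repository for the real representation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical straight stroke segment given by its two endpoints."""
    x1: float
    y1: float
    x2: float
    y2: float


def normalize_segments(segments: List[Segment]) -> List[Segment]:
    """Translate and uniformly scale segments so the glyph fits the unit square.

    Assumed normalization for illustration: the bounding box's lower corner is
    moved to the origin and the longer side is scaled to length 1, which
    preserves the glyph's aspect ratio.
    """
    if not segments:
        return []
    xs = [v for s in segments for v in (s.x1, s.x2)]
    ys = [v for s in segments for v in (s.y1, s.y2)]
    min_x, min_y = min(xs), min(ys)
    # Fall back to 1.0 for a degenerate glyph whose points all coincide.
    scale = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [
        Segment(
            (s.x1 - min_x) / scale,
            (s.y1 - min_y) / scale,
            (s.x2 - min_x) / scale,
            (s.y2 - min_y) / scale,
        )
        for s in segments
    ]


# Toy example: a cross shape like the character "十" drawn on a 100x100 canvas.
cross = [
    Segment(10, 50, 90, 50),  # horizontal stroke
    Segment(50, 10, 50, 90),  # vertical stroke
]
normalized_cross = normalize_segments(cross)
```

Under this assumed scheme, the same character drawn at different sizes or positions maps to the same list of segments, which is the kind of invariance a normalized representation is meant to provide.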

## More Details

Please refer to our [GitHub Repository](https://github.com/THUNLP-MT/HieroSA) for more details about this model, including environment setup and inference scripts.

## Citation

If you find our work helpful for your research, please consider citing it.

```bibtex
@article{luo2026hierosa,
    title={Enabling Stroke-Level Structural Analysis of Hieroglyphic Scripts without Language-Specific Priors}, 
    author={Fuwen Luo and Zihao Wan and Ziyue Wang and Yaluo Liu and Pau Tong Lin Xu and Xuanjia Qiao and Xiaolong Wang and Peng Li and Yang Liu},
    journal={arXiv preprint arXiv:2601.05508},
    year={2026}
}
```