---
language:
- zh
base_model:
- Qwen/Qwen3-VL-2B-Instruct-GGUF
pipeline_tag: question-answering
---

**GEVO** is a multimodal large language model specialized for ancient Chinese character evolution analysis. The accompanying paper has been accepted to the *ACL 2026* main conference. The model is obtained by glyph-driven supervised fine-tuning of Qwen3-VL-2B-Instruct and is designed to improve the understanding of ancient Chinese scripts, including oracle bone inscriptions, bronze inscriptions, seal scripts, clerical scripts, and regular scripts.

To facilitate further research, we have open-sourced the instruction-tuning dataset used to train GEVO on [GitHub](https://github.com/songruiecho/GEVO). By following the LlamaFactory tutorials, you can train your own model on our data.
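
As a rough sketch of the LlamaFactory route, the snippet below builds a `dataset_info.json` entry of the kind LlamaFactory uses to register a custom multimodal dataset. It assumes the released data is a ShareGPT-style JSON file with `messages` and `images` fields, mirroring LlamaFactory's bundled multimodal demo entry. The dataset key `gevo_sft` and the file name `gevo_sft.json` are hypothetical placeholders, so check the GitHub repository for the actual file names and field layout.

```python
import json


def build_dataset_entry(file_name):
    """Build a dataset_info.json entry for a ShareGPT-style multimodal file.

    Assumption: each record stores the conversation under "messages" and the
    image paths under "images". Adjust the column names if the data differs.
    """
    return {
        "file_name": file_name,
        "formatting": "sharegpt",
        "columns": {"messages": "messages", "images": "images"},
        "tags": {
            "role_tag": "role",
            "content_tag": "content",
            "user_tag": "user",
            "assistant_tag": "assistant",
        },
    }


# Hypothetical dataset key and file name, used for illustration only.
dataset_info = {"gevo_sft": build_dataset_entry("gevo_sft.json")}

# Print the entry so it can be merged into LlamaFactory's data/dataset_info.json.
print(json.dumps(dataset_info, ensure_ascii=False, indent=2))
```

The printed entry would be merged into `data/dataset_info.json` in a LlamaFactory checkout, after which the dataset key can be referenced through the `dataset` option of a training config.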

GEVO is trained only on traced reproductions of ancient Chinese characters. As a result, its performance on high-noise rubbings may be suboptimal.

# Requirements

The model has been tested with the following environment:

```text
accelerate==1.13.0
huggingface_hub==1.16.1
qwen-vl-utils==0.0.14
torch==2.5.1
torchaudio==2.5.1
torchcodec==0.13.0
torchvision==0.20.1
transformers==5.9.0
```
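
To check whether a local environment matches the tested one, a small script along these lines may help. It is only a sketch: the helper names `installed_version` and `compare_environment` are illustrative, and local build suffixes such as `+cu121` are ignored when comparing, which is an assumption about what counts as a match.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the tested environment listed above.
TESTED_VERSIONS = {
    "accelerate": "1.13.0",
    "huggingface_hub": "1.16.1",
    "qwen-vl-utils": "0.0.14",
    "torch": "2.5.1",
    "torchaudio": "2.5.1",
    "torchcodec": "0.13.0",
    "torchvision": "0.20.1",
    "transformers": "5.9.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_environment(expected, lookup=installed_version):
    """Map each package whose version differs to an (expected, found) pair."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Ignore local build suffixes such as "+cu121" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in compare_environment(TESTED_VERSIONS).items():
        print(f"{name}: tested with {wanted}, found {found or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load. It only flags where the local setup differs from the configuration that was tested.
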

# Citation

If you find this model useful, please cite:

```bibtex
@article{song2026gevo,
  title={Enhancing Multimodal Large Language Models for Ancient Chinese Character Evolution Analysis via Glyph-Driven Fine-Tuning},
  author={Song, Rui and Shi, Lida and Qi, Ruihua and Li, Yingji and Xu, Hao},
  journal={arXiv preprint arXiv:2604.11299},
  year={2026}
}