arxiv:2604.23948

KOMBO: Korean Character Representations Based on the Combination Rules of Subcharacters

Published on Apr 27

Abstract

KOMBO is a framework for Korean language models that builds character representations from the historical invention principles of Hangeul. It outperforms existing state-of-the-art models on Korean natural language understanding tasks.

The Korean writing system, Hangeul, has a unique character representation that rigidly follows the invention principles recorded in Hunminjeongeum, a book published in 1446 that describes the principles of invention and usage of Hangeul, devised by King Sejong. However, existing pre-trained language models (PLMs) for Korean have overlooked these principles. In this paper, we introduce a novel framework for Korean PLMs called KOMBO, which is the first to apply the invention principles of Hangeul to character representation. KOMBO performs strongly across diverse NLP tasks. In particular, it outperforms the state-of-the-art Korean PLM by an average of 2.11% on five Korean natural language understanding tasks. Furthermore, extensive experiments demonstrate that the method is well suited to capturing the linguistic features of the Korean language. These results indicate that subcharacters are a better basis for Korean PLMs than the typical subword-based approach. Our code is available at: [https://github.com/SungHo3268/KOMBO](https://github.com/SungHo3268/KOMBO).
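To make the notion of "subcharacters" concrete, the sketch below splits a precomposed Hangeul syllable into its lead consonant, vowel, and optional tail consonant using the standard Unicode arithmetic for the Hangul Syllables block. It is only an illustration of what a subcharacter-level view of Korean text looks like; it is not KOMBO's combination rules, which the abstract does not spell out. The function names `decompose_syllable`, `decompose_text`, and `matches_nfd` are hypothetical and chosen for this example.

```python
import unicodedata

# Constants from the Unicode Hangul syllable decomposition algorithm.
HANGUL_BASE = 0xAC00   # first precomposed syllable, "가"
HANGUL_LAST = 0xD7A3   # last precomposed syllable
LEAD_BASE = 0x1100     # first conjoining lead consonant
VOWEL_BASE = 0x1161    # first conjoining vowel
TAIL_BASE = 0x11A7     # one before the first conjoining tail consonant
VOWEL_COUNT = 21
TAIL_COUNT = 28        # includes the "no tail consonant" case


def decompose_syllable(ch):
    """Return the conjoining jamo of one Hangeul syllable as a tuple.

    Characters outside the precomposed syllable block are returned unchanged.
    """
    code = ord(ch)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return (ch,)
    index = code - HANGUL_BASE
    lead, rest = divmod(index, VOWEL_COUNT * TAIL_COUNT)
    vowel, tail = divmod(rest, TAIL_COUNT)
    parts = [chr(LEAD_BASE + lead), chr(VOWEL_BASE + vowel)]
    if tail:  # tail index 0 means the syllable has no final consonant
        parts.append(chr(TAIL_BASE + tail))
    return tuple(parts)


def decompose_text(text):
    """Decompose every character of a string, one tuple per character."""
    return [decompose_syllable(ch) for ch in text]


def matches_nfd(ch):
    """Cross-check the arithmetic against Unicode NFD normalization."""
    return "".join(decompose_syllable(ch)) == unicodedata.normalize("NFD", ch)


if __name__ == "__main__":
    for ch, parts in zip("한글", decompose_text("한글")):
        print(ch, [f"U+{ord(p):04X}" for p in parts], matches_nfd(ch))
```

A subword tokenizer would treat a syllable such as "한" as an atomic symbol or as part of a longer piece, whereas a subcharacter view exposes its three components. How those components are then recombined into a character representation is the part that KOMBO contributes and that this sketch does not attempt to reproduce.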
