Romanization system of the dataset

by Comet0322 - opened Oct 24, 2024

Oct 24, 2024

Hello, I am curious about which Romanization system is used for Manchu in your dataset. I use the Möllendorff system, but I found that characters like ū, š, and ž cannot be tokenized properly.

seemdog

Owner Oct 28, 2024

Abkai Latin transliteration was used. Please refer to our paper for more details.
https://arxiv.org/pdf/2311.17492

Comet0322

Oct 31, 2024

Thank you. I will check it out.
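
For anyone hitting the same tokenization issue, here is a minimal sketch of how one might spot Möllendorff diacritics in input text and rewrite them before tokenizing. The helper names `find_non_ascii` and `apply_mapping` are hypothetical, and the two entries in `ASSUMED_MAPPING` (ū to v, š to x) are an assumption about the Abkai convention that should be checked against the paper; ž is deliberately left unmapped here so it shows up as unresolved.

```python
import unicodedata


def find_non_ascii(text):
    """Return the sorted list of distinct non-ASCII characters in the text.

    The text is NFC-normalized first so that a base letter followed by a
    combining mark (e.g. "s" + combining caron) is treated as one character.
    """
    normalized = unicodedata.normalize("NFC", text)
    return sorted({ch for ch in normalized if ord(ch) > 127})


def apply_mapping(text, mapping):
    """Replace each character that appears in `mapping`; leave the rest as-is."""
    normalized = unicodedata.normalize("NFC", text)
    return "".join(mapping.get(ch, ch) for ch in normalized)


# ASSUMPTION: illustrative, partial mapping only. Verify every entry against
# the transliteration table in the paper before using it on real data.
ASSUMED_MAPPING = {
    "\u016b": "v",  # ū
    "\u0161": "x",  # š
}

if __name__ == "__main__":
    sample = "\u0161\u016b\u017e"  # š, ū, ž
    converted = apply_mapping(sample, ASSUMED_MAPPING)
    print(converted)
    # Anything still listed here has no entry in the mapping yet.
    print(find_non_ascii(converted))
```

Running `find_non_ascii` on the converted text is a quick way to confirm that no diacritic letters remain before the text is passed to a tokenizer.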
