Junhoee/Jeju-Standard-Translation
'Jeju Satoru' is a bidirectional Jeju-Standard Korean translation model developed to preserve the Jeju language, which is designated as an 'endangered language' by UNESCO. The model aims to bridge the digital divide for elderly Jeju dialect speakers by improving their digital accessibility.
Base model: KoBART (gogamza/kobart-base-v2)
Our model was trained using a two-stage domain adaptation method to handle the complexities of the Jeju dialect.
[제주] (Jeju) and [표준] (Standard) tags were added to each sentence to explicitly guide the translation direction.
The following key hyperparameters and techniques were applied for performance optimization:
The model's performance was comprehensively evaluated using both quantitative and qualitative metrics.
| Direction | SacreBLEU | CHRF | BERTScore |
|---|---|---|---|
| Jeju Dialect → Standard | 77.19 | 83.02 | 0.97 |
| Standard → Jeju Dialect | 64.86 | 72.68 | 0.94 |
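As a rough illustration of what a character n-gram F-score such as chrF measures, the sketch below computes a simplified variant in plain Python. It is not the evaluation script behind the table above, and `simple_chrf`, `max_order`, and `beta` are names chosen here for illustration. A reference implementation such as the one in sacreBLEU handles more details and will generally produce different numbers.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_order=6, beta=2.0):
    """Simplified character n-gram F-score on a 0-100 scale.

    Precision and recall are averaged over n-gram orders 1..max_order
    (orders longer than either string are skipped), then combined into
    an F-beta score. beta=2 weights recall more heavily than precision.
    """
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        # Clipped overlap: each n-gram counts at most as often as it
        # appears in both strings.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 or r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Because the score works on characters rather than whole words, a hypothesis that differs from the reference in only one or two syllables still receives partial credit, which suits dialect pairs whose word forms differ only slightly.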
You can load the model and run inference with the transformers library's pipeline function.
1. Installation
pip install transformers torch
from transformers import pipeline
# Load the model pipeline
translator = pipeline(
    "translation",
    model="sbaru/jeju-satoru",
)
# Example: Jeju Dialect -> Standard
jeju_sentence = '[제주] 우리 집이 펜안하다.'  # "Our house is comfortable." (Jeju dialect)
result = translator(jeju_sentence, max_length=128)
print(f"Input: {jeju_sentence}")
print(f"Output: {result[0]['translation_text']}")
# Example: Standard -> Jeju Dialect
standard_sentence = '[표준] 우리 집은 편안하다.'  # "Our house is comfortable." (Standard Korean)
result = translator(standard_sentence, max_length=128)
print(f"Input: {standard_sentence}")
print(f"Output: {result[0]['translation_text']}")
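The tagging step in the examples above can be factored into a small helper so that callers pass plain sentences and a source variety. This is a minimal sketch, not part of the model's own API: `tag_sentence`, `translate_batch`, and `SOURCE_TAGS` are hypothetical names, and the assumption that the tag names the variety of the input sentence is inferred from the two examples. The `translator` argument is assumed to behave like the pipeline object created above, that is, callable with a list of strings and returning one dict with a `translation_text` key per input.

```python
# Direction tags as used in the examples above. The tag names the variety
# of the *input* sentence (an assumption inferred from those examples).
SOURCE_TAGS = {"jeju": "[제주]", "standard": "[표준]"}


def tag_sentence(sentence, source):
    """Prefix a sentence with the tag for its source variety."""
    if source not in SOURCE_TAGS:
        raise ValueError(f"source must be one of {sorted(SOURCE_TAGS)}")
    return f"{SOURCE_TAGS[source]} {sentence.strip()}"


def translate_batch(translator, sentences, source, max_length=128):
    """Tag each sentence, translate the batch, and return plain strings."""
    tagged = [tag_sentence(s, source) for s in sentences]
    if not tagged:
        return []
    # `translator` is assumed to return one {"translation_text": ...}
    # dict per input sentence, as the pipeline above does for a list.
    results = translator(tagged, max_length=max_length)
    return [r["translation_text"] for r in results]
```

With the `translator` pipeline from the example above, `translate_batch(translator, ['우리 집이 펜안하다.'], source='jeju')` would send `[제주] 우리 집이 펜안하다.` to the model and return a list containing the translated string.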
Base model
gogamza/kobart-base-v2