# Model Card for gj5520/kkachi60_en2ko

## Model Description
kkachi60_en2ko is a sequence-to-sequence Transformer model fine-tuned for English→Korean translation. It is based on the original checkpoint at step 300k (checkpoint-50003) and further adapted with a combination of dataset A and BplusD for improved fluency and adequacy.
- **Architecture:** `AutoModelForSeq2SeqLM` (likely based on a T5 or mBART architecture)
- **Checkpoint:** `checkpoint-50003` from `chkpt_prime_300k_A_BplusD`
- **Tokenizer:** `AutoTokenizer` matching the model architecture
## Intended Use
- Primary use case: Translating English text to Korean for research, prototyping, and evaluation tasks.
- Language pair: English (source) → Korean (target)
- License: Please refer to the Hugging Face hub license setting for usage rights.
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "gj5520/kkachi60_en2ko"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def translate(text: str) -> str:
    # The model expects a task prefix before the source sentence.
    prefix = "translate English to Korean: "
    inputs = tokenizer(prefix + text, return_tensors="pt", truncation=True)
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        num_beams=5,
        no_repeat_ngram_size=3,
        early_stopping=True,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

sample = "This is a test sentence for translation."
print(translate(sample))
```
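For translating many sentences at once, a batched variant can be sketched as follows. This is our illustrative sketch, not part of the released model code; the `build_inputs` and `translate_batch` helper names are ours, and the decoding parameters simply mirror the ones above.

```python
PREFIX = "translate English to Korean: "

def build_inputs(texts):
    # Prepend the task prefix to every source sentence.
    return [PREFIX + t for t in texts]

def translate_batch(texts, max_new_tokens=100):
    # Lazy import so the pure helper above is usable without transformers installed.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    model_name = "gj5520/kkachi60_en2ko"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Padding lets several sentences share one generate() call.
    inputs = tokenizer(
        build_inputs(texts), return_tensors="pt", padding=True, truncation=True
    )
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=5,
        no_repeat_ngram_size=3,
        early_stopping=True,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Batching trades a small amount of padding overhead for far fewer forward passes, which usually matters more than beam width for throughput.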
## Evaluation Results
Performance was measured on FLORES-200 and WMT24 benchmarks using COMET, BLEU, ChrF, and spBLEU metrics.
| Benchmark | COMET (mean) | BLEU (corpus) | ChrF (corpus) | spBLEU (corpus) |
|---|---|---|---|---|
| FLORES-200 | / | / | / | / |
| WMT24 | / | / | / | / |
(Insert actual scores here after running benchmark_len.py.)
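As background on one of the metrics above: chrF is a character n-gram F-score. A minimal illustrative implementation, simplified relative to sacrebleu (e.g. in whitespace handling and n-gram averaging), might look like:

```python
from collections import Counter

def char_ngrams(text, n):
    # Character n-grams with spaces removed (a simplification vs. sacrebleu).
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    # Average character n-gram precision/recall, combined as an F-beta score.
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if sum(hyp.values()) == 0 or sum(ref.values()) == 0:
            continue  # sentence too short for this n
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

For reported scores, use sacrebleu rather than this sketch; character-level metrics like chrF are particularly relevant for Korean, where word-level BLEU is sensitive to segmentation.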
## Training Data

- FLORES-200 (test split)
- WMT24++ English–Korean (`en-ko_KR` config)
## Training Procedure

- Fine-tuned for 300k steps on the mixed A+BplusD data.
- Decoding: beam search with `num_beams=5`, `no_repeat_ngram_size=3`.
- Generation prefix: `"translate English to Korean: "`.
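The decoding settings listed above can be captured in a `transformers.GenerationConfig` so they travel with the checkpoint instead of being repeated at every call site. A sketch, assuming the same parameter values as above:

```python
from transformers import GenerationConfig

# Mirror the decoding settings used for this model's evaluation.
gen_config = GenerationConfig(
    num_beams=5,
    no_repeat_ngram_size=3,
    early_stopping=True,
    max_new_tokens=100,
)

# Can then be passed as model.generate(**inputs, generation_config=gen_config)
# or saved alongside the model with gen_config.save_pretrained(...).
print(gen_config.num_beams)
```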
## Limitations and Biases
- May underperform on very long sentences or domain-specific jargon.
- Potential biases inherited from training corpora.
- Evaluate outputs critically, especially for sensitive content.
## Citation

If you use this model in your work, please cite:

```bibtex
@inproceedings{kkachi60_en2ko,
  title  = {{kkachi60\_en2ko}: English-to-Korean translation model},
  author = {Gj5520},
  year   = {2025}
}
```
This model card was automatically generated based on the model training and evaluation pipeline.