---
tags:
- summarization
- mt5
- khmer
- text2text-generation
license: mit
---

# Khmer mT5 Summarization Model (Duplicated Text)

This repository contains a fine-tuned `mT5-small` model for **Khmer text summarization**, trained specifically to collapse **duplicated or redundant content** into concise, coherent summaries.
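
A minimal sketch of how a checkpoint like this could be loaded and queried with the `transformers` library. The repository id `your-username/khmer-mt5-summarization` is a hypothetical placeholder, and the decoding settings and the 512-token input limit are illustrative assumptions rather than values documented here. The sketch also assumes PyTorch and `sentencepiece` are installed and that no task prefix is required.

```python
# Hypothetical repository id: replace with this model's actual Hub id.
MODEL_ID = "your-username/khmer-mt5-summarization"

# Illustrative decoding settings (assumptions, not values from this card).
GENERATION_KWARGS = {
    "max_new_tokens": 64,
    "num_beams": 4,
    "no_repeat_ngram_size": 3,
}


def summarize(text: str, model_id: str = MODEL_ID) -> str:
    """Return a summary of `text` generated by the seq2seq checkpoint at `model_id`."""
    # Imported here so the heavy dependencies load only when a summary is requested.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Truncate long inputs; 512 tokens is an assumed limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(article_text)` with a Khmer article string should return the generated summary as plain text. Beam search with `no_repeat_ngram_size` is one plausible choice for discouraging repeated phrases in the output; other decoding settings may work equally well.
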

---

## Model Details

- **Base model:** `google/mt5-small`
- **Fine-tuned for:** Khmer summarization with duplicate-text removal
- **Training dataset:** `kimleang123/khmer-text-dataset-duplicated`
- **Task:** Sequence-to-Sequence (`text2text-generation`)
- **Evaluation:** ROUGE-1/2/L on held-out Khmer articles containing repeated passages
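
To make the evaluation metrics concrete, here is a minimal pure-Python sketch of ROUGE-N and ROUGE-L as F1 scores over token lists. It is not the evaluation script used for this model. Khmer is written without spaces between words, so the tokenization step matters. The `char_tokens` helper below is a hypothetical, crude stand-in that splits text into non-whitespace characters, and a proper Khmer word segmenter would likely give more meaningful scores. Reporting F1 rather than precision or recall is also an assumption.

```python
from collections import Counter


def _f1(overlap: int, ref_len: int, cand_len: int) -> float:
    """Harmonic mean of precision (overlap/cand_len) and recall (overlap/ref_len)."""
    if overlap == 0 or ref_len == 0 or cand_len == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def _ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """ROUGE-N F1: clipped n-gram overlap between reference and candidate."""
    ref, cand = _ngrams(reference, n), _ngrams(candidate, n)
    overlap = sum((ref & cand).values())  # Counter & keeps the minimum counts
    return _f1(overlap, sum(ref.values()), sum(cand.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    return _f1(_lcs_length(reference, candidate), len(reference), len(candidate))


def char_tokens(text):
    """Assumed tokenizer: one token per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]
```

For a reference summary and a generated one, `rouge_n(char_tokens(ref), char_tokens(hyp), n=2)` would give a character-bigram ROUGE-2 score under these assumptions.
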

---