---

# Overview

This model is designed for the **abstractive proposition segmentation task** in **Korean**, as described in the paper [Scalable and Domain-General Abstractive Proposition Segmentation](https://aclanthology.org/2024.findings-emnlp.517.pdf). The model segments text into atomic, self-contained units (atomic facts).

# Training Details

- **Base Model**: yanolja/EEVE-Korean-Instruct-10.8B-v1.0
- **Fine-tuning Method**: LoRA
- **Dataset**: [RoSE](https://huggingface.co/datasets/Salesforce/rose)
- **Translation**: The dataset was translated into Korean using GPT-4o.
  - GPT-4o was prompted to translate propositions using the vocabulary in the text.
- **Data Split**: The dataset was randomly split into training, validation, and test sets (1900:100:500) for fine-tuning.
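
The 1900:100:500 random split above can be sketched as follows. This is an illustrative sketch, not the authors' actual code: `examples` stands in for the 2,500 translated RoSE examples, and the seed is a placeholder.

```python
import random


def split_dataset(examples, sizes=(1900, 100, 500), seed=42):
    """Randomly partition `examples` into train/validation/test sets of the given sizes."""
    assert sum(sizes) == len(examples), "sizes must cover the whole dataset"
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    train = shuffled[: sizes[0]]
    valid = shuffled[sizes[0] : sizes[0] + sizes[1]]
    test = shuffled[sizes[0] + sizes[1] :]
    return train, valid, test
```

With 2,500 examples this yields exactly 1,900 training, 100 validation, and 500 test examples.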

# Usage

## Data Preprocessing

</details>

## Inputs and Outputs

- **Input**: Text.
- **Output**: A list of propositions for all the sentences in the text passage. The propositions for each sentence are grouped separately.
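
To illustrate the grouped output shape (the passage and propositions below are invented, not real model output), the result can be represented as one inner list per sentence:

```python
# Hypothetical output for a two-sentence passage: one inner list per sentence.
propositions = [
    ["The model segments text.", "The segments are atomic."],  # sentence 1
    ["Each proposition is self-contained."],                   # sentence 2
]


def flatten(grouped):
    """Flatten the per-sentence groups into a single list of propositions."""
    return [p for group in grouped for p in group]
```

The nested structure preserves which propositions came from which sentence; `flatten` discards the grouping when only the flat list of atomic facts is needed.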

## Evaluation Results

- **Metric**: The reference-less and reference-based metrics proposed in [Scalable and Domain-General Abstractive Proposition Segmentation](https://aclanthology.org/2024.findings-emnlp.517.pdf).
- **Models**:
  - Dynamic 10-shot models: for each test example, the 10 most similar examples were selected from the training set using BM25.
  - Translate-test models: the [google/gemma-7b-aps-it](https://huggingface.co/google/gemma-7b-aps-it) model with EN->KO and KO->EN translation by GPT-4o or GPT-4o-mini.
  - Translate-train models: LoRA fine-tuned sLLMs trained on the Korean RoSE dataset.
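
The dynamic 10-shot selection can be sketched with a small, self-contained Okapi BM25 scorer. This is a generic sketch: whitespace tokenization is used for brevity (Korean text would need a proper tokenizer), and the exact BM25 variant and parameters used in the evaluation may differ.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` against `query` with Okapi BM25
    (using the common log(1 + (N - df + 0.5) / (df + 0.5)) IDF variant)."""
    docs = [doc.split() for doc in corpus]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()  # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query.split():
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            )
        scores.append(score)
    return scores


def top_k_examples(query, train_examples, k=10):
    """Return the k training examples most similar to `query` under BM25."""
    scores = bm25_scores(query, train_examples)
    ranked = sorted(range(len(train_examples)), key=lambda i: -scores[i])
    return [train_examples[i] for i in ranked[:k]]
```

In the dynamic 10-shot setup, `top_k_examples(test_text, training_texts, k=10)` would pick the in-context demonstrations for each test example.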