seonjeongh committed
Commit 0a72c00 · verified · 1 Parent(s): 4df6a4c

Update README.md

Files changed (1): README.md (+11 −10)
README.md CHANGED
@@ -13,14 +13,15 @@ tags:
 ---
 
 # Overview
-This model is designed for the **abstractive proposition segmentation task** in Korean, as described in the paper [Scalable and Domain-General Abstractive Proposition Segmentation](https://aclanthology.org/2024.findings-emnlp.517.pdf). The model segments text into atomic and self-contained units (atomic facts).
+This model is designed for the **abstractive proposition segmentation task** in **Korean**, as described in the paper [Scalable and Domain-General Abstractive Proposition Segmentation](https://aclanthology.org/2024.findings-emnlp.517.pdf). The model segments text into atomic and self-contained units (atomic facts).
 
 # Training Details
-- Base Model: yanolja/EEVE-Korean-Instruct-10.8B-v1.0
-- Peft: LoRA
-- Dataset: [RoSE](https://huggingface.co/datasets/Salesforce/rose)
-- The dataset was randomly split into training, validation, and test sets for fine-tuning.
-- The dataset was translated into Korean using GPT-4o.
+- **Base Model**: yanolja/EEVE-Korean-Instruct-10.8B-v1.0
+- **Fine-tuning Method**: LoRA
+- **Dataset**: [RoSE](https://huggingface.co/datasets/Salesforce/rose)
+- **Translation**: The dataset was translated into Korean using GPT-4o.
+  - GPT-4o was prompted to translate the propositions using the vocabulary of the source text.
+- **Data Split**: The dataset was randomly split into training, validation, and test sets (1900:100:500) for fine-tuning.
 
 # Usage
 ## Data Preprocessing
@@ -120,12 +121,12 @@ print(results)
 </details>
 
 ## Inputs and Outputs
-- Input: Text.
-- Output: List of propositions for all the sentences in the text passage. The propositions for each sentence are grouped separately.
+- **Input**: Text.
+- **Output**: A list of propositions for all sentences in the text passage; the propositions for each sentence are grouped separately.
 
 ## Evaluation Results
-- Metric: Reference-less & reference-base metrics proposed in [Scalable and Domain-General Abstractive Proposition Segmentation](https://aclanthology.org/2024.findings-emnlp.517.pdf).
-- Models:
+- **Metric**: Reference-less & reference-based metrics proposed in [Scalable and Domain-General Abstractive Proposition Segmentation](https://aclanthology.org/2024.findings-emnlp.517.pdf).
+- **Models**:
   - Dynamic 10-shot models: For each test example, the 10 most similar examples were selected from the training set using BM25.
   - Translate-test models: [google/gemma-7b-aps-it](https://huggingface.co/google/gemma-7b-aps-it) model + EN->KO and KO->EN translation using GPT-4o or GPT-4o-mini.
   - Translate-train models: LoRA fine-tuned sLLMs on the Korean RoSE dataset.
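
The 1900:100:500 data split described in the updated Training Details can be sketched as below. This is a minimal illustration, not the authors' actual preprocessing code; the `seed` value and the `split_dataset` helper name are assumptions.

```python
import random

def split_dataset(examples, sizes=(1900, 100, 500), seed=42):
    """Randomly split examples into train/validation/test sets of the given sizes.

    The fixed seed makes the split reproducible; 42 is an illustrative choice.
    """
    assert sum(sizes) <= len(examples), "not enough examples for the requested split"
    rng = random.Random(seed)
    shuffled = examples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train, n_val, n_test = sizes
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:n_train + n_val + n_test]
    return train, val, test
```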
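
The "propositions grouped per sentence" output shape from Inputs and Outputs could be parsed as below. The line-and-blank-line output format is an illustrative assumption, not the model's documented serialization.

```python
def group_propositions(raw_output: str) -> list[list[str]]:
    """Group a proposition-per-line model output into per-sentence lists.

    Assumes (illustratively) that the model emits one proposition per line
    and separates the groups for different sentences with a blank line.
    """
    groups, current = [], []
    for line in raw_output.splitlines():
        line = line.strip()
        if line:
            current.append(line)
        elif current:            # blank line closes the current sentence group
            groups.append(current)
            current = []
    if current:                  # flush the final group
        groups.append(current)
    return groups
```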
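
The dynamic 10-shot retrieval used in the evaluation (selecting the most similar training examples with BM25) can be sketched as below. The pure-Python `bm25_scores` helper, whitespace tokenization, and parameter values (k1=1.5, b=0.75) are illustrative stand-ins for whatever retrieval implementation was actually used.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the query with BM25."""
    n = len(corpus_tokens)
    avgdl = sum(len(d) for d in corpus_tokens) / n
    # Document frequency of each query term over the corpus.
    df = {t: sum(1 for d in corpus_tokens if t in d) for t in set(query_tokens)}
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        s = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

def top_k_examples(test_text, train_texts, k=10):
    """Return the k training examples most similar to the test text."""
    corpus = [t.split() for t in train_texts]
    scores = bm25_scores(test_text.split(), corpus)
    ranked = sorted(range(len(train_texts)), key=lambda i: scores[i], reverse=True)
    return [train_texts[i] for i in ranked[:k]]
```

For each test example, the top 10 retrieved training examples would then be formatted as in-context demonstrations in the prompt.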
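
The translate-test baseline (segment in English, translate at the boundaries) reduces to a small pipeline like the sketch below. `translate` and `segment_en` are stand-ins for the GPT-4o(-mini) translation calls and the gemma-7b-aps-it segmenter; their signatures are assumptions for illustration.

```python
def translate_test_segment(ko_text, translate, segment_en):
    """Translate-test pipeline: KO -> EN, segment in English, EN -> KO.

    translate(text, src, tgt) and segment_en(text) are caller-supplied stand-ins
    for the translation model and the English proposition segmenter.
    """
    en_text = translate(ko_text, "ko", "en")          # KO -> EN
    en_groups = segment_en(en_text)                   # list of per-sentence proposition lists
    # Translate each proposition back, preserving the per-sentence grouping.
    return [[translate(p, "en", "ko") for p in group] for group in en_groups]
```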