---
language: ko
tags:
- bart
datasets:
- korquad
license: mit
---

# Korean Question Generation Model

## GitHub

https://github.com/Seoneun/KoBART-Question-Generation

## Fine-tuning Dataset

KorQuAD 1.0

## Demo

https://huggingface.co/Sehong/kobart-QuestionGeneration

## How to use

```python
import torch
from transformers import BartForConditionalGeneration, PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast.from_pretrained('Sehong/kobart-QuestionGeneration')
model = BartForConditionalGeneration.from_pretrained('Sehong/kobart-QuestionGeneration')

# <unused0> is the sep_token; it separates the content (passage) from the answer.
text = "1989년 2월 15일 여의도 농민 폭력 시위를 주도한 혐의(폭력행위등처벌에관한법률위반)으로 지명수배되었다. 1989년 3월 12일 서울지방검찰청 공안부는 임종석의 사전구속영장을 발부받았다. 같은 해 6월 30일 평양축전에 임수경을 대표로 파견하여 국가보안법위반 혐의가 추가되었다. 경찰은 12월 18일~20일 사이 서울 경희대학교에서 임종석이 성명 발표를 추진하고 있다는 첩보를 입수했고, 12월 18일 오전 7시 40분 경 가스총과 전자봉으로 무장한 특공조 및 대공과 직원 12명 등 22명의 사복 경찰을 승용차 8대에 나누어 경희대학교에 투입했다. 1989년 12월 18일 오전 8시 15분 경 서울청량리경찰서는 호위 학생 5명과 함께 경희대학교 학생회관 건물 계단을 내려오는 임종석을 발견, 검거해 구속을 집행했다. 임종석은 청량리경찰서에서 약 1시간 동안 조사를 받은 뒤 오전 9시 50분 경 서울 장안동의 서울지방경찰청 공안분실로 인계되었다. <unused0> 1989년 2월 15일"

# Wrap the encoded text in BOS/EOS tokens before generation.
raw_input_ids = tokenizer.encode(text)
input_ids = [tokenizer.bos_token_id] + raw_input_ids + [tokenizer.eos_token_id]

# Generate the question token ids and decode them back to text.
question_ids = model.generate(torch.tensor([input_ids]))
print(tokenizer.decode(question_ids.squeeze().tolist(), skip_special_tokens=True))
```
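
The snippet above joins the content and the answer with `<unused0>` by hand. A minimal sketch of a helper that builds this input string is shown below. `build_qg_input` and `SEP_TOKEN` are hypothetical names that do not come from the original repository, and the single-space layout around the separator is an assumption taken from the example input.

```python
# Hypothetical helper, not part of the original repository.
# Assumed input layout (taken from the example above): "<content> <unused0> <answer>"
SEP_TOKEN = "<unused0>"


def build_qg_input(content: str, answer: str, sep_token: str = SEP_TOKEN) -> str:
    """Join a passage and its answer into one question-generation input string."""
    # Collapse runs of whitespace (including newlines) into single spaces.
    content = " ".join(content.split())
    answer = " ".join(answer.split())
    if not content or not answer:
        raise ValueError("content and answer must both be non-empty")
    # The separator must appear exactly once, so reject inputs that contain it.
    if sep_token in content or sep_token in answer:
        raise ValueError("content and answer must not contain the separator token")
    return f"{content} {sep_token} {answer}"


example = build_qg_input(
    "1989년 2월 15일 여의도 농민 폭력 시위를 주도한 혐의(폭력행위등처벌에관한법률위반)으로 지명수배되었다.",
    "1989년 2월 15일",
)
print(example)
```

The returned string can be used as `text` in the snippet above; tokenization and generation stay unchanged.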