---
license: apache-2.0
language:
- zh
---
# Text Summarization
This is an assignment for Applied Deep Learning, a course at National Taiwan University (NTU).
### Task Description: Chinese News Summarization (Title Generation)
After the model generates a probability for every token, Greedy is the simplest decoding strategy: always choose the most probable next word (argmax).
However, the Greedy strategy easily gets stuck repeating the same words.
```
Greedy Result (f1-score): rouge-1: 1.5, rouge-2: 0.9, rouge-L: 1.4
```
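A minimal sketch of Greedy decoding over a toy vocabulary (the words and logits below are made up for illustration, not taken from the actual model):

```python
import math

def softmax(logits):
    # Turn raw logits into a probability distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits, vocab):
    # Greedy decoding: always take the single most probable token (argmax).
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return vocab[best]

# Toy logits for four candidate next tokens (made-up values).
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 1.0, 0.5, 0.1]
print(greedy_pick(logits, vocab))  # -> the
```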
- Beam Search
Beam Search keeps track of the k most probable partial sentences and returns the best one as the result.
If the beam size is set to 1, it reduces to Greedy, so beam search partially solves Greedy's repetition problem.
However, if the beam size is too large, the result becomes too generic and less relevant, even though it is safe and "correct".
For example:
```
input:
I love to listen to Taylor Swift's songs, so I decided to attend Taylor's concert.
output:
What do you like to listen?
```
```
beam size = 5
Beam Search Result (f1-score): rouge-1: 7.4, rouge-2: 1.9, rouge-L: 6.9
```
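A minimal beam search sketch over a toy next-token distribution (all tokens and probabilities below are made up). Note how the best complete sequence differs from the greedy one:

```python
import math

def beam_search(step_fn, beam_size, max_len):
    # Each hypothesis is (tokens, cumulative log-probability);
    # keep only the beam_size best hypotheses after every step.
    beams = [([], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, prob in step_fn(tokens):
                candidates.append((tokens + [tok], score + math.log(prob)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]

# Toy next-token distribution (made-up): greedy would start with "A",
# but the best complete two-token sequence actually starts with "B".
def step_fn(tokens):
    if not tokens:
        return [("A", 0.6), ("B", 0.4)]
    if tokens[-1] == "A":
        return [("x", 0.5), ("y", 0.5)]
    return [("z", 0.9), ("w", 0.1)]

print(beam_search(step_fn, beam_size=2, max_len=2))  # -> ['B', 'z']
```

With `beam_size=1` the same function reproduces the greedy choice, which is exactly the reduction described above.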
- Top k Sampling
Sampling randomly chooses the next word from the probability distribution instead of taking the argmax.
Top k Sampling samples from that distribution, but restricted to the k most probable words.
However, when a rarely used word is sampled, the sentence becomes less fluent.
```
k = 5
Top k Result (f1-score): rouge-1: 4.0, rouge-2: 0.5, rouge-L: 3.7
```
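A minimal Top k Sampling sketch (toy vocabulary and logits, made up for illustration):

```python
import math
import random

def top_k_sample(logits, vocab, k, rng=random):
    # Keep only the k most probable tokens, renormalize, then sample.
    top = sorted(zip(vocab, logits), key=lambda p: p[1], reverse=True)[:k]
    m = max(l for _, l in top)
    weights = [(tok, math.exp(l - m)) for tok, l in top]
    total = sum(w for _, w in weights)
    r = rng.random() * total
    for tok, w in weights:
        r -= w
        if r <= 0:
            return tok
    return weights[-1][0]

# Toy logits (made-up); "xylophone" is a rare word that k = 3 filters out.
vocab = ["the", "cat", "sat", "mat", "xylophone"]
logits = [2.0, 1.5, 1.0, 0.5, -3.0]
random.seed(0)
print(top_k_sample(logits, vocab, k=3))  # one of "the", "cat", "sat"
```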
- Nucleus (Top p) Sampling
Nucleus Sampling samples from the smallest subset of the vocabulary that covers probability mass p.
In effect, it dynamically shrinks and expands the top-k cutoff.
```
p = 5
Top p Result (f1-score): rouge-1: 3.0, rouge-2: 0.2, rouge-L: 2.9
```
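A minimal Nucleus Sampling sketch with a toy distribution, using p = 0.8 as an illustrative mass threshold (all values below are made up):

```python
import math
import random

def nucleus_sample(logits, vocab, p, rng=random):
    # Sort tokens by probability and keep the smallest prefix whose
    # cumulative probability mass reaches p, then sample from it.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    ranked = sorted(zip(vocab, (e / total for e in exps)),
                    key=lambda t: t[1], reverse=True)
    nucleus, mass = [], 0.0
    for tok, prob in ranked:
        nucleus.append((tok, prob))
        mass += prob
        if mass >= p:
            break
    r = rng.random() * mass
    for tok, prob in nucleus:
        r -= prob
        if r <= 0:
            return tok
    return nucleus[-1][0]

# Toy logits (made-up); with p = 0.8 the nucleus is just {"the", "cat"}.
vocab = ["the", "cat", "sat", "mat"]
logits = [3.0, 1.0, 0.5, 0.1]
random.seed(0)
print(nucleus_sample(logits, vocab, p=0.8))  # "the" or "cat"
```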
- Temperature
Softmax temperature applies a temperature hyperparameter T to the softmax by dividing the logits by T.
With a high temperature, the distribution becomes more uniform (more diversity).
With a low temperature, the distribution becomes more spiky (less diversity).
```
temperature = 5
Temperature Result (f1-score): rouge-1: 2.1, rouge-2: 0.04, rouge-L: 1.9
```
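The temperature trick can be sketched as follows (toy logits, made up for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide the logits by T before the softmax: T > 1 flattens the
    # distribution, T < 1 sharpens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits (made-up values).
logits = [2.0, 1.0, 0.1]
base = softmax_with_temperature(logits, 1.0)   # plain softmax
flat = softmax_with_temperature(logits, 5.0)   # more uniform, more diversity
sharp = softmax_with_temperature(logits, 0.5)  # more spiky, less diversity
print(max(flat), max(base), max(sharp))
```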

As a result, we can see that in this task, beam search outperforms the other strategies.