#### Possible Future Directions:

1. Use a decoder-only model for pre-training and summarization.
   <br>When the span of deleted tokens is not very large, the model seems to learn to copy tokens from the encoder context, via cross-attention, into the decoder's generation.
   <br>This hurts performance on the Abstractive Summarization task.
   <br>This problem does not arise in a decoder-only model, since none of the tokens to be predicted are visible to the model at all.
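To make the failure mode concrete, here is a minimal sketch of a span-deletion noising function of the kind used for encoder-decoder pre-training. The function name and interface are hypothetical (not from this repository): it removes one contiguous span from a token sequence to form the encoder input, keeping the original sequence as the decoder target. When `span_len` is small, nearly every target token is still present in the encoder input, so the decoder can learn to copy via cross-attention rather than generate abstractively.

```python
import random


def delete_span(tokens, span_len, rng=random):
    """Build one span-deletion pre-training example (hypothetical helper).

    Removes a single contiguous span of `span_len` tokens from `tokens`
    to form the corrupted encoder input; the uncorrupted sequence is the
    decoder target. Note that when span_len is small, most target tokens
    remain visible to the encoder, which encourages copying.
    """
    if not 0 < span_len < len(tokens):
        raise ValueError("span_len must be between 1 and len(tokens) - 1")
    # Pick a random start position so the span fits inside the sequence.
    start = rng.randrange(len(tokens) - span_len + 1)
    corrupted = tokens[:start] + tokens[start + span_len:]
    return corrupted, list(tokens)  # (encoder input, decoder target)
```

For example, with a span length of 3 on a 10-token sequence, 7 of the 10 target tokens are still directly visible to the encoder, which is the situation the paragraph above describes.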