---
license: mit
language:
- en
tags:
- gpu
---
# Text Summarization Model with Seq2Seq and LSTM

This model is a sequence-to-sequence (seq2seq) model for text summarization. It uses a bidirectional LSTM encoder and an LSTM decoder to generate summaries from input articles. The model was trained on a dataset with sequences of length up to 800 tokens.

## Model Architecture

### Encoder

- **Input Layer:** Takes input sequences of length `max_len_article`.
- **Embedding Layer:** Converts input sequences into dense vectors of size 100.
- **Bidirectional LSTM Layer:** Processes the embedded input, capturing dependencies in both forward and backward directions, and outputs hidden and cell states from both directions.
- **State Concatenation:** Combines the forward and backward hidden and cell states to form the final encoder states.
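The encoder steps above can be sketched in Keras as follows. This is a minimal sketch: the layer sizes follow the model summary below, but the vocabulary size is shrunk to a toy value to keep the example light (the actual embedding row implies a vocabulary of roughly 476,199 tokens), and the variable names are illustrative.

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Bidirectional, Concatenate

max_len_article = 800
embedding_dim = 100
latent_dim = 100    # per direction; concatenated states have size 200
vocab_size = 1000   # toy value for the sketch only

# Encoder: embed the article, then run a bidirectional LSTM that returns its states
encoder_inputs = Input(shape=(max_len_article,))
enc_emb = Embedding(vocab_size, embedding_dim)(encoder_inputs)
encoder_outputs, fwd_h, fwd_c, bwd_h, bwd_c = Bidirectional(
    LSTM(latent_dim, return_state=True)
)(enc_emb)

# State concatenation: forward and backward states form the final encoder states
state_h = Concatenate()([fwd_h, bwd_h])  # hidden state, shape (None, 200)
state_c = Concatenate()([fwd_c, bwd_c])  # cell state, shape (None, 200)
```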

### Decoder

- **Input Layer:** Takes target sequences of variable length.
- **Embedding Layer:** Converts target sequences into dense vectors of size 100.
- **LSTM Layer:** Processes the embedded target sequences using an LSTM whose initial states are set to the encoder states.
- **Dense Layer:** Applies a Dense layer with softmax activation to generate the probabilities for each word in the vocabulary.
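A matching sketch of the decoder, again with a toy vocabulary (the Dense row in the model summary implies about 155,158 tokens). The encoder states are stand-in `Input` tensors here so the snippet is self-contained; in the full model they come from the encoder's concatenated states.

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense

embedding_dim = 100
latent_dim = 200    # matches the concatenated encoder state size
vocab_size = 1000   # toy value for the sketch only

# Decoder input: target sequences of variable length
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(vocab_size, embedding_dim)(decoder_inputs)

# Stand-ins for the concatenated encoder states
state_h = Input(shape=(latent_dim,))
state_c = Input(shape=(latent_dim,))

# LSTM initialised with the encoder states
decoder_outputs, _, _ = LSTM(
    latent_dim, return_sequences=True, return_state=True
)(dec_emb, initial_state=[state_h, state_c])

# Softmax over the vocabulary at every time step
probs = Dense(vocab_size, activation='softmax')(decoder_outputs)
```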

### Model Summary

| Layer (type) | Output Shape | Param # | Connected to |
|---|---|---|---|
| input_1 (InputLayer) | [(None, 800)] | 0 | - |
| embedding (Embedding) | (None, 800, 100) | 47,619,900 | input_1[0][0] |
| bidirectional (Bidirectional) | [(None, 200), (None, 100), (None, 100), (None, 100), (None, 100)] | 160,800 | embedding[0][0] |
| input_2 (InputLayer) | [(None, None)] | 0 | - |
| embedding_1 (Embedding) | (None, None, 100) | 15,515,800 | input_2[0][0] |
| concatenate (Concatenate) | (None, 200) | 0 | bidirectional[0][1], bidirectional[0][3] |
| concatenate_1 (Concatenate) | (None, 200) | 0 | bidirectional[0][2], bidirectional[0][4] |
| lstm (LSTM) | [(None, None, 200), (None, 200), (None, 200)] | 240,800 | embedding_1[0][0], concatenate[0][0], concatenate_1[0][0] |
| dense (Dense) | (None, None, 155158) | 31,186,758 | lstm[0][0] |

Total params: 94,724,058

Trainable params: 94,724,058

Non-trainable params: 0
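The parameter counts in the table can be checked by hand. The vocabulary sizes below are inferred from the embedding and Dense rows (params divided by the embedding dimension), and the LSTM formula is the standard Keras one:

```python
embedding_dim = 100
enc_units = 100      # per direction in the bidirectional encoder
dec_units = 200      # = 2 * enc_units after state concatenation
enc_vocab = 476199   # inferred: 47,619,900 params / 100 dims
dec_vocab = 155158   # from the Dense output shape (None, None, 155158)

def lstm_params(units, input_dim):
    # 4 gates, each with input weights, recurrent weights, and a bias
    return 4 * units * (input_dim + units + 1)

enc_embedding = enc_vocab * embedding_dim            # 47,619,900
bi_lstm = 2 * lstm_params(enc_units, embedding_dim)  # 160,800
dec_embedding = dec_vocab * embedding_dim            # 15,515,800
dec_lstm = lstm_params(dec_units, embedding_dim)     # 240,800
dense = dec_units * dec_vocab + dec_vocab            # 31,186,758

total = enc_embedding + bi_lstm + dec_embedding + dec_lstm + dense
print(f"Total params: {total:,}")  # Total params: 94,724,058
```

The sum confirms that the total equals the trainable-parameter count, 94,724,058.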

## Training

The model was trained on a dataset with sequences of length up to 800 tokens using the following configuration:

- **Optimizer:** Adam
- **Loss Function:** Categorical Crossentropy
- **Metrics:** Accuracy
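In Keras terms, those settings correspond to a `compile` call like the one below. This is only a sketch: the `model` here is a small stand-in so the snippet runs on its own, since the actual training script is not part of this card.

```python
import tensorflow as tf

# Stand-in model so the snippet is self-contained
model = tf.keras.Sequential([tf.keras.layers.Dense(8, activation='softmax')])

# The training configuration listed above
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
```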

### Training Loss and Validation Loss

| Epoch | Training Loss | Validation Loss | Time per Epoch (s) |
|-------|---------------|-----------------|--------------------|
| 1     | 3.9044        | 0.4543          | 3087               |
| 2     | 0.3429        | 0.0976          | 3091               |
| 3     | 0.1054        | 0.0427          | 3096               |
| 4     | 0.0490        | 0.0231          | 3099               |
| 5     | 0.0203        | 0.0148          | 3098               |

### Test Loss

| Test Loss            |
|----------------------|
| 0.014802712015807629 |

## Usage (I will update this soon)

To use this model, load it with the Hugging Face Transformers library:

```python
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained('your-model-name')
model = TFAutoModelForSeq2SeqLM.from_pretrained('your-model-name')

article = "Your input text here."
inputs = tokenizer.encode("summarize: " + article, return_tensors="tf", max_length=800, truncation=True)
summary_ids = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print(summary)
```