---
datasets:
- ai4anshu/sentence-compression
language:
- en
metrics:
- rouge
library_name: transformers
pipeline_tag: summarization
---
## Getting Started

### Installation

1. Create the conda environment, replacing `NAME` with an environment name of your choice:
```bash
conda env create --name NAME --file=environment.yaml
```

The project is organized around several scripts that follow a typical machine learning workflow: data preparation, model training, evaluation, and inference. The `google/t5-small` model was trained on the dataset above for `10` epochs. Inference was then run on the evaluation data, and the performance metrics and evaluation results were stored in the `result` subdirectory of the `project` directory.

I added a Makefile that can be used to run each Python script separately with the following commands.

```bash
make data
make train
make eval
make inference
```

The `run` target runs the entire project end to end.

```bash
make run
```

The `clean` target removes the outputs of previous runs.

```bash
make clean
```
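The inference step of the workflow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the `summarize: ` task prefix, the `checkpoint_dir` argument, and the function names `build_input` and `compress` are all assumptions made for this example.

```python
def build_input(sentence: str, prefix: str = "summarize: ") -> str:
    """Prepend a task prefix to the sentence.

    The "summarize: " prefix is an assumption; use whatever prefix the
    checkpoint was actually trained with.
    """
    return prefix + sentence.strip()


def compress(sentence: str, checkpoint_dir: str, max_new_tokens: int = 64) -> str:
    """Generate a compressed sentence from a fine-tuned seq2seq checkpoint.

    `checkpoint_dir` is a hypothetical local path to the downloaded checkpoint.
    """
    # Imported lazily so build_input() works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint_dir)

    # Tokenize the prefixed sentence and generate the shorter version.
    inputs = tokenizer(build_input(sentence), return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (the path is a placeholder, not a real location):
# compress("A long sentence that should be shortened.", "path/to/checkpoint")
```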

Performance metrics are stored in the `performance.json` file inside the `results` directory.

```json
{
  "rouge1": 0.79689240266461,
  "rouge2": 0.7606140631154827,
  "rougeL": 0.7733855633904199,
  "rougeLsum": 0.7734703253159519
}
```
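To give a feel for what the `rouge1` score measures, here is a simplified sketch of a unigram-overlap F1 between a reference compression and a prediction. It assumes lowercasing and whitespace tokenization, so its numbers will not exactly match the `rouge` metric used for the scores above; the function name `rouge1_f1` is chosen for this example only.

```python
from collections import Counter


def rouge1_f1(reference: str, prediction: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap with whitespace tokenization."""
    ref_counts = Counter(reference.lower().split())
    pred_counts = Counter(prediction.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# A prediction that keeps half of the reference words and adds nothing extra
# has precision 1.0 and recall 0.5.
print(rouge1_f1("the cat sat on the mat", "the cat sat"))
```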

The evaluation run also produces `eval_results.csv`, which contains the predictions for the evaluation file.

| original  | compressed | predictions |
|-----------|------------|-------------|
| sentence1 | compress1  | prediction1 |
| sentence2 | compress2  | prediction2 |
| ...       | ...        | ...         |
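
A file with this layout can be inspected with the standard library. The sketch below assumes the CSV has a header row with the column names shown in the table; the helper names `read_rows` and `exact_match_rate` and the sample rows are made up for illustration.

```python
import csv
import io


def read_rows(handle):
    """Read rows as dictionaries keyed by the CSV header."""
    return list(csv.DictReader(handle))


def exact_match_rate(rows):
    """Fraction of rows whose prediction equals the reference compression."""
    if not rows:
        return 0.0
    matches = sum(
        row["predictions"].strip() == row["compressed"].strip() for row in rows
    )
    return matches / len(rows)


# In-memory stand-in for the real file (illustrative data only).
sample = io.StringIO(
    "original,compressed,predictions\n"
    "a b c,a b,a b\n"
    "d e f,d e,d f\n"
)
print(exact_match_rate(read_rows(sample)))  # 0.5

# For the real file, open it with newline="" as the csv module recommends:
# with open("eval_results.csv", newline="", encoding="utf-8") as f:
#     rows = read_rows(f)
```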

### References:
1. https://github.com/google-research-datasets/sentence-compression
2. https://huggingface.co/docs/transformers/en/tasks/summarization

### Note:
Download the trained checkpoint from this Google Drive folder: [checkpoint](https://drive.google.com/drive/folders/1yrl0VtmM9BtT4aU2Z5vLs6doz35MMxvM?usp=drive_link)