---
title: Text Summarizer
emoji: 📝
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
license: mit
---
<div align="center">
<a name="readme-top"></a>
# Text Summarizer
A web application that uses four NLP libraries (SpaCy, NLTK, Gensim, Sumy) to summarize pasted text or the content of a URL, with a side-by-side interface for comparing the resulting summaries.
**[Source Code](Source%20Code/)** · **[Technical Specification](docs/SPECIFICATION.md)** · **[Video Demo](https://youtu.be/2drrqsSB1Bc)**
</div>
---
<div align="center">
[Authors](#authors) · [Overview](#overview) · [Features](#features) · [Structure](#project-structure) · [Results](#results-gallery) · [Quick Start](#quick-start) · [Usage Guidelines](#usage-guidelines) · [License](#license) · [About](#about-this-repository) · [Acknowledgments](#acknowledgments)
</div>
---
<!-- AUTHORS -->
<div align="center">
## Authors
**Terna Engineering College | Computer Engineering | Batch of 2022**
| <a href="https://github.com/Amey-Thakur"><img src="https://github.com/Amey-Thakur.png" width="150" height="150" alt="Amey Thakur"></a><br>[**Amey Thakur**](https://github.com/Amey-Thakur)<br><br>[ORCID](https://orcid.org/0000-0001-5644-1575) | <a href="https://github.com/msatmod"><img src="https://raw.githubusercontent.com/Amey-Thakur/TEXT-SUMMARIZER/main/Mega/Mega.png" width="150" height="150" alt="Mega Satish"></a><br>[**Mega Satish**](https://github.com/msatmod)<br><br>[ORCID](https://orcid.org/0000-0002-1844-9557) |
| :---: | :---: |
</div>
> [!IMPORTANT]
> ### 🤝🏻 Special Acknowledgement
> *Special thanks to **[Mega Satish](https://github.com/msatmod)** for her meaningful contributions, guidance, and support that helped shape this work.*
---
<!-- OVERVIEW -->
## Overview
This project implements a versatile **Text Summarizer** capable of condensing large bodies of text or web content into concise summaries. It serves as a comparative platform for various Extractive Summarization techniques, including frequency-based methods (SpaCy, NLTK) and graph-based algorithms (TextRank via Gensim, LexRank via Sumy).
Developed as a mini-project for the **8th Semester** curriculum, this system addresses the need for efficient information retrieval by automating the extraction of key sentences from documents. It features a Flask-based web interface where users can enter raw text or a URL and compare the output of the different summarization libraries side by side.
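
The frequency-based family mentioned above can be sketched without any of the listed libraries. The example below is a minimal, standard-library-only illustration and not the project's own code: the tiny stopword list, the regex sentence splitter, and the names `split_sentences` and `frequency_summary` are assumptions made for the sketch.

```python
import re
from collections import Counter
from heapq import nlargest

# Small illustrative stopword list (an assumption); NLTK and SpaCy ship far larger ones.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in",
             "it", "that", "for", "on", "with", "as", "this", "by"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def frequency_summary(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    if not words:
        return ""
    counts = Counter(words)
    top = max(counts.values())
    weights = {w: c / top for w, c in counts.items()}  # normalise to (0, 1]

    sentences = split_sentences(text)
    # A sentence's score is the sum of the weights of the words it contains.
    scores = {
        i: sum(weights.get(w, 0.0) for w in re.findall(r"[a-z']+", s.lower()))
        for i, s in enumerate(sentences)
    }
    keep = sorted(nlargest(max_sentences, scores, key=scores.get))
    return " ".join(sentences[i] for i in keep)
```

Calling `frequency_summary(text, max_sentences=2)` returns the two top-scoring sentences joined by spaces. Because scores are plain sums, longer sentences are favoured; dividing by sentence length is a common variation.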
> [!NOTE]
> **Research Impact & Certification**
>
> This project was published as a research paper in the **International Journal for Research in Applied Science and Engineering Technology (IJRASET)** (Volume 10, Issue 1) and is also available as a preprint on **viXra**. The project received an official **Publication Certificate** for its research contribution to natural language processing.
>
> - [Preprint @viXra](https://vixra.org/abs/2202.0017)
> - [Published Paper @IJRASET](https://doi.org/10.22214/ijraset.2022.40066)
> - [Publication Certificate](https://github.com/Amey-Thakur/ACHIEVEMENTS/blob/main/Research%20Papers/Text%20Summarizer%20Using%20Julia/IJRASET40066%20-%20Text%20Summarizer%20Using%20Julia.pdf)
### Resources
| # | Resource | Description |
|---|---|---|
| 1 | [**Technical Report**](Mini-Project/TEXT%20SUMMARIZER.pdf) | Detailed project documentation |
| 2 | [**Project Presentation**](Mini-Project/TEXT%20SUMMARIZER.pptx) | Visual demonstration and slides |
| 3 | [**Technical Specification**](docs/SPECIFICATION.md) | Technical Architecture & Specification |
| 4 | [**Source Code**](Source%20Code/) | Complete source code and documentation |
| 5 | [**Research Article**](https://doi.org/10.22214/ijraset.2022.40066) | IJRASET Published Paper |
| 6 | [**Scholarly Preprint**](https://vixra.org/abs/2202.0017) | Formal research manuscript (viXra) |
| 7 | [**Project Demo**](https://youtu.be/2drrqsSB1Bc) | Real-time demonstration of features |
| 8 | [**NLP Laboratory**](https://github.com/Amey-Thakur/NATURAL-LANGUAGE-PROCESSING-AND-COMPUTATIONAL-LAB-II) | Academic repository for NLP |
> [!TIP]
> **Algorithm Selection for Optimal Results**
>
> For long-form documents, **Gensim's TextRank** tends to give more coherent summaries because it ranks each sentence by its similarity to the rest of the document. For shorter texts or news articles, **SpaCy's frequency-based** approach runs faster and gives comparable quality.
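
To make the graph-based ranking in the tip above concrete, here is a minimal TextRank-style sketch using only the standard library. It is an illustration under assumptions, not the Gensim implementation: it uses the word-overlap similarity from the original TextRank paper, a fixed number of power-iteration steps, and the hypothetical names `tokenize`, `similarity`, and `textrank`. Gensim removed its `summarization` module in version 4.0, which is another reason the sketch avoids calling it directly.

```python
import math
import re


def tokenize(sentence):
    """Lower-cased set of word tokens in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(a, b):
    """Shared words, normalised by the log of both sentence lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration over a weighted similarity graph."""
    n = len(sentences)
    tokens = [tokenize(s) for s in sentences]
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]  # total edge weight leaving each node
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


sentences = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "Stock prices fell sharply today.",
]
scores = textrank(sentences)
# Sentences that share vocabulary reinforce each other; the unrelated one ranks last.
ranked = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)
```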
---
<!-- FEATURES -->
## Features
| Feature | Description |
|---------|-------------|
| **Multi-Algorithm Support** | Unified interface for SpaCy, NLTK, Gensim, and Sumy summarization engines. |
| **Comparative Analysis** | Side-by-side visualization of summaries with reading time reduction metrics. |
| **Web Scraping** | Integrated BeautifulSoup module to extract and process text directly from web links. |
| **Material Design UI** | Responsive frontend built with Materialize CSS for a clean, modern research aesthetic. |
| **Performance Metrics** | Real-time calculation of original vs. summarized reading times and execution speed. |
| **Scholarly Codebase** | Fully documented source code with strict academic formatting and inline citations. |
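
The reading-time metrics in the table can be approximated from word counts alone. The sketch below is an assumption about how such a metric might be computed, not the project's formula: the 200 words-per-minute reading speed and the function names are illustrative.

```python
import re

WORDS_PER_MINUTE = 200  # assumed average reading speed, not taken from the project


def reading_time_minutes(text, wpm=WORDS_PER_MINUTE):
    """Estimated reading time in minutes, based on whitespace-separated words."""
    return len(re.findall(r"\S+", text)) / wpm


def reading_time_reduction(original, summary, wpm=WORDS_PER_MINUTE):
    """Compare the reading time of a text with that of its summary."""
    before = reading_time_minutes(original, wpm)
    after = reading_time_minutes(summary, wpm)
    saved = 0.0 if before == 0 else (before - after) / before * 100
    return {"original_min": before, "summary_min": after, "saved_percent": saved}
```

For a 400-word document condensed to 100 words, `reading_time_reduction` reports 2.0 minutes against 0.5 minutes, a 75% reduction.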
### Tech Stack
- **Backend**: Python 3.x, Flask
- **NLP Libraries**: SpaCy, NLTK, Gensim, Sumy
- **Frontend**: HTML5, Materialize CSS, jQuery
- **Utilities**: BeautifulSoup4, lxml
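
For readers curious what the URL-to-text step involves, the sketch below pulls paragraph text out of an HTML string. It deliberately swaps in the standard library's `html.parser` for BeautifulSoup4 so that it runs without extra dependencies; the class and function names are hypothetical and this is not the project's scraping code. In practice the HTML would first be fetched from the URL, for example with `urllib.request.urlopen`.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> elements, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None   # text chunks of the currently open <p>, else None
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._flush()  # an unclosed <p> ends where the next one starts
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None and not self._skip_depth:
            self._buffer.append(data)

    def _flush(self):
        if self._buffer is not None:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)
        self._buffer = None


def extract_text(html):
    """Return the text of all paragraphs in the HTML, joined by spaces."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a trailing paragraph that was never closed
    return " ".join(parser.paragraphs)
```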
---
<!-- PROJECT STRUCTURE -->
## Project Structure
```text
TEXT-SUMMARIZER/
│
├── docs/                            # Formal Documentation
│   └── SPECIFICATION.md             # Technical Architecture & Specification
│
├── Mega/                            # Archival Attribution Assets
│   ├── Filly.jpg                    # Project-related Content Asset
│   └── Mega.png                     # Author Profile Image (Mega Satish)
│
├── Mini-Project/                    # Research & Academic Assets
│   ├── TEXT SUMMARIZER.pdf          # Technical Project Report (PDF)
│   ├── TEXT SUMMARIZER.pptx         # Project Presentation (PPTX)
│   └── Text Summarizer Using Julia/ # Related Research Materials
│
├── Source Code/                     # Application Implementation
│   ├── static/                      # Frontend Assets (CSS/JS)
│   ├── templates/                   # HTML Jinja2 Templates
│   ├── app.py                       # Main Flask Application
│   ├── nltk_summarization.py        # NLTK Logic Module
│   ├── spacy_summarization.py       # SpaCy Logic Module
│   ├── spacy_summarizer.py          # SpaCy Helper Module
│   ├── Procfile                     # Heroku Deployment Config
│   └── requirements.txt             # Dependency Manifest
│
├── .gitattributes                   # Global Git LFS & Config
├── .gitignore                       # Asset Exclusion Manifest
├── CITATION.cff                     # Scholarly Citation Metadata
├── codemeta.json                    # Software Metadata Manifest
├── LICENSE                          # MIT License Terms
├── README.md                        # Comprehensive Archival Entrance
└── SECURITY.md                      # Vulnerability Exposure Policy
```
---
<!-- RESULTS GALLERY -->
## Results Gallery
### Application Interface
The interface provides a clean, side-by-side comparison of summarization results along with reading time metrics.
<div align="center">

</div>
---
<!-- QUICK START -->
## Quick Start
### 1. Prerequisites
Ensure your environment meets the following requirements:
- **Python**: Version **3.6** or higher.
- **Packages**: Flask, SpaCy, NLTK, Gensim, Sumy.
- **NLP Models**: `en_core_web_sm` (SpaCy); `stopwords` and `punkt` (NLTK).
> [!WARNING]
> **Technical Dependencies & Environment**
>
> This system requires **Python 3.6+** and multiple NLP libraries (SpaCy, NLTK, Gensim, Sumy). For stable execution, it is recommended to run this in an isolated virtual environment and ensure all SpaCy language models are downloaded prior to execution.
### 2. Setup & Installation
1. **Clone the Repository**:
   ```bash
   git clone https://github.com/Amey-Thakur/TEXT-SUMMARIZER.git
   cd TEXT-SUMMARIZER/Source\ Code
   ```
2. **Install Dependencies**:
   ```bash
   pip install -r requirements.txt
   python -m spacy download en_core_web_sm
   ```
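The prerequisites list the NLTK `stopwords` and `punkt` resources, but the commands above only fetch the SpaCy model. Assuming the application does not download NLTK data on its own (not confirmed by this README), a helper along the following lines could be run once in the activated environment. The function name `ensure_nltk_data` is hypothetical; `nltk.data.find` and `nltk.download` are standard NLTK calls.

```python
# Resource name -> lookup path inside the NLTK data directory.
NLTK_RESOURCES = {
    "stopwords": "corpora/stopwords",
    "punkt": "tokenizers/punkt",
}


def ensure_nltk_data(resources=NLTK_RESOURCES):
    """Download each resource only if NLTK cannot already find it."""
    import nltk  # imported lazily so this file loads even before NLTK is installed

    fetched = []
    for name, path in resources.items():
        try:
            nltk.data.find(path)
        except LookupError:
            nltk.download(name)  # needs network access on first run
            fetched.append(name)
    return fetched
```

Calling `ensure_nltk_data()` from a Python shell returns the names it had to download. If a later `LookupError` names a different resource, which can happen across NLTK versions, add that name and its path to `NLTK_RESOURCES`.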
### 3. Launch Application
1. **Run the Flask Server**:
   ```bash
   python app.py
   ```
2. **Access the Interface**:
   - Open your browser and navigate to `http://127.0.0.1:5000/`.
---
<!-- USAGE GUIDELINES -->
## Usage Guidelines
This repository is openly shared to support learning and knowledge exchange across the academic community.
**For Students**
Use this project as a reference for implementing NLP pipelines, understanding Flask web architecture, and integrating multiple machine learning libraries into a single application.
**For Educators**
This project may serve as a practical example or supplementary teaching resource for **Natural Language Processing (`DLO8012`)** and **Computational Lab II (`CSL804`)** as part of the **8th Semester Computer Engineering** curriculum. Attribution is appreciated when utilizing content.
**For Researchers**
The comparative framework allows for the evaluation of different extractive summarization algorithms on custom datasets, providing a baseline for further research into abstractive methods.
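
A comparison of this kind can be prototyped outside the web interface with a small harness. The sketch below is illustrative only: `compare_summarizers` and the two toy summarizers are hypothetical stand-ins, and in practice each entry would wrap a call into one of the libraries above. Any callable that maps a string to a string fits.

```python
import time


def compare_summarizers(text, summarizers):
    """Run each summarizer on the same text and collect simple metrics."""
    original_words = len(text.split())
    results = []
    for name, summarize in summarizers.items():
        start = time.perf_counter()
        summary = summarize(text)
        elapsed = time.perf_counter() - start
        summary_words = len(summary.split())
        results.append({
            "name": name,
            "summary": summary,
            "seconds": elapsed,
            # Fraction of the original word count kept by the summary.
            "compression": summary_words / original_words if original_words else 0.0,
        })
    return results


def lead_sentence(text):
    """Toy baseline: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def whole_text(text):
    """Toy baseline: return the input unchanged."""
    return text


SAMPLE = "One two three. Four five six seven. Eight nine."
report = compare_summarizers(SAMPLE, {"lead-1": lead_sentence, "identity": whole_text})
for row in report:
    print(f"{row['name']:<10} kept {row['compression']:.2f} in {row['seconds']:.6f}s")
```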
---
<!-- LICENSE -->
## License
This repository and all linked academic content are made available under the **MIT License**. See the [LICENSE](LICENSE) file for complete terms.
> [!NOTE]
> **Summary**: You are free to use, modify, and distribute this content for any purpose, including commercially, as long as the copyright notice and license text are kept with all copies or substantial portions.
Copyright © 2022 Amey Thakur, Mega Satish
---
<!-- ABOUT -->
## About This Repository
**Created & Maintained by**: [Amey Thakur](https://github.com/Amey-Thakur) & [Mega Satish](https://github.com/msatmod)
**Academic Journey**: Bachelor of Engineering in Computer Engineering (2018-2022)
**Institution**: [Terna Engineering College](https://ternaengg.ac.in/), Navi Mumbai
**University**: [University of Mumbai](https://mu.ac.in/)
This project features the **Text Summarizer**, a utility developed as an **8th Semester Mini-Project**. It represents a culmination of studies in computational linguistics and software engineering, delivering a functional tool for automated text analysis.
**Connect**: [GitHub](https://github.com/Amey-Thakur) · [LinkedIn](https://www.linkedin.com/in/amey-thakur) · [ORCID](https://orcid.org/0000-0001-5644-1575)
### Acknowledgments
Grateful acknowledgment to [**Mega Satish**](https://github.com/msatmod) for her exceptional collaboration and scholarly partnership during the development of this project. Her intellectual contributions, technical insights, and dedicated commitment to software quality were fundamental in achieving the system's analytical and functional objectives. Learning alongside her was a transformative experience; her thoughtful approach to problem-solving and encouragement turned challenges into meaningful learning moments. This work reflects the growth and insights gained from our side-by-side academic journey. Thank you, Mega, for everything you shared and taught along the way.
Grateful acknowledgment to the faculty members of the **Department of Computer Engineering** at Terna Engineering College for their guidance and instruction in Natural Language Processing. Their expertise in computational linguistics and algorithmic design helped shape the technical foundation of this project.
Special thanks to the mentors and peers whose encouragement, discussions, and support contributed meaningfully to this learning experience.
---
<div align="center">
[↑ Back to Top](#readme-top)
<br>
🔬 **[Natural Language Processing Laboratory](https://github.com/Amey-Thakur/NATURAL-LANGUAGE-PROCESSING-AND-COMPUTATIONAL-LAB-II)** · 📝 **[Text Summarizer](https://github.com/Amey-Thakur/TEXT-SUMMARIZER)**
---
#### Presented as part of the 8th Semester Mini-Project @ Terna Engineering College
---
### 🎓 [Computer Engineering Repository](https://github.com/Amey-Thakur/COMPUTER-ENGINEERING)
**Computer Engineering (B.E.) - University of Mumbai**
*Semester-wise curriculum, laboratories, projects, and academic notes.*
</div>