title: Text Summarizer
emoji: 📝
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
license: mit
Text Summarizer
A robust web application leveraging multiple NLP algorithms (SpaCy, NLTK, Gensim, Sumy) to summarize textual content and URL sources, featuring a comparative analysis interface for evaluating summarization quality.
Authors · Overview · Features · Structure · Results · Quick Start · Usage Guidelines · License · About · Acknowledgments
🤝🏻 Special Acknowledgement
Special thanks to Mega Satish for her meaningful contributions, guidance, and support that helped shape this work.
Overview
This project implements a versatile Text Summarizer capable of condensing large bodies of text or web content into concise summaries. It serves as a comparative platform for various Extractive Summarization techniques, including frequency-based methods (SpaCy, NLTK) and graph-based algorithms (TextRank via Gensim, LexRank via Sumy).
Developed as a mini-project for the 8th Semester curriculum, this system addresses the need for efficient information retrieval by automating the extraction of key sentences from documents. It features a Flask-based web interface that lets users input raw text or URLs and compare the output and performance of the different NLP models side by side.
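To make the frequency-based idea from the overview concrete, here is a minimal sketch in plain Python. It is not the project's SpaCy or NLTK module: the stopword list, the regex-based sentence splitter, and the function names are all assumptions chosen to keep the example self-contained.

```python
import re
from collections import Counter
from heapq import nlargest

# Tiny illustrative stopword list (assumption); SpaCy and NLTK ship far
# larger lists that a real implementation would use instead.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def frequency_summary(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    if not words:
        return ""

    # Normalise word counts so the most frequent word has weight 1.0.
    counts = Counter(words)
    top = max(counts.values())
    weights = {word: count / top for word, count in counts.items()}

    # A sentence's score is the sum of the weights of its words.
    scores = {}
    for index, sentence in enumerate(sentences):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        scores[index] = sum(weights.get(token, 0.0) for token in tokens)

    best = nlargest(max_sentences, scores, key=scores.get)
    return " ".join(sentences[i] for i in sorted(best))
```

Summing weights favours longer sentences; dividing each score by the sentence length is a common variation when that bias is unwanted.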
Research Impact & Certification
This project was published as a research paper in the International Journal for Research in Applied Science and Engineering Technology (IJRASET) (Volume 10, Issue 1) and is also available as a preprint on viXra. The project received an official Publication Certificate for its research contribution to natural language processing.
Resources
| # | Resource | Description |
|---|---|---|
| 1 | Technical Report | Detailed project documentation |
| 2 | Project Presentation | Visual demonstration and slides |
| 3 | Technical Specification | Technical Architecture & Specification |
| 4 | Source Code | Complete source code and documentation |
| 5 | Research Article | IJRASET Published Paper |
| 6 | Scholarly Preprint | Formal research manuscript (viXra) |
| 7 | Project Demo | Real-time demonstration of features |
| 8 | NLP Laboratory | Academic repository for NLP |
Algorithm Selection for Optimal Results
For long-form documents, Gensim's TextRank tends to produce more coherent summaries because it ranks each sentence by its similarity to the rest of the document rather than by word counts alone. For shorter texts or news articles, SpaCy's frequency-based approach runs faster and gives comparable quality.
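The graph-based ranking mentioned above can be sketched roughly as follows. This is a simplified, standard-library illustration of the TextRank idea (word-overlap similarity plus a weighted PageRank iteration), not Gensim's implementation, which differs in its similarity function and preprocessing. The function names and the default damping and iteration values are assumptions.

```python
import math
import re


def tokenize(sentence):
    """Lowercased set of word tokens in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(a, b):
    """Shared-word count normalised by the log of both sentence lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # skip one-word sentences; also avoids a zero denominator
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank over the similarity graph."""
    bags = [tokenize(s) for s in sentences]
    n = len(bags)
    weights = [
        [similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def textrank_summary(sentences, max_sentences=2):
    """Pick the top-ranked sentences and return them in document order."""
    scores = textrank_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]
```

A sentence that shares no vocabulary with the rest of the text receives only the baseline score of `1 - damping`, which is why off-topic sentences fall to the bottom of the ranking.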
Features
| Feature | Description |
|---|---|
| Multi-Algorithm Support | Unified interface for SpaCy, NLTK, Gensim, and Sumy summarization engines. |
| Comparative Analysis | Side-by-side visualization of summaries with reading time reduction metrics. |
| Web Scraping | Integrated BeautifulSoup module to extract and process text directly from web links. |
| Material Design UI | Responsive frontend built with Materialize CSS for a clean, modern research aesthetic. |
| Performance Metrics | Real-time calculation of original vs. summarized reading times and execution speed. |
| Scholarly Codebase | Fully documented source code with strict academic formatting and inline citations. |
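As a rough illustration of the reading time metrics listed in the table above, the sketch below estimates reading time from a word count. The 200 words-per-minute figure and the function names are assumptions for the example; the application may use a different speed or tokenizer.

```python
import re

WORDS_PER_MINUTE = 200  # assumed average reading speed, not taken from the project


def reading_time_minutes(text, wpm=WORDS_PER_MINUTE):
    """Estimate reading time from the number of whitespace-separated words."""
    words = len(re.findall(r"\S+", text))
    return words / wpm


def reading_time_reduction(original, summary, wpm=WORDS_PER_MINUTE):
    """Compare reading times of the original text and its summary."""
    before = reading_time_minutes(original, wpm)
    after = reading_time_minutes(summary, wpm)
    saved_pct = 0.0 if before == 0 else (before - after) / before * 100
    return {"original_min": before, "summary_min": after, "saved_pct": saved_pct}
```

For a 400-word document condensed to 100 words, this reports 2.0 minutes against 0.5 minutes, a 75% reduction.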
Tech Stack
- Backend: Python 3.x, Flask
- NLP Libraries: SpaCy, NLTK, Gensim, Sumy
- Frontend: HTML5, Materialize CSS, jQuery
- Utilities: BeautifulSoup4, lxml
Project Structure
TEXT-SUMMARIZER/
│
├── docs/ # Formal Documentation
│ └── SPECIFICATION.md # Technical Architecture & Specification
│
├── Mega/ # Archival Attribution Assets
│ ├── Filly.jpg # Project-related Content Asset
│ └── Mega.png # Author Profile Image (Mega Satish)
│
├── Mini-Project/ # Research & Academic Assets
│ ├── TEXT SUMMARIZER.pdf # Technical Project Report (PDF)
│ ├── TEXT SUMMARIZER.pptx # Project Presentation (PPTX)
│ └── Text Summarizer Using Julia/ # Related Research Materials
│
├── Source Code/ # Application Implementation
│ ├── static/ # Frontend Assets (CSS/JS)
│ ├── templates/ # HTML Jinja2 Templates
│ ├── app.py # Main Flask Application
│ ├── nltk_summarization.py # NLTK Logic Module
│ ├── spacy_summarization.py # SpaCy Logic Module
│ ├── spacy_summarizer.py # SpaCy Helper Module
│ ├── Procfile # Heroku Deployment Config
│ └── requirements.txt # Dependency Manifest
│
├── .gitattributes # Global Git LFS & Config
├── .gitignore # Asset Exclusion Manifest
├── CITATION.cff # Scholarly Citation Metadata
├── codemeta.json # Software Metadata Manifest
├── LICENSE # MIT License Terms
├── README.md # Comprehensive Archival Entrance
└── SECURITY.md # Vulnerability Exposure Policy
Results Gallery
Application Interface
The interface provides a clean, side-by-side comparison of summarization results along with reading time metrics.
Quick Start
1. Prerequisites
Ensure your environment meets the following requirements:
- Python: Version 3.6 or higher.
- Packages: Flask, SpaCy, NLTK, Gensim, Sumy.
- NLP Models: en_core_web_sm (SpaCy), stopwords / punkt (NLTK).
Technical Dependencies & Environment
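This system requires Python 3.6+ and multiple NLP libraries (SpaCy, NLTK, Gensim, Sumy). For stable execution, run it in an isolated virtual environment and make sure all SpaCy language models are downloaded before starting the application. Note that Gensim removed its summarization module in version 4.0, so the TextRank path needs a Gensim 3.x release unless the code has been ported; check requirements.txt for the pinned version.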
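Before launching, a quick check like the following can confirm that the environment has the packages listed above. It is a small helper sketch, not part of the repository; the module list mirrors the prerequisites and relies on the fact that the SpaCy model en_core_web_sm installs as a regular Python package.

```python
import importlib.util

# Import names for the prerequisites listed above (the helper itself is hypothetical).
REQUIRED_MODULES = ("flask", "spacy", "nltk", "gensim", "sumy", "en_core_web_sm")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks for installed packages; it does not verify versions or the NLTK data files (stopwords, punkt), which are downloaded separately.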
2. Setup & Installation
- Clone the Repository:
  git clone https://github.com/Amey-Thakur/TEXT-SUMMARIZER.git
  cd TEXT-SUMMARIZER/Source\ Code
- Install Dependencies:
  pip install -r requirements.txt
  python -m spacy download en_core_web_sm
3. Launch Application
- Run the Flask Server:
  python app.py
- Access the Interface:
  - Open your browser and navigate to http://127.0.0.1:5000/.
Usage Guidelines
This repository is openly shared to support learning and knowledge exchange across the academic community.
For Students
Use this project as a reference for implementing NLP pipelines, understanding Flask web architecture, and integrating multiple machine learning libraries into a single application.
For Educators
This project may serve as a practical example or supplementary teaching resource for Natural Language Processing (DLO8012) and Computational Lab II (CSL804) as part of the 8th Semester Computer Engineering curriculum. Attribution is appreciated when utilizing content.
For Researchers
The comparative framework allows for the evaluation of different extractive summarization algorithms on custom datasets, providing a baseline for further research into abstractive methods.
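For evaluation on custom datasets, a comparison loop along the following lines may be a useful starting point. It is a hedged sketch rather than the project's code: `compare_summarizers` and the two stand-in summarizers are hypothetical, and in practice the dictionary would map names to wrappers around the SpaCy, NLTK, Gensim, and Sumy summarizers.

```python
import time


def compare_summarizers(text, summarizers):
    """Run each summarizer on the same text and collect simple metrics."""
    original_words = len(text.split())
    report = {}
    for name, summarize in summarizers.items():
        start = time.perf_counter()
        summary = summarize(text)
        elapsed = time.perf_counter() - start
        summary_words = len(summary.split())
        report[name] = {
            "summary": summary,
            "seconds": elapsed,
            # Fraction of the original length that the summary keeps.
            "compression": summary_words / original_words if original_words else 0.0,
        }
    return report


# Hypothetical stand-ins for the real summarizer wrappers.
def first_sentence(text):
    return text.split(". ")[0].rstrip(".") + "."


def first_ten_words(text):
    return " ".join(text.split()[:10])


if __name__ == "__main__":
    sample = "One two three four. Five six seven eight. Nine ten eleven twelve."
    results = compare_summarizers(
        sample, {"first_sentence": first_sentence, "first_ten_words": first_ten_words}
    )
    for name, metrics in results.items():
        print(name, round(metrics["compression"], 2), metrics["summary"])
```

Timing and compression say nothing about summary quality on their own; a reference-based measure would need to be added for that.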
License
This repository and all linked academic content are made available under the MIT License. See the LICENSE file for complete terms.
Summary: You are free to use, modify, and distribute this content for any purpose, including commercially, as long as the original copyright notice and license text are retained.
Copyright © 2022 Amey Thakur, Mega Satish
About This Repository
Created & Maintained by: Amey Thakur & Mega Satish
Academic Journey: Bachelor of Engineering in Computer Engineering (2018-2022)
Institution: Terna Engineering College, Navi Mumbai
University: University of Mumbai
This repository contains the Text Summarizer, a utility developed as an 8th Semester Mini-Project. It is the culmination of studies in computational linguistics and software engineering and delivers a functional tool for automated text analysis.
Connect: GitHub · LinkedIn · ORCID
Acknowledgments
Grateful acknowledgment to Mega Satish for her exceptional collaboration and scholarly partnership during the development of this project. Her intellectual contributions, technical insights, and dedicated commitment to software quality were fundamental in achieving the system's analytical and functional objectives. Learning alongside her was a transformative experience; her thoughtful approach to problem-solving and encouragement turned challenges into meaningful learning moments. This work reflects the growth and insights gained from our side-by-side academic journey. Thank you, Mega, for everything you shared and taught along the way.
Grateful acknowledgment to the faculty members of the Department of Computer Engineering at Terna Engineering College for their guidance and instruction in Natural Language Processing. Their expertise in computational linguistics and algorithmic design helped shape the technical foundation of this project.
Special thanks to the mentors and peers whose encouragement, discussions, and support contributed meaningfully to this learning experience.
🔬 Natural Language Processing Laboratory · 📝 Text Summarizer
Presented as part of the 8th Semester Mini-Project @ Terna Engineering College
🎓 Computer Engineering Repository
Computer Engineering (B.E.) - University of Mumbai
Semester-wise curriculum, laboratories, projects, and academic notes.



