title: Text Summarizer
emoji: 📝
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
license: mit

Text Summarizer

A web application that uses several NLP libraries (SpaCy, NLTK, Gensim, Sumy) to summarize pasted text and web pages, with a side-by-side comparison view for evaluating summarization quality.

Source Code · Technical Specification · Video Demo

Authors · Overview · Features · Structure · Results · Quick Start · Usage Guidelines · License · About · Acknowledgments

Authors

Terna Engineering College | Computer Engineering | Batch of 2022

Amey Thakur	Mega Satish

🤝🏻 Special Acknowledgement

Special thanks to Mega Satish for her meaningful contributions, guidance, and support that helped shape this work.

Overview

This project implements a versatile Text Summarizer capable of condensing large bodies of text or web content into concise summaries. It serves as a comparative platform for various Extractive Summarization techniques, including frequency-based methods (SpaCy, NLTK) and graph-based algorithms (TextRank via Gensim, LexRank via Sumy).
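As an illustration of the frequency-based family mentioned above, here is a minimal sketch in plain Python. It is not the project's SpaCy or NLTK module: the regex sentence splitter, the tiny `STOPWORDS` set, and the function names are assumptions chosen to keep the example dependency-free.

```python
import re
from collections import Counter
from heapq import nlargest

# Tiny illustrative stopword list (assumption); the real modules would use
# the SpaCy or NLTK stopword sets instead.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and",
             "it", "for", "on", "that", "this", "with", "as", "by"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def frequency_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    if not words:
        return ""

    freq = Counter(words)
    top = max(freq.values())
    # Normalise counts so the most frequent word has weight 1.0.
    weights = {w: c / top for w, c in freq.items()}

    # A sentence's score is the sum of the weights of its words.
    scores = {}
    for idx, sentence in enumerate(sentences):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        scores[idx] = sum(weights.get(t, 0.0) for t in tokens)

    best = sorted(nlargest(num_sentences, scores, key=scores.get))
    return " ".join(sentences[i] for i in best)
```

Calling `frequency_summary(text, num_sentences=1)` returns the single sentence whose words are most frequent across the whole input. Because scores are summed rather than averaged, longer sentences are favoured; dividing by sentence length is a common variation.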

Developed as a mini-project for the 8th Semester curriculum, this system addresses the need for faster information retrieval by automating the extraction of key sentences from documents. Its Flask-based web interface lets users enter raw text or a URL and compare the output of the different summarizers side by side.

Research Impact & Certification

This project was published as a research paper in the International Journal for Research in Applied Science and Engineering Technology (IJRASET) (Volume 10, Issue 1) and is also available as a preprint on viXra. The project received an official Publication Certificate for its research contribution to natural language processing.

Preprint @viXra

Published Paper @IJRASET

Publication Certificate

Resources

#	Resource	Description
1	Technical Report	Detailed project documentation
2	Project Presentation	Visual demonstration and slides
3	Technical Specification	Technical Architecture & Specification
4	Source Code	Complete source code and documentation
5	Research Article	IJRASET Published Paper
6	Scholarly Preprint	Formal research manuscript (viXra)
7	Project Demo	Real-time demonstration of features
8	NLP Laboratory	Academic repository for NLP

Algorithm Selection for Optimal Results

For long-form documents, Gensim's TextRank tends to produce more coherent summaries because it ranks each sentence by its similarity to the rest of the document rather than by raw word counts. For shorter texts or news articles, SpaCy's frequency-based approach runs faster and typically yields comparable quality.
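To make the graph-based idea concrete, the sketch below ranks sentences with a simplified TextRank in plain Python. It is not Gensim's implementation: the word-overlap similarity, the PageRank-style power iteration with a fixed iteration count, and all function names are assumptions for illustration.

```python
import math
import re


def _tokens(sentence):
    """Lower-cased set of word tokens in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def _similarity(a, b):
    """Shared words divided by the summed log sizes of the two token sets."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences by running PageRank over a sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    token_sets = [_tokens(s) for s in sentences]
    # Weighted, undirected graph: one node per sentence.
    weights = [[0.0 if i == j else _similarity(token_sets[i], token_sets[j])
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]

    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def textrank_summary(sentences, num_sentences=2):
    """Return the top-ranked sentences in their original order."""
    scores = textrank_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:num_sentences])]
```

A sentence that shares no vocabulary with the rest of the document receives only the baseline score, which is why off-topic sentences tend to drop out of graph-based summaries.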

Features

Feature	Description
Multi-Algorithm Support	Unified interface for SpaCy, NLTK, Gensim, and Sumy summarization engines.
Comparative Analysis	Side-by-side visualization of summaries with reading time reduction metrics.
Web Scraping	Integrated BeautifulSoup module to extract and process text directly from web links.
Material Design UI	Responsive frontend built with Materialize CSS for a clean, modern research aesthetic.
Performance Metrics	Real-time calculation of original vs. summarized reading times and execution speed.
Scholarly Codebase	Fully documented source code with strict academic formatting and inline citations.
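The reading-time metrics listed above can be approximated with a few lines of Python. This is a sketch, not the application's code: the 200 words-per-minute reading speed and the function names are assumptions.

```python
def reading_time_minutes(text, words_per_minute=200):
    """Estimate reading time from a whitespace word count.

    The 200 wpm default is an assumed average reading speed.
    """
    return len(text.split()) / words_per_minute


def reading_time_reduction(original, summary, words_per_minute=200):
    """Compare reading times of the original text and its summary."""
    before = reading_time_minutes(original, words_per_minute)
    after = reading_time_minutes(summary, words_per_minute)
    saved_pct = 0.0 if before == 0 else (before - after) / before * 100
    return {"original_min": before, "summary_min": after, "saved_pct": saved_pct}
```

For a 400-word article condensed to 100 words, this reports 2.0 minutes before, 0.5 minutes after, and a 75% reduction.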

Tech Stack

Backend: Python 3.x, Flask
NLP Libraries: SpaCy, NLTK, Gensim, Sumy
Frontend: HTML5, Materialize CSS, jQuery
Utilities: BeautifulSoup4, lxml
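The project extracts page text with BeautifulSoup4 and lxml. To show the underlying idea without extra dependencies, the sketch below swaps in the standard library's `html.parser` and collects only the text inside `<p>` elements. The class and function names are hypothetical and do not come from the project's source.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements, ignoring everything else."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_paragraph = False
        self._buffer = []

    def _flush(self):
        # Join buffered chunks and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # tolerate a previous <p> that was never closed
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def extract_paragraph_text(html):
    """Return the paragraph text of an HTML document as one string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # pick up a trailing unclosed <p>
    return " ".join(parser.paragraphs)
```

For a live page, the HTML could first be fetched with `urllib.request.urlopen(url).read().decode("utf-8", errors="replace")` and then passed to `extract_paragraph_text`; the result is the raw text handed to a summarizer.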

Project Structure

TEXT-SUMMARIZER/
│
├── docs/                                          # Formal Documentation
│   └── SPECIFICATION.md                           # Technical Architecture & Specification
│
├── Mega/                                          # Archival Attribution Assets
│   ├── Filly.jpg                                  # Project-related Content Asset
│   └── Mega.png                                   # Author Profile Image (Mega Satish)
│
├── Mini-Project/                                  # Research & Academic Assets
│   ├── TEXT SUMMARIZER.pdf                        # Technical Project Report (PDF)
│   ├── TEXT SUMMARIZER.pptx                       # Project Presentation (PPTX)
│   └── Text Summarizer Using Julia/               # Related Research Materials
│
├── Source Code/                                   # Application Implementation
│   ├── static/                                    # Frontend Assets (CSS/JS)
│   ├── templates/                                 # HTML Jinja2 Templates
│   ├── app.py                                     # Main Flask Application
│   ├── nltk_summarization.py                      # NLTK Logic Module
│   ├── spacy_summarization.py                     # SpaCy Logic Module
│   ├── spacy_summarizer.py                        # SpaCy Helper Module
│   ├── Procfile                                   # Heroku Deployment Config
│   └── requirements.txt                           # Dependency Manifest
│
├── .gitattributes                                 # Global Git LFS & Config
├── .gitignore                                     # Asset Exclusion Manifest
├── CITATION.cff                                   # Scholarly Citation Metadata
├── codemeta.json                                  # Software Metadata Manifest
├── LICENSE                                        # MIT License Terms
├── README.md                                      # Project Overview (this file)
└── SECURITY.md                                    # Vulnerability Disclosure Policy

Results Gallery

Application Interface

The interface provides a clean, side-by-side comparison of summarization results along with reading time metrics.

Quick Start

1. Prerequisites

Ensure your environment meets the following requirements:

Python: Version 3.6 or higher.
Packages: Flask, SpaCy, NLTK, Gensim, Sumy.
NLP Models: en_core_web_sm (SpaCy), stopwords/punkt (NLTK).

Technical Dependencies & Environment

Run the application in an isolated virtual environment, and download the SpaCy language model and the NLTK data before the first launch. Note that Gensim removed its `summarization` module in version 4.0, so the TextRank path presumably needs a Gensim 3.x release; check the version pinned in requirements.txt.

2. Setup & Installation

Clone the Repository:

```
git clone https://github.com/Amey-Thakur/TEXT-SUMMARIZER.git
cd TEXT-SUMMARIZER/Source\ Code
```

Install Dependencies:

```
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python -m nltk.downloader stopwords punkt
```

3. Launch Application

Run the Flask Server:
```
python app.py
```
Access the Interface:
- Open your browser and navigate to http://127.0.0.1:5000/.

Usage Guidelines

This repository is openly shared to support learning and knowledge exchange across the academic community.

For Students
Use this project as a reference for implementing NLP pipelines, understanding Flask web architecture, and integrating multiple machine learning libraries into a single application.

For Educators
This project may serve as a practical example or supplementary teaching resource for Natural Language Processing (DLO8012) and Computational Lab II (CSL804) as part of the 8th Semester Computer Engineering curriculum. Attribution is appreciated when utilizing content.

For Researchers
The comparative framework allows for the evaluation of different extractive summarization algorithms on custom datasets, providing a baseline for further research into abstractive methods.
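A comparison of this kind can be sketched as a small harness that runs several summarizers on the same text and records timing and compression. The harness and the two baselines below are hypothetical stand-ins, not the project's code; in practice the callables would wrap the SpaCy, NLTK, Gensim, and Sumy summarizers.

```python
import time


def compare_summarizers(text, summarizers):
    """Run each summarizer on `text` and collect simple metrics.

    `summarizers` maps a display name to a callable taking text and
    returning a summary string.
    """
    original_words = len(text.split())
    results = {}
    for name, summarize in summarizers.items():
        start = time.perf_counter()
        summary = summarize(text)
        elapsed = time.perf_counter() - start
        summary_words = len(summary.split())
        results[name] = {
            "summary": summary,
            "seconds": elapsed,
            # Fraction of the original word count kept in the summary.
            "compression": summary_words / original_words if original_words else 0.0,
        }
    return results


def lead_baseline(text):
    """Hypothetical baseline: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def identity_baseline(text):
    """Hypothetical baseline: return the text unchanged."""
    return text
```

Passing `{"lead": lead_baseline, "identity": identity_baseline}` gives a quick sanity check before plugging in real summarizers. Reference-based metrics such as ROUGE could be added per method when gold summaries are available for the dataset.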

License

This repository and all linked academic content are made available under the MIT License. See the LICENSE file for complete terms.

Summary: You are free to use, copy, modify, and distribute this content for any purpose, including commercially, provided the original copyright and license notice is retained in all copies.

About This Repository

Created & Maintained by: Amey Thakur & Mega Satish
Academic Journey: Bachelor of Engineering in Computer Engineering (2018-2022)
Institution: Terna Engineering College, Navi Mumbai
University: University of Mumbai

This repository hosts the Text Summarizer, a utility developed as an 8th Semester Mini-Project. It is the culmination of coursework in computational linguistics and software engineering and delivers a working tool for automated text summarization.

Connect: GitHub · LinkedIn · ORCID

Acknowledgments

Grateful acknowledgment to Mega Satish for her exceptional collaboration and scholarly partnership during the development of this project. Her intellectual contributions, technical insights, and dedicated commitment to software quality were fundamental in achieving the system's analytical and functional objectives. Learning alongside her was a transformative experience; her thoughtful approach to problem-solving and encouragement turned challenges into meaningful learning moments. This work reflects the growth and insights gained from our side-by-side academic journey. Thank you, Mega, for everything you shared and taught along the way.

Grateful acknowledgment to the faculty members of the Department of Computer Engineering at Terna Engineering College for their guidance and instruction in Natural Language Processing. Their expertise in computational linguistics and algorithmic design helped shape the technical foundation of this project.

Special thanks to the mentors and peers whose encouragement, discussions, and support contributed meaningfully to this learning experience.

🔬 Natural Language Processing Laboratory · 📝 Text Summarizer

Presented as part of the 8th Semester Mini-Project @ Terna Engineering College

🎓 Computer Engineering Repository

Computer Engineering (B.E.) - University of Mumbai

Semester-wise curriculum, laboratories, projects, and academic notes.