Upload summarymaker files including src, examples, etc.
- assets/flask_gui.png +0 -0
- assets/gradio_gui.png +0 -0
- assets/gradio_gui_2.png +0 -0
- examples/test.txt +13 -0
- examples/test_article.md +21 -0
- examples/test_article.txt +11 -0
- src/summarizer/__init__.py +1 -0
- src/summarizer/cli.py +42 -0
- src/summarizer/summarizer.py +23 -0
- src/summarizer/tests.ipynb +0 -0
- src/summarizer/utils.py +82 -0
- src/summarizer/webapp/app.py +80 -0
- src/summarizer/webapp/app.py.bak +62 -0
- src/summarizer/webapp/app.py.bak2 +85 -0
- src/summarizer/webapp/gradio_app.py +74 -0
- src/summarizer/webapp/templates/index.html +90 -0
- src/summarizer/webapp/templates/index.html.bak +87 -0
- src/summarizer/webapp/templates/index.html.bak2 +85 -0
- tests/__init__.py +1 -0
- tests/conftest.py +19 -0
- tests/test_cli.py +67 -0
- tests/test_example.py +4 -0
- tests/test_summarizer.py +66 -0
- tests/test_utils.py +55 -0
assets/flask_gui.png
ADDED
assets/gradio_gui.png
ADDED
assets/gradio_gui_2.png
ADDED
examples/test.txt
ADDED
@@ -0,0 +1,13 @@
+This is a test article. It contains multiple sentences that we want to summarize. The text should be long enough to generate a meaningful summary.
+
+Text summarization is an important task in natural language processing (NLP). It involves the creation of a shortened version of a text document while preserving its essential information and overall meaning. There are two main types of text summarization: extractive and abstractive.
+
+Extractive summarization involves selecting key sentences or phrases directly from the original text and combining them to form a summary. This method relies on identifying the most important parts of the text and is relatively straightforward to implement. However, the resulting summary may not always be coherent or flow naturally, as it is simply a collection of extracted sentences.
+
+On the other hand, abstractive summarization generates new sentences that convey the main ideas of the original text. This method requires a deeper understanding of the text and the ability to generate natural language that captures the essence of the content. Abstractive summarization is more challenging but can produce more cohesive and readable summaries.
+
+In recent years, advancements in machine learning and deep learning have significantly improved the performance of text summarization models. Transformer-based models, such as BERT, GPT-3, and T5, have demonstrated remarkable capabilities in generating high-quality summaries. These models are trained on large datasets and leverage attention mechanisms to understand the context and relationships between words in a text.
+
+Despite these advancements, text summarization remains a complex task, with challenges such as handling long documents, maintaining factual accuracy, and avoiding redundancy. Researchers continue to explore new techniques and approaches to address these challenges and enhance the effectiveness of summarization systems.
+
+Overall, text summarization has a wide range of applications, including news aggregation, content curation, document summarization, and more. As technology continues to evolve, we can expect further improvements in the quality and efficiency of summarization methods, making it easier to distill valuable information from vast amounts of text.
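The example text in examples/test.txt describes extractive summarization as selecting key sentences directly from the source. A minimal sketch of that idea follows. It is not part of this commit, which relies on a transformers pipeline instead; the function name `extractive_summary` and the word-frequency scoring are illustrative assumptions.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Word frequencies over the whole text, ignoring very short words.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        # A sentence scores the summed frequency of its words.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens)

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Summed frequencies favor long sentences, so a real extractive system would likely normalize by sentence length and filter stop words. The sketch only shows the select-and-concatenate structure the text describes.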
examples/test_article.md
ADDED
@@ -0,0 +1,21 @@
+The Rise of Artificial Intelligence in Healthcare
+
+Artificial intelligence has emerged as a transformative force in modern healthcare, revolutionizing everything from diagnostic procedures to patient care management. In recent years, healthcare providers and institutions worldwide have increasingly adopted AI-powered solutions to enhance their services and improve patient outcomes. The integration of AI technologies has not only streamlined administrative tasks but has also enabled more accurate disease detection and personalized treatment plans.
+
+One of the most significant applications of AI in healthcare is in medical imaging analysis. Machine learning algorithms can now process X-rays, MRIs, and CT scans with remarkable accuracy, often detecting subtle abnormalities that human radiologists might miss. These AI systems have been particularly successful in identifying early signs of cancer, cardiovascular diseases, and neurological disorders. For example, studies have shown that AI-powered mammogram analysis can detect breast cancer with an accuracy rate comparable to, and sometimes exceeding, that of experienced radiologists.
+
+The implementation of AI in predictive healthcare has also shown promising results. By analyzing vast amounts of patient data, AI systems can identify patterns and risk factors that might indicate potential health issues before they become severe. This predictive capability allows healthcare providers to intervene early, potentially preventing serious medical conditions and reducing the overall cost of healthcare. Hospitals using these systems have reported significant improvements in patient outcomes and reductions in readmission rates.
+
+Electronic health records (EHRs) have been another area where AI has made substantial contributions. Natural language processing algorithms can now efficiently parse through thousands of medical records, extracting relevant information and identifying patterns that might be clinically significant. This capability has not only improved the quality of patient care but has also facilitated medical research by making vast amounts of clinical data more accessible and analyzable.
+
+In the pharmaceutical industry, AI has accelerated the drug discovery process dramatically. Machine learning models can analyze molecular structures and predict their potential therapeutic effects, significantly reducing the time and cost associated with developing new medications. This has been particularly evident during global health crises, where AI-powered systems have helped identify potential treatments by analyzing existing drugs for new applications.
+
+Despite these advancements, the integration of AI in healthcare faces several challenges. Privacy concerns regarding patient data, the need for regulatory frameworks, and questions about the reliability of AI systems in critical medical decisions remain important issues to address. Healthcare providers must also invest in training their staff to work effectively alongside AI systems, ensuring that these technologies enhance rather than replace human medical expertise.
+
+The economic implications of AI in healthcare are substantial. While the initial investment in AI technologies can be significant, the long-term benefits often justify the cost. Improved efficiency, reduced medical errors, and better patient outcomes can lead to significant cost savings for healthcare institutions. Studies suggest that AI applications in healthcare could result in annual savings of billions of dollars across the industry.
+
+Looking ahead, the role of AI in healthcare is expected to expand further. Emerging technologies like quantum computing could enhance AI capabilities, enabling even more sophisticated medical applications. Personalized medicine, powered by AI analysis of genetic and environmental factors, could become the standard approach to treatment. Additionally, AI-powered robotic surgery systems continue to evolve, promising greater precision and improved outcomes in surgical procedures.
+
+Human oversight remains crucial in the implementation of AI in healthcare. While these systems can process vast amounts of data and identify patterns more efficiently than humans, medical professionals must ultimately make the final decisions regarding patient care. This partnership between human expertise and artificial intelligence represents the future of healthcare, where technology enhances rather than replaces the critical role of healthcare providers.
+
+As we move forward, continued research and development in AI healthcare applications will likely reveal new possibilities for improving patient care. The key to successful implementation lies in striking the right balance between technological innovation and human medical expertise, ensuring that AI serves as a tool to enhance healthcare delivery while maintaining the essential human element in medical care.
examples/test_article.txt
ADDED
@@ -0,0 +1,11 @@
+The Rise of Artificial Intelligence in Healthcare
+Artificial intelligence has emerged as a transformative force in modern healthcare, revolutionizing everything from diagnostic procedures to patient care management. In recent years, healthcare providers and institutions worldwide have increasingly adopted AI-powered solutions to enhance their services and improve patient outcomes. The integration of AI technologies has not only streamlined administrative tasks but has also enabled more accurate disease detection and personalized treatment plans.
+One of the most significant applications of AI in healthcare is in medical imaging analysis. Machine learning algorithms can now process X-rays, MRIs, and CT scans with remarkable accuracy, often detecting subtle abnormalities that human radiologists might miss. These AI systems have been particularly successful in identifying early signs of cancer, cardiovascular diseases, and neurological disorders. For example, studies have shown that AI-powered mammogram analysis can detect breast cancer with an accuracy rate comparable to, and sometimes exceeding, that of experienced radiologists.
+The implementation of AI in predictive healthcare has also shown promising results. By analyzing vast amounts of patient data, AI systems can identify patterns and risk factors that might indicate potential health issues before they become severe. This predictive capability allows healthcare providers to intervene early, potentially preventing serious medical conditions and reducing the overall cost of healthcare. Hospitals using these systems have reported significant improvements in patient outcomes and reductions in readmission rates.
+Electronic health records (EHRs) have been another area where AI has made substantial contributions. Natural language processing algorithms can now efficiently parse through thousands of medical records, extracting relevant information and identifying patterns that might be clinically significant. This capability has not only improved the quality of patient care but has also facilitated medical research by making vast amounts of clinical data more accessible and analyzable.
+In the pharmaceutical industry, AI has accelerated the drug discovery process dramatically. Machine learning models can analyze molecular structures and predict their potential therapeutic effects, significantly reducing the time and cost associated with developing new medications. This has been particularly evident during global health crises, where AI-powered systems have helped identify potential treatments by analyzing existing drugs for new applications.
+Despite these advancements, the integration of AI in healthcare faces several challenges. Privacy concerns regarding patient data, the need for regulatory frameworks, and questions about the reliability of AI systems in critical medical decisions remain important issues to address. Healthcare providers must also invest in training their staff to work effectively alongside AI systems, ensuring that these technologies enhance rather than replace human medical expertise.
+The economic implications of AI in healthcare are substantial. While the initial investment in AI technologies can be significant, the long-term benefits often justify the cost. Improved efficiency, reduced medical errors, and better patient outcomes can lead to significant cost savings for healthcare institutions. Studies suggest that AI applications in healthcare could result in annual savings of billions of dollars across the industry.
+Looking ahead, the role of AI in healthcare is expected to expand further. Emerging technologies like quantum computing could enhance AI capabilities, enabling even more sophisticated medical applications. Personalized medicine, powered by AI analysis of genetic and environmental factors, could become the standard approach to treatment. Additionally, AI-powered robotic surgery systems continue to evolve, promising greater precision and improved outcomes in surgical procedures.
+Human oversight remains crucial in the implementation of AI in healthcare. While these systems can process vast amounts of data and identify patterns more efficiently than humans, medical professionals must ultimately make the final decisions regarding patient care. This partnership between human expertise and artificial intelligence represents the future of healthcare, where technology enhances rather than replaces the critical role of healthcare providers.
+As we move forward, continued research and development in AI healthcare applications will likely reveal new possibilities for improving patient care. The key to successful implementation lies in striking the right balance between technological innovation and human medical expertise, ensuring that AI serves as a tool to enhance healthcare delivery while maintaining the essential human element in medical care.
src/summarizer/__init__.py
ADDED
@@ -0,0 +1 @@
+"""Text summarization package."""
src/summarizer/cli.py
ADDED
@@ -0,0 +1,42 @@
+import click
+from .summarizer import process_text
+from .utils import extract_from_url, read_file
+import warnings
+
+#warnings.filterwarnings("ignore")
+#warnings.filterwarnings("ignore", module="torch")
+#warnings.filterwarnings("ignore", module="numpy")
+
+@click.command()
+@click.option('--url', help='URL to extract text from')
+@click.option('--file', help='Text file path to summarize', type=click.Path(exists=True))
+@click.option('--model', default='t5-base', help='Transformer model to use')
+@click.option('--max-length', default=180, help='Maximum length of summary')
+def main(url, file, model, max_length):
+    """Summarize text from a URL or file."""
+    try:
+        if url:
+            click.echo(f"Fetching text from URL: {url}")
+            text = extract_from_url(url)
+        elif file:
+            click.echo(f"Reading file: {file}")
+            text = read_file(file)
+        else:
+            raise click.UsageError("Please provide either --url or --file")
+
+        if not text or len(text.strip()) < 50:
+            raise click.UsageError("Not enough text content to summarize")
+
+        click.echo("Starting summarization process...")
+        summary = process_text(text, model=model, max_length=max_length)
+        click.echo("\nSummary:")
+        click.echo("=" * 80)
+        click.echo(summary)
+        click.echo("=" * 80)
+
+    except Exception as e:
+        click.echo(f"Error: {str(e)}", err=True)
+        raise click.Abort()
+
+if __name__ == "__main__":
+    main()
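In cli.py, `--url` silently takes precedence when both `--url` and `--file` are passed. One possible alternative is to have the argument parser reject that combination up front. The sketch below uses the standard library's `argparse` rather than click, purely so it runs without extra dependencies. `build_parser` is a hypothetical helper, not code from this commit.

```python
import argparse


def build_parser():
    """Build a parser mirroring the cli.py options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Summarize text from a URL or file.")
    # Exactly one input source must be given; passing both is an error.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--url", help="URL to extract text from")
    source.add_argument("--file", help="Text file path to summarize")
    parser.add_argument("--model", default="t5-base", help="Transformer model to use")
    parser.add_argument("--max-length", type=int, default=180,
                        help="Maximum length of summary")
    return parser


args = build_parser().parse_args(["--file", "examples/test.txt", "--max-length", "120"])
print(args.file, args.model, args.max_length)
```

Unlike the click version, this sketch does not check that the file exists. That check would still be needed before reading.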
src/summarizer/summarizer.py
ADDED
@@ -0,0 +1,23 @@
+from transformers import pipeline
+import os
+
+os.environ['TF_CPP_MIN_LOG_LEVEL'] = "3"
+
+def process_text(text, model="t5-base", max_length=180):
+    """
+    Process and summarize the input text.
+
+    Args:
+        text (str): Input text to summarize
+        model (str): Name of the transformer model to use
+        max_length (int): Maximum length of the summary
+
+    Returns:
+        str: Summarized text
+    """
+    try:
+        summarizer = pipeline("summarization", model=model)
+        result = summarizer(text, max_length=max_length)
+        return result[0]["summary_text"]
+    except Exception as e:
+        raise Exception(f"Summarization failed: {str(e)}")
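`process_text` passes the whole input to the pipeline in a single call, and the example text in this commit itself lists handling long documents as a challenge. One common workaround is to split the text into chunks, summarize each, and join the results. The sketch below is an assumption-laden illustration: `chunk_words` and `summarize_long` are hypothetical helpers, the 400-word limit is arbitrary, and a word count is only a rough proxy for a model's token limit.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text, summarize, max_words=400):
    """Summarize each chunk with the given callable and join the results.

    `summarize` is any function mapping a string to its summary, for example
    `lambda chunk: process_text(chunk, model="t5-base")`.
    """
    return " ".join(summarize(chunk) for chunk in chunk_words(text, max_words))
```

Passing the summarizer in as a callable keeps the helper independent of transformers, so it can be exercised with a stub.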
src/summarizer/tests.ipynb
ADDED
The diff for this file is too large to render.
src/summarizer/utils.py
ADDED
@@ -0,0 +1,82 @@
+import requests
+from bs4 import BeautifulSoup
+import time
+
+def read_file(file_path):
+    """
+    Read text content from a file.
+
+    Args:
+        file_path (str): Path to the text file
+
+    Returns:
+        str: File content
+    """
+    try:
+        with open(file_path, 'r', encoding='utf-8') as f:
+            content = f.read().strip()
+            if not content:
+                raise Exception("File is empty")
+            return content
+    except UnicodeDecodeError:
+        # Try with different encodings if utf-8 fails
+        try:
+            with open(file_path, 'r', encoding='latin-1') as f:
+                content = f.read().strip()
+                if not content:
+                    raise Exception("File is empty")
+                return content
+        except Exception as e:
+            raise Exception(f"Failed to read file with alternative encoding: {str(e)}")
+    except Exception as e:
+        raise Exception(f"File reading failed: {str(e)}")
+
+def extract_from_url(url):
+    """
+    Extract text content from a URL.
+
+    Args:
+        url (str): URL to extract text from
+
+    Returns:
+        str: Extracted text content
+    """
+    try:
+        headers = {
+            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
+        }
+
+        # Add retry mechanism
+        max_retries = 3
+        for attempt in range(max_retries):
+            try:
+                response = requests.get(url, headers=headers, timeout=10)
+                response.raise_for_status()
+                break
+            except requests.RequestException as e:
+                if attempt == max_retries - 1:
+                    raise
+                time.sleep(1)
+
+        soup = BeautifulSoup(response.text, 'html.parser')
+        # Try to get text from articles first
+        article_text = ""
+        articles = soup.find_all(['article', 'main'])
+        if articles:
+            for article in articles:
+                paragraphs = article.find_all("p")
+                article_text += " ".join(p.text.strip() for p in paragraphs if p.text.strip())
+
+        # If no article text found, fall back to all paragraphs
+        if not article_text:
+            paragraphs = soup.find_all("p")
+            article_text = " ".join(p.text.strip() for p in paragraphs if p.text.strip())
+
+        if not article_text:
+            raise Exception("No text content found on the page")
+
+        return article_text
+    except requests.RequestException as e:
+        raise Exception(f"Failed to fetch URL: {str(e)}")
+    except Exception as e:
+        raise Exception(f"URL extraction failed: {str(e)}")
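The retry loop inside `extract_from_url` (three attempts, one-second pause, re-raise on the last failure) could be factored into a reusable helper. The sketch below is one way to do that under stated assumptions: `retry` is a hypothetical function, not part of this commit, and the `sleep` parameter exists only so the pause can be stubbed out in tests.

```python
import time


def retry(func, attempts=3, delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delay)  # fixed pause between attempts, as in the original loop
```

With such a helper, the fetch could be written as `retry(lambda: requests.get(url, headers=headers, timeout=10), exceptions=(requests.RequestException,))`, with `raise_for_status()` moved inside the callable so HTTP errors are retried as well.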
src/summarizer/webapp/app.py
ADDED
@@ -0,0 +1,80 @@
+from flask import Flask, request, render_template
+from summarizer.summarizer import process_text # Adjust import path
+from summarizer.utils import extract_from_url, read_file # Adjust import path
+import logging
+import os
+
+# Set up logging
+logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
+
+app = Flask(__name__)
+
+# Limit file upload size to 1 MB
+app.config['MAX_CONTENT_LENGTH'] = 1 * 1024 * 1024
+
+@app.route('/')
+def index():
+    # Render the template with an empty summary by default
+    return render_template('index.html', summary="")
+
+@app.route('/summarize', methods=['POST'])
+def summarize():
+    try:
+        choice = request.form.get('choice')
+        url = request.form.get('url')
+        file = request.files.get('file')
+        text = request.form.get('text')
+        model = request.form.get('model') or 't5-base'
+        max_length = request.form.get('max_length')
+
+        # Validate max_length
+        try:
+            max_length = int(max_length) if max_length else 180
+            if max_length <= 0:
+                raise ValueError("Max length must be positive.")
+        except ValueError:
+            return render_template('index.html', error="Invalid maximum length", summary="")
+
+        # Ensure only one input is provided
+        if (choice == 'url' and not url) or (choice == 'file' and not file) or (choice == 'text' and not text):
+            return render_template('index.html', error="Please provide the selected input type.", summary="")
+
+        input_text = ""
+        if choice == 'url':
+            if not url.startswith(('http://', 'https://')):
+                return render_template('index.html', error="Invalid URL format.", summary="")
+            try:
+                input_text = extract_from_url(url)
+            except Exception as e:
+                logging.error(f"URL extraction failed: {str(e)}")
+                return render_template('index.html', error="URL extraction failed.", summary="")
+        elif choice == 'file':
+            if not file.filename.endswith('.txt'):
+                return render_template('index.html', error="Only .txt files are supported.", summary="")
+            try:
+                input_text = file.read().decode('utf-8')
+            except Exception as e:
+                logging.error(f"File reading failed: {str(e)}")
+                return render_template('index.html', error="File reading failed.", summary="")
+        elif choice == 'text':
+            input_text = text
+
+        if not input_text or len(input_text.strip()) < 50:
+            return render_template('index.html', error="Not enough text content to summarize", summary="")
+
+        try:
+            summary = process_text(input_text, model=model, max_length=max_length)
+        except Exception as e:
+            logging.error(f"Summarization failed: {str(e)}")
+            return render_template('index.html', error="Summarization failed.", summary="")
+
+        return render_template('index.html', summary=summary, url=url, model=model, max_length=max_length, text=text)
+
+    except Exception as e:
+        logging.error(f"Unexpected error: {str(e)}")
+        return render_template('index.html', error="An unexpected error occurred.", summary="")
+
+if __name__ == '__main__':
+    # Use a secure production-ready WSGI server for deployment, e.g., Gunicorn
+    #app.run(debug=True)
+    app.run(host="0.0.0.0", port=5000)
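The `max_length` validation in `summarize()` is a self-contained rule: an empty field means 180, anything else must parse as a positive integer. It could be pulled into a small pure function that is easy to unit test without Flask. The sketch below assumes a hypothetical helper named `parse_max_length`. It is not code from this commit.

```python
def parse_max_length(raw, default=180):
    """Return a positive int parsed from a form field.

    A missing or blank field yields the default; anything that is not a
    positive integer raises ValueError, which a view can turn into an
    "Invalid maximum length" message.
    """
    if raw is None or str(raw).strip() == "":
        return default
    value = int(raw)  # raises ValueError for non-numeric input
    if value <= 0:
        raise ValueError("Max length must be positive.")
    return value
```

In the view, the existing `try`/`except ValueError` block would then wrap a single call such as `max_length = parse_max_length(request.form.get('max_length'))`.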
src/summarizer/webapp/app.py.bak
ADDED
@@ -0,0 +1,62 @@
+from flask import Flask, request, render_template
+from summarizer.summarizer import process_text # Adjust import path
+from summarizer.utils import extract_from_url, read_file # Adjust import path
+
+app = Flask(__name__)
+
+@app.route('/')
+def index():
+    return render_template('index.html')
+
+@app.route('/summarize', methods=['POST'])
+def summarize():
+    if request.method == 'POST':
+        choice = request.form.get('choice')
+        url = request.form.get('url')
+        file = request.files.get('file')
+        text = request.form.get('text')
+        model = request.form.get('model') or 't5-base'
+        max_length = request.form.get('max_length')
+
+        # Use default max_length if the field is empty
+        if not max_length:
+            max_length = 180
+        else:
+            # Convert max_length to integer if it's not empty
+            try:
+                max_length = int(max_length)
+            except ValueError:
+                return render_template('index.html', error="Invalid maximum length")
+
+        # Ensure only one input is provided based on the choice
+        if (choice == 'url' and not url) or (choice == 'file' and not file) or (choice == 'text' and not text):
+            return render_template('index.html', error="Please provide the selected input type.")
+
+        input_text = ""
+        if choice == 'url':
+            try:
+                input_text = extract_from_url(url)
+            except Exception as e:
+                return render_template('index.html', error=f"URL extraction failed: {str(e)}")
+        elif choice == 'file':
+            try:
+                input_text = file.read().decode('utf-8')
+            except Exception as e:
+                return render_template('index.html', error=f"File reading failed: {str(e)}")
+        elif choice == 'text':
+            input_text = text
+
+        if not input_text or len(input_text.strip()) < 50:
+            return render_template('index.html', error="Not enough text content to summarize")
+
+        try:
+            summary = process_text(input_text, model=model, max_length=max_length)
+        except Exception as e:
+            return render_template('index.html', error=f"Summarization failed: {str(e)}")
+
+        return render_template('index.html', summary=summary, url=url, model=model, max_length=max_length, text=text)
+
+    return render_template('index.html')
+
+if __name__ == '__main__':
+    app.run(debug=True)
src/summarizer/webapp/app.py.bak2
ADDED
|
@@ -0,0 +1,85 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from flask import Flask, request, render_template
|
| 2 |
+
from summarizer.summarizer import process_text # Adjust import path
|
| 3 |
+
from summarizer.utils import extract_from_url, read_file # Adjust import path
|
| 4 |
+
import logging
|
| 5 |
+
import os
|
| 6 |
+
|
| 7 |
+
# Set up logging
|
| 8 |
+
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
|
| 9 |
+
|
| 10 |
+
app = Flask(__name__)
|
| 11 |
+
|
| 12 |
+
# Limit file upload size to 1 MB
|
| 13 |
+
app.config['MAX_CONTENT_LENGTH'] = 1 * 1024 * 1024
|
| 14 |
+
|
| 15 |
+
@app.route('/')
|
| 16 |
+
def index():
|
| 17 |
+
return render_template('index.html')
|
| 18 |
+
|
| 19 |
+
@app.route('/summarize', methods=['POST'])
|
| 20 |
+
def summarize():
|
| 21 |
+
try:
|
| 22 |
+
choice = request.form.get('choice')
|
| 23 |
+
url = request.form.get('url')
|
| 24 |
+
file = request.files.get('file')
|
| 25 |
+
text = request.form.get('text')
|
| 26 |
+
model = request.form.get('model') or 't5-base'
|
| 27 |
+
max_length = request.form.get('max_length')
|
| 28 |
+
|
| 29 |
+
# Validate max_length
|
| 30 |
+
try:
|
| 31 |
+
max_length = int(max_length) if max_length else 180
|
| 32 |
+
if max_length <= 0:
|
| 33 |
+
raise ValueError("Max length must be positive.")
|
| 34 |
+
except ValueError:
|
| 35 |
+
return render_template('index.html', error="Invalid maximum length")
|
| 36 |
+
|
| 37 |
+
# Ensure only one input is provided
|
| 38 |
+
if (choice == 'url' and not url) or (choice == 'file' and not file) or (choice == 'text' and not text):
|
| 39 |
+
return render_template('index.html', error="Please provide the selected input type.")
|
| 40 |
+
|
| 41 |
+
input_text = ""
|
| 42 |
+
if choice == 'url':
|
| 43 |
+
if not url.startswith(('http://', 'https://')):
|
| 44 |
+
return render_template('index.html', error="Invalid URL format.")
|
| 45 |
+
try:
|
| 46 |
+
input_text = extract_from_url(url)
|
| 47 |
+
except Exception as e:
|
| 48 |
+
logging.error(f"URL extraction failed: {str(e)}")
|
| 49 |
+
return render_template('index.html', error="URL extraction failed.")
|
| 50 |
+
elif choice == 'file':
|
| 51 |
+
if not file.filename.endswith('.txt'):
|
| 52 |
+
return render_template('index.html', error="Only .txt files are supported.")
|
| 53 |
+
try:
|
| 54 |
+
input_text = file.read().decode('utf-8')
|
| 55 |
+
except Exception as e:
|
| 56 |
+
logging.error(f"File reading failed: {str(e)}")
|
| 57 |
+
return render_template('index.html', error="File reading failed.")
|
| 58 |
+
elif choice == 'text':
|
| 59 |
+
input_text = text
|
| 60 |
+
|
| 61 |
+
if not input_text or len(input_text.strip()) < 50:
|
| 62 |
+
return render_template('index.html', error="Not enough text content to summarize")
|
| 63 |
+
|
| 64 |
+
try:
|
| 65 |
+
summary = process_text(input_text, model=model, max_length=max_length)
|
| 66 |
+
except Exception as e:
|
| 67 |
+
logging.error(f"Summarization failed: {str(e)}")
|
| 68 |
+
return render_template('index.html', error="Summarization failed.")
|
| 69 |
+
|
| 70 |
+
return render_template('index.html', summary=summary, url=url, model=model, max_length=max_length, text=text)
|
| 71 |
+
|
| 72 |
+
except Exception as e:
|
| 73 |
+
logging.error(f"Unexpected error: {str(e)}")
|
| 74 |
+
return render_template('index.html', error="An unexpected error occurred.")
|
| 75 |
+
|
| 76 |
+
if __name__ == '__main__':
|
| 77 |
+
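# Development entry point only: debug=True enables the interactive debugger and must not be exposed publicly. For deployment, run the app under a production WSGI server, e.g., Gunicorn.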
|
| 78 |
+
app.run(debug=True)
|
| 79 |
+
|
| 80 |
+
# Updated HTML Template (index.html):
|
| 81 |
+
# 1. Provide a dropdown menu for model selection.
|
| 82 |
+
# 2. Style the page for better UX.
|
| 83 |
+
# 3. Ensure accessibility improvements with ARIA roles and labels.
|
| 84 |
+
|
| 85 |
+
# Note: Additional details for index.html updates can be provided upon request.
|
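The pre-extraction checks in the handler above can be read as one small pure function, which is easier to unit-test than a full Flask request. The sketch below is an illustration rather than the app's actual code: `validate_submission`, its parameters, and `MIN_CHARS` are hypothetical names, while the error strings and the 50-character minimum are copied from the handler. One deliberate difference is that an unknown `choice` is rejected immediately here, whereas the handler falls through to the length check.

```python
MIN_CHARS = 50  # minimum stripped length, taken from the handler above


def validate_submission(choice, url="", filename="", text=""):
    """Return an error message for an invalid submission, or None if it passes.

    Hypothetical helper: it mirrors only the checks made before extraction.
    """
    provided = {"url": url, "file": filename, "text": text}
    # Assumption: an unknown or empty choice is treated as "nothing provided".
    if choice not in provided or not provided[choice]:
        return "Please provide the selected input type."
    if choice == "url" and not url.startswith(("http://", "https://")):
        return "Invalid URL format."
    if choice == "file" and not filename.endswith(".txt"):
        return "Only .txt files are supported."
    if choice == "text" and len(text.strip()) < MIN_CHARS:
        return "Not enough text content to summarize"
    return None
```

A route could call such a helper first and render the template with the returned message whenever it is not `None`.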
src/summarizer/webapp/gradio_app.py
ADDED
|
@@ -0,0 +1,74 @@
import os
import gradio as gr
from summarizer.summarizer import process_text  # Adjust import path
#from summarizer import process_text  # Adjust import path
from summarizer.utils import extract_from_url  # Adjust import path
#from utils import extract_from_url  # Adjust import path

# Set the Gradio temporary directory
os.environ['GRADIO_TEMP_DIR'] = os.path.expanduser('~/.gradio_tmp')

# Create the temporary directory if it does not exist
os.makedirs(os.environ['GRADIO_TEMP_DIR'], exist_ok=True)

def summarize_text(choice, url, file_path, text, model_name, max_length):
    input_text = ""
    if choice == "URL":
        try:
            input_text = extract_from_url(url)
        except Exception as e:
            return f"URL extraction failed: {str(e)}"
    elif choice == "File":
        if file_path is not None:
            try:
                with open(file_path.name, 'r', encoding='utf-8') as f:
                    input_text = f.read()
            except Exception as e:
                return f"File reading failed: {str(e)}"
        else:
            return "File reading failed: No file uploaded"
    elif choice == "Text":
        input_text = text

    if not input_text or len(input_text.strip()) < 50:
        return "Not enough text content to summarize"

    try:
        summary = process_text(input_text, model=model_name, max_length=max_length)
        return summary
    except Exception as e:
        return f"Summarization failed: {str(e)}"

def update_visibility(choice):
    return (
        gr.update(visible=(choice == "URL"), value=""),
        gr.update(visible=(choice == "File"), value=None),
        gr.update(visible=(choice == "Text"), value="")
    )

def main():
    choices = ["Text", "URL", "File"]
    with gr.Blocks() as demo:
        gr.Markdown("# SummaryMaker")  # Add title here
        choice = gr.Dropdown(choices, label="Choose input text type", value="Text")
        url = gr.Textbox(label="URL to Summarize", visible=False)
        file = gr.File(label="Upload File", visible=False)
        text = gr.Textbox(label="Text to Summarize", lines=10, visible=True)  # Visible by default
        model = gr.Textbox(label="Model", value="t5-base")
        max_length = gr.Slider(label="Max Length", minimum=50, maximum=500, value=180, step=10)
        summary = gr.Textbox(label="Summary")

        choice.change(fn=update_visibility, inputs=choice, outputs=[url, file, text])

        gr.Button("Summarize").click(
            summarize_text,
            inputs=[choice, url, file, text, model, max_length],
            outputs=[summary]
        )

    #demo.launch()
    # Ensure the Gradio app binds to '0.0.0.0' to be accessible from outside the container
    demo.launch(server_name="0.0.0.0", server_port=7860)

if __name__ == "__main__":
    main()
|
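The rule behind `update_visibility` is that exactly one of the three input widgets is shown, namely the one matching the dropdown value. A minimal sketch of that rule without Gradio is shown below; `INPUT_KINDS` and `visibility_flags` are hypothetical names introduced only for illustration, and the tuple order (URL, File, Text) follows the `outputs=[url, file, text]` list in the app.

```python
INPUT_KINDS = ("URL", "File", "Text")  # same order as outputs=[url, file, text]


def visibility_flags(choice):
    """Return one boolean per input widget, in (URL, File, Text) order."""
    return tuple(choice == kind for kind in INPUT_KINDS)
```

In the app, each flag would become the `visible=` argument of one `gr.update(...)` call. Keeping the mapping in a plain function lets it be tested without launching the interface.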
src/summarizer/webapp/templates/index.html
ADDED
|
@@ -0,0 +1,90 @@
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>SummaryMaker</title>
    <script>
        function toggleInput() {
            const choice = document.getElementById('choice').value;
            const urlInput = document.getElementById('urlInput');
            const fileInput = document.getElementById('fileInput');
            const textInput = document.getElementById('textInput');

            // Clear the summary box when a new input type is selected
            document.getElementById('summary').value = '';

            if (choice === 'url') {
                urlInput.style.display = 'block';
                fileInput.style.display = 'none';
                textInput.style.display = 'none';
                document.getElementById('file').value = "";
                document.getElementById('text').value = "";
            } else if (choice === 'file') {
                urlInput.style.display = 'none';
                fileInput.style.display = 'block';
                textInput.style.display = 'none';
                document.getElementById('url').value = "";
                document.getElementById('text').value = "";
            } else if (choice === 'text') {
                urlInput.style.display = 'none';
                fileInput.style.display = 'none';
                textInput.style.display = 'block';
                document.getElementById('url').value = "";
                document.getElementById('file').value = "";
            } else {
                urlInput.style.display = 'none';
                fileInput.style.display = 'none';
                textInput.style.display = 'none';
            }
        }


        function clearSummary() {
            document.getElementById('summary').value = '';
        }
    </script>
</head>
<body>
    <h1>SummaryMaker</h1>
    <form action="/summarize" method="post" enctype="multipart/form-data" onsubmit="clearSummary()">
        <label for="choice">Choose input text type:</label><br>
        <select id="choice" name="choice" onchange="toggleInput()">
            <option value="">--Select--</option>
            <option value="url">URL</option>
            <option value="file">File</option>
            <option value="text">Text</option>
        </select><br><br>

        <div id="urlInput" style="display: none;">
            <label for="url">URL to Summarize:</label><br>
            <input type="text" name="url" id="url" value="{{ url }}"><br><br>
        </div>

        <div id="fileInput" style="display: none;">
            <label for="file">Upload File:</label><br>
            <input type="file" name="file" id="file"><br><br>
        </div>

        <div id="textInput" style="display: none;">
            <label for="text">Text to Summarize:</label><br>
            <textarea name="text" id="text" rows="10" cols="50">{{ text }}</textarea><br><br>
        </div>

        <label for="model">Model:</label>
        <input type="text" name="model" id="model" value="{{ model or 't5-base' }}"><br>

        <label for="max_length">Max Length:</label>
        <input type="number" name="max_length" id="max_length" value="{{ max_length or 180 }}"><br><br>

        <input type="submit" value="Summarize">
    </form>
    {% if error %}
        <p style="color: red;">{{ error }}</p>
    {% endif %}
    <div>
        <h2>Summary:</h2>
        <textarea id="summary" rows="10" cols="50" readonly>{{ summary }}</textarea>
    </div>
</body>
</html>
|
src/summarizer/webapp/templates/index.html.bak
ADDED
|
@@ -0,0 +1,87 @@
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>SummaryMaker</title>
    <script>
        function toggleInput() {
            const choice = document.getElementById('choice').value;
            const urlInput = document.getElementById('urlInput');
            const fileInput = document.getElementById('fileInput');
            const textInput = document.getElementById('textInput');
            if (choice === 'url') {
                urlInput.style.display = 'block';
                fileInput.style.display = 'none';
                textInput.style.display = 'none';
                document.getElementById('file').value = "";
                document.getElementById('text').value = "";
            } else if (choice === 'file') {
                urlInput.style.display = 'none';
                fileInput.style.display = 'block';
                textInput.style.display = 'none';
                document.getElementById('url').value = "";
                document.getElementById('text').value = "";
            } else if (choice === 'text') {
                urlInput.style.display = 'none';
                fileInput.style.display = 'none';
                textInput.style.display = 'block';
                document.getElementById('url').value = "";
                document.getElementById('file').value = "";
            } else {
                urlInput.style.display = 'none';
                fileInput.style.display = 'none';
                textInput.style.display = 'none';
            }
        }

        function clearSummary() {
            document.getElementById('summary').value = '';
        }
    </script>
</head>
<body>
    <h1>SummaryMaker</h1>
    <form action="/summarize" method="post" enctype="multipart/form-data" onsubmit="clearSummary()">
        <label for="choice">Choose input text type:</label><br>
        <select id="choice" name="choice" onchange="toggleInput()">
            <option value="">--Select--</option>
            <option value="url">URL</option>
            <option value="file">File</option>
            <option value="text">Text</option>
        </select><br><br>

        <div id="urlInput" style="display: none;">
            <label for="url">URL to Summarize:</label><br>
            <input type="text" name="url" id="url" value="{{ url }}"><br><br>
        </div>

        <div id="fileInput" style="display: none;">
            <label for="file">Upload File:</label><br>
            <input type="file" name="file" id="file"><br><br>
        </div>

        <div id="textInput" style="display: none;">
            <label for="text">Text to Summarize:</label><br>
            <textarea name="text" id="text" rows="10" cols="50">{{ text }}</textarea><br><br>
        </div>

        <label for="model">Model:</label>
        <input type="text" name="model" id="model" value="{{ model or 't5-base' }}"><br>

        <label for="max_length">Max Length:</label>
        <input type="number" name="max_length" id="max_length" value="{{ max_length or 180 }}"><br><br>

        <input type="submit" value="Summarize">
    </form>
    {% if error %}
        <p style="color: red;">{{ error }}</p>
    {% endif %}
    <div>
        {% if summary %}
            <h2>Summary:</h2>
            <textarea id="summary" rows="10" cols="50" readonly>{{ summary }}</textarea>
        {% endif %}
    </div>
</body>
</html>
|
src/summarizer/webapp/templates/index.html.bak2
ADDED
|
@@ -0,0 +1,85 @@
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>SummaryMaker</title>
    <script>
        function toggleInput() {
            const choice = document.getElementById('choice').value;
            const urlInput = document.getElementById('urlInput');
            const fileInput = document.getElementById('fileInput');
            const textInput = document.getElementById('textInput');
            if (choice === 'url') {
                urlInput.style.display = 'block';
                fileInput.style.display = 'none';
                textInput.style.display = 'none';
                document.getElementById('file').value = "";
                document.getElementById('text').value = "";
            } else if (choice === 'file') {
                urlInput.style.display = 'none';
                fileInput.style.display = 'block';
                textInput.style.display = 'none';
                document.getElementById('url').value = "";
                document.getElementById('text').value = "";
            } else if (choice === 'text') {
                urlInput.style.display = 'none';
                fileInput.style.display = 'none';
                textInput.style.display = 'block';
                document.getElementById('url').value = "";
                document.getElementById('file').value = "";
            } else {
                urlInput.style.display = 'none';
                fileInput.style.display = 'none';
                textInput.style.display = 'none';
            }
        }

        function clearSummary() {
            document.getElementById('summary').value = '';
        }
    </script>
</head>
<body>
    <h1>SummaryMaker</h1>
    <form action="/summarize" method="post" enctype="multipart/form-data" onsubmit="clearSummary()">
        <label for="choice">Choose input text type:</label><br>
        <select id="choice" name="choice" onchange="toggleInput()">
            <option value="">--Select--</option>
            <option value="url">URL</option>
            <option value="file">File</option>
            <option value="text">Text</option>
        </select><br><br>

        <div id="urlInput" style="display: none;">
            <label for="url">URL to Summarize:</label><br>
            <input type="text" name="url" id="url" value="{{ url }}"><br><br>
        </div>

        <div id="fileInput" style="display: none;">
            <label for="file">Upload File:</label><br>
            <input type="file" name="file" id="file"><br><br>
        </div>

        <div id="textInput" style="display: none;">
            <label for="text">Text to Summarize:</label><br>
            <textarea name="text" id="text" rows="10" cols="50">{{ text }}</textarea><br><br>
        </div>

        <label for="model">Model:</label>
        <input type="text" name="model" id="model" value="{{ model or 't5-base' }}"><br>

        <label for="max_length">Max Length:</label>
        <input type="number" name="max_length" id="max_length" value="{{ max_length or 180 }}"><br><br>

        <input type="submit" value="Summarize">
    </form>
    {% if error %}
        <p style="color: red;">{{ error }}</p>
    {% endif %}
    <div>
        <h2>Summary:</h2>
        <textarea id="summary" rows="10" cols="50" readonly>{{ summary }}</textarea>
    </div>
</body>
</html>
|
tests/__init__.py
ADDED
|
@@ -0,0 +1 @@
# Empty file
|
tests/conftest.py
ADDED
|
@@ -0,0 +1,19 @@
import pytest
import tempfile
import os

@pytest.fixture
def sample_text():
    return """
    Artificial intelligence has emerged as a transformative force in modern healthcare,
    revolutionizing everything from diagnostic procedures to patient care management.
    In recent years, healthcare providers and institutions worldwide have increasingly
    adopted AI-powered solutions to enhance their services and improve patient outcomes.
    """

@pytest.fixture
def sample_text_file(sample_text):
    with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') as f:
        f.write(sample_text)
    yield f.name
    os.unlink(f.name)
|
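The `sample_text_file` fixture follows a write, close, hand out the path, then delete pattern. The same pattern can be sketched outside pytest as a context manager; `temp_text_file` is a hypothetical name used only for illustration, and the `try`/`finally` is an addition that is not in the fixture above, so that the file is removed even if the caller's block raises.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_text_file(content, suffix=".txt"):
    """Write content to a temporary file, yield its path, then delete it."""
    # delete=False keeps the file on disk after the handle is closed,
    # so the caller can reopen it by name.
    with tempfile.NamedTemporaryFile(
        mode="w", delete=False, suffix=suffix, encoding="utf-8"
    ) as f:
        f.write(content)
    try:
        yield f.name  # the file is closed and flushed at this point
    finally:
        os.unlink(f.name)
```

Yielding only after the `with` block has closed the handle matters: it ensures the content has been flushed before a reader opens the path.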
tests/test_cli.py
ADDED
|
@@ -0,0 +1,67 @@
from click.testing import CliRunner
#from summarizer.cli import main
from summarizer.cli import main

def test_cli_with_file(sample_text_file, sample_text, mocker):
    # If using: from .summarizer import process_text in cli.py
    mock_process = mocker.patch('summarizer.cli.process_text')
    mock_process.return_value = "Summarized text"

    runner = CliRunner()
    result = runner.invoke(main, ['--file', sample_text_file])

    #print("CLI Output:\n", result.output)  # Print the output for debugging
    #print("sample text:\n", sample_text)

    assert result.exit_code == 0
    assert "Summarized text" in result.output
    mock_process.assert_called_once_with(sample_text.strip(), model="t5-base", max_length=180)

def test_cli_with_url(mocker):
    #mock_extract = mocker.patch('summarizer.utils.extract_from_url')
    #mock_process = mocker.patch('summarizer.summarizer.process_text')
    mock_extract = mocker.patch('summarizer.cli.extract_from_url')
    mock_process = mocker.patch('summarizer.cli.process_text')

    mock_extract.return_value = """
    This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission. More information...
    """
    mock_process.return_value = "Summarized text"

    runner = CliRunner()
    result = runner.invoke(main, ['--url', 'http://example.com'])
    #result = runner.invoke(main, ['--url', 'https://en.wikipedia.org/wiki/Seoul'])

    #print("CLI Output:\n", result.output)  # Print the output for debugging
    #result.output = """
    #Fetching text from URL: http://example.com
    #Starting summarization process...
    #
    #Summary:
    #================================================================================
    #Summarized text
    #================================================================================
    #"""

    assert result.exit_code == 0
    assert "Summarized text" in result.output

    mock_extract.assert_called_once_with('http://example.com')
    #mock_extract.assert_called_once_with('https://en.wikipedia.org/wiki/Seoul')
    #mock_process.assert_called_once_with("Extracted text", model='t5-base', max_length=180)
    mock_process.assert_called_once_with(mock_extract.return_value, model='t5-base', max_length=180)

def test_cli_no_input():
    runner = CliRunner()
    result = runner.invoke(main, [])

    assert result.exit_code != 0
    assert "Please provide either --url or --file" in result.output

def test_cli_invalid_file():
    runner = CliRunner()
    result = runner.invoke(main, ['--file', 'nonexistent.txt'])

    assert result.exit_code != 0
    assert "Error" in result.output
|
tests/test_example.py
ADDED
|
@@ -0,0 +1,4 @@
def test_example(sample_text_file):
    with open(sample_text_file, 'r') as f:
        content = f.read()
    assert "Artificial intelligence" in content
|
tests/test_summarizer.py
ADDED
|
@@ -0,0 +1,66 @@
import pytest
#from summarizer.summarizer import process_text
from summarizer.summarizer import process_text


def test_process_text_success(mocker, sample_text):
    """
    When you create a pipeline, it's a two-step process:

    # Step 1: Create the pipeline
    summarizer = pipeline("summarization", model="t5-base")
    # Step 2: Use the pipeline
    summary = summarizer(text)

    # This works because it matches Step 1 - creating the pipeline
    mock_pipeline.assert_called_once_with("summarization", model="t5-base")

    # This doesn't work because it's trying to assert Step 2
    mock_pipeline.assert_called_once_with(sample_text, model="t5-base")

    """
    # If using: from transformers import pipeline in summarizer.py
    # This works because it matches Step 1 - creating the pipeline
    mock_pipeline = mocker.patch('summarizer.summarizer.pipeline')

    # If using: import transformers
    #mock_pipeline = mocker.patch('summarizer.summarizer.transformers.pipeline')


    mock_summarizer = mock_pipeline.return_value
    mock_summarizer.return_value = [{'summary_text': 'Test summary'}]
    #mock_pipeline.return_value.return_value = [{'summary_text': 'Test summary'}]

    result = process_text(sample_text.strip())

    #print("result: ", result)  # for debugging purposes
    assert result == 'Test summary'
    mock_pipeline.assert_called_once_with("summarization", model="t5-base")
    mock_summarizer.assert_called_once_with(sample_text.strip(), max_length=180)
    #mock_pipeline.assert_called_once_with(sample_text, model="t5-base")

def test_process_text_with_custom_model(mocker, sample_text):
    mock_pipeline = mocker.patch('summarizer.summarizer.pipeline')
    mock_summarizer = mock_pipeline.return_value
    mock_summarizer.return_value = [{'summary_text': 'Test summary'}]

    custom_model = "t5-small"
    result = process_text(sample_text.strip(), model=custom_model)

    print(result)  # print the result for debugging purposes

    assert result == 'Test summary'
    # Step 1 check: the pipeline must be created with the custom model
    mock_pipeline.assert_called_once_with("summarization", model=custom_model)
    mock_summarizer.assert_called_once_with(sample_text.strip(), max_length=180)

def test_process_text_failure(mocker, sample_text):
    mock_pipeline = mocker.patch('summarizer.summarizer.pipeline')
    mock_summarizer = mock_pipeline.return_value
    mock_summarizer.return_value = [{'summary_text': 'Test summary'}]
    mock_pipeline.side_effect = Exception("Model error")

    with pytest.raises(Exception) as exc_info:
        process_text(sample_text.strip())

    print("Exception String: ", str(exc_info.value))  # for debugging purposes
    assert "Summarization failed" in str(exc_info.value)
|
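The two-step idea described in the `test_process_text_success` docstring can be tried in isolation with only the standard library. The sketch below is a stand-in, not the project's `process_text`: `summarize` and its `pipeline_factory` parameter are hypothetical, chosen so that the mock can be passed in directly instead of patched. It shows that the factory mock records Step 1, while its `return_value` (the object the factory hands back) records Step 2.

```python
from unittest.mock import MagicMock


def summarize(text, pipeline_factory, model="t5-base", max_length=180):
    """Hypothetical stand-in for a function that builds and then calls a pipeline."""
    summarizer = pipeline_factory("summarization", model=model)  # Step 1: create
    return summarizer(text, max_length=max_length)[0]["summary_text"]  # Step 2: use


mock_pipeline = MagicMock()
mock_summarizer = mock_pipeline.return_value  # what Step 1 returns
mock_summarizer.return_value = [{"summary_text": "Test summary"}]

result = summarize("some text", mock_pipeline)

# Step 1 is recorded on the factory mock, Step 2 on the object it returned.
mock_pipeline.assert_called_once_with("summarization", model="t5-base")
mock_summarizer.assert_called_once_with("some text", max_length=180)
```

Asserting the text argument on `mock_pipeline` itself would fail, because the factory never receives the text; only the returned summarizer does.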
tests/test_utils.py
ADDED
|
@@ -0,0 +1,55 @@
import os
import tempfile
import pytest
from summarizer.utils import read_file, extract_from_url
import requests

def test_read_file_success(sample_text_file, sample_text):
    content = read_file(sample_text_file)
    assert content.strip() == sample_text.strip()

def test_read_file_nonexistent():
    with pytest.raises(Exception) as exc_info:
        read_file("nonexistent_file.txt")
    assert "File reading failed" in str(exc_info.value)

def test_read_file_empty():
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
        pass
    try:
        with pytest.raises(Exception) as exc_info:
            read_file(f.name)
        assert "File is empty" in str(exc_info.value)
    finally:
        os.unlink(f.name)

def test_extract_from_url(requests_mock):
    url = "http://example.com"
    mock_html = """
    <html>
        <body>
            <article>
                <p>First paragraph.</p>
                <p>Second paragraph.</p>
            </article>
        </body>
    </html>
    """
    requests_mock.get(url, text=mock_html)
    content = extract_from_url(url)
    assert "First paragraph. Second paragraph." in content

def test_extract_from_url_no_content(requests_mock):
    url = "http://example.com"
    mock_html = "<html><body></body></html>"
    requests_mock.get(url, text=mock_html)
    with pytest.raises(Exception) as exc_info:
        extract_from_url(url)
    assert "No text content found" in str(exc_info.value)

def test_extract_from_url_connection_error(requests_mock):
    url = "http://example.com"
    requests_mock.get(url, exc=requests.exceptions.ConnectionError)
    with pytest.raises(Exception) as exc_info:
        extract_from_url(url)
    assert "Failed to fetch URL" in str(exc_info.value)
|
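The URL tests expect paragraph texts to be joined with single spaces and an error to be raised when no paragraphs exist. A minimal sketch of that behavior using only the standard library follows. It is an assumption about the contract being tested, not the project's `extract_from_url` (whose implementation is not shown here and may use a different parser); `ParagraphCollector` and `extract_paragraph_text` are hypothetical names, and fetching the page is left out.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the stripped text of every <p> element."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._buf = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._buf).strip()
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def extract_paragraph_text(html):
    """Join all paragraph texts with spaces; raise if there are none."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    if not parser.paragraphs:
        raise ValueError("No text content found")
    return " ".join(parser.paragraphs)
```

Separating parsing from fetching in this way would also let the parsing rule be tested without mocking HTTP requests.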