Spaces:
Runtime error
Runtime error
initial_commit
Browse files- README.md +36 -12
- app.py +17 -0
- multiapp.py +19 -0
- packages.txt +1 -0
- paraphraser.py +28 -0
- requirements.txt +9 -0
- scrap.py +24 -0
- summary.py +24 -0
README.md
CHANGED
|
@@ -1,12 +1,36 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# python-dev-task-summarization
|
| 2 |
+
|
| 3 |
+
The task has been completed using two methods:
|
| 4 |
+
- **using traditional Python libraries (like NLTK,Sumy)**
|
| 5 |
+
- **using pre-trained transformers model**
|
| 6 |
+
|
| 7 |
+
# Method-1
|
| 8 |
+
## using traditional Python libraries
|
| 9 |
+
#### Web Scraping Tools:
|
| 10 |
+
- Selenium
|
| 11 |
+
#### Paraphrasing Tools:
|
| 12 |
+
- used [nlpaug](https://github.com/makcedward/nlpaug) library
|
| 13 |
+
#### Summarization Tools:
|
| 14 |
+
- used [sumy](https://miso-belica.github.io/sumy/) library
|
| 15 |
+
#### System Requirements:
|
| 16 |
+
- you will find it in the _requirements.txt_ file
|
| 17 |
+
|
| 18 |
+
## How to test or run this?
|
| 19 |
+
- just open this link and follow the instructions: _**https://shamim237-python-dev-task-app-3n18pu.streamlit.app/**_
|
| 20 |
+
|
| 21 |
+
# Method-2
|
| 22 |
+
## Using pre-trained transformers model
|
| 23 |
+
#### Web Scraping Tools:
|
| 24 |
+
- ScraperAPI
|
| 25 |
+
- BeautifulSoup
|
| 26 |
+
#### Paraphrasing Tools:
|
| 27 |
+
- used **"ramsrigouthamg/t5-large-paraphraser-diverse-high-quality"** pre-trained model from HuggingFace
|
| 28 |
+
#### Summarization Tools:
|
| 29 |
+
- used **"google/pegasus-cnn_dailymail"** pre-trained model from HuggingFace
|
| 30 |
+
#### System Requirements:
|
| 31 |
+
- you will find it in the _Python_Dev_Task.ipynb_ notebook or in the below link.
|
| 32 |
+
|
| 33 |
+
## How to test or run this?
|
| 34 |
+
- Just open the **"Python_Dev_Task.ipynb"** file in Colab _or_ open this link: **_https://colab.research.google.com/drive/1wwaj0TobsnzQL5jMVsYrF5z6rc1944tE?usp=sharing_**
|
| 35 |
+
- Run all the cells
|
| 36 |
+
- The summarization output will show up in the last cell of the notebook.
|
app.py
ADDED
|
@@ -0,0 +1,17 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import streamlit as st
|
| 2 |
+
from multiapp import MultiApp
|
| 3 |
+
from apps import paraphraseApp, summarizerApp, scraperrApp
|
| 4 |
+
|
| 5 |
+
app = MultiApp()
|
| 6 |
+
|
| 7 |
+
st.title("Python Dev Task @SkyRanko")
|
| 8 |
+
st.write("==================_Completed by_ **Shamim Mahbub**==================")
|
| 9 |
+
st.markdown("This app provides three services - :red[Scraping], :orange[Paraphrasing] and :blue[Summarizing]")
|
| 10 |
+
st.caption("Note: _After scraping data from Amazon, the data has been paraphrased using a model and then Summarization has been performed on the paraphrased data._")
|
| 11 |
+
|
| 12 |
+
# Add all your application here
|
| 13 |
+
app.add_app("Scraper", scraperrApp.app)
|
| 14 |
+
app.add_app("Paraphraser", paraphraseApp.app)
|
| 15 |
+
app.add_app("Summarizer", summarizerApp.app)
|
| 16 |
+
# The main app
|
| 17 |
+
app.run()
|
multiapp.py
ADDED
|
@@ -0,0 +1,19 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import streamlit as st
|
| 2 |
+
|
| 3 |
+
class MultiApp:
|
| 4 |
+
def __init__(self):
|
| 5 |
+
self.apps = []
|
| 6 |
+
|
| 7 |
+
def add_app(self, title, func):
|
| 8 |
+
self.apps.append({
|
| 9 |
+
"title": title,
|
| 10 |
+
"function": func
|
| 11 |
+
})
|
| 12 |
+
|
| 13 |
+
def run(self):
|
| 14 |
+
app = st.selectbox(
|
| 15 |
+
'Choose one',
|
| 16 |
+
self.apps,
|
| 17 |
+
format_func=lambda app: app['title'])
|
| 18 |
+
|
| 19 |
+
app['function']()
|
packages.txt
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
firefox-esr
|
paraphraser.py
ADDED
|
@@ -0,0 +1,28 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import re
|
| 2 |
+
import streamlit as st
|
| 3 |
+
import nlpaug.augmenter.word as naw
|
| 4 |
+
|
| 5 |
+
import os
|
| 6 |
+
os.environ["TOKENIZERS_PARALLELISM"] = "false"
|
| 7 |
+
|
| 8 |
+
@st.cache(allow_output_mutation=True, ttl=48*3600)
|
| 9 |
+
def load_model():
|
| 10 |
+
aug = naw.ContextualWordEmbsAug(
|
| 11 |
+
model_path='bert-base-uncased', action="insert")
|
| 12 |
+
return aug
|
| 13 |
+
|
| 14 |
+
aug = load_model()
|
| 15 |
+
|
| 16 |
+
def parphrase(passage):
|
| 17 |
+
sen = []
|
| 18 |
+
for i in passage:
|
| 19 |
+
res = len(re.findall(r'\w+', i))
|
| 20 |
+
if res == 2:
|
| 21 |
+
pass
|
| 22 |
+
else:
|
| 23 |
+
res = i.replace('"', "'").replace("\n", "")
|
| 24 |
+
sen.append(res)
|
| 25 |
+
|
| 26 |
+
pas = " ".join(sen)
|
| 27 |
+
para_text = aug.augment(pas)
|
| 28 |
+
return para_text
|
requirements.txt
ADDED
|
@@ -0,0 +1,9 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
nlpaug==1.1.11
|
| 2 |
+
nltk
|
| 3 |
+
selenium==4.8.0
|
| 4 |
+
sentencepiece==0.1.97
|
| 5 |
+
streamlit==1.17.0
|
| 6 |
+
sumy==0.11.0
|
| 7 |
+
torch==1.13.1
|
| 8 |
+
transformers==4.25.1
|
| 9 |
+
webdriver-manager
|
scrap.py
ADDED
|
@@ -0,0 +1,24 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import time
|
| 2 |
+
from selenium import webdriver
|
| 3 |
+
from selenium.webdriver import Chrome
|
| 4 |
+
from selenium.webdriver.common.by import By
|
| 5 |
+
from selenium.webdriver.firefox.options import Options
|
| 6 |
+
from selenium.webdriver.firefox.service import Service
|
| 7 |
+
from webdriver_manager.firefox import GeckoDriverManager
|
| 8 |
+
from selenium.webdriver.common.by import By
|
| 9 |
+
|
| 10 |
+
def extract(link):
|
| 11 |
+
url = link
|
| 12 |
+
firefoxOptions = Options()
|
| 13 |
+
firefoxOptions.add_argument("--headless")
|
| 14 |
+
service = Service(GeckoDriverManager().install())
|
| 15 |
+
driver = webdriver.Firefox(
|
| 16 |
+
options=firefoxOptions,
|
| 17 |
+
service=service,
|
| 18 |
+
)
|
| 19 |
+
driver.get(url)
|
| 20 |
+
data = driver.find_element(By.ID,"aplus_feature_div")
|
| 21 |
+
data = data.text
|
| 22 |
+
data = data.split("\n")
|
| 23 |
+
time.sleep(2)
|
| 24 |
+
return data
|
summary.py
ADDED
|
@@ -0,0 +1,24 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import nltk
|
| 2 |
+
import streamlit as st
|
| 3 |
+
from sumy.nlp.tokenizers import Tokenizer
|
| 4 |
+
from sumy.parsers.plaintext import PlaintextParser
|
| 5 |
+
from sumy.summarizers.lex_rank import LexRankSummarizer
|
| 6 |
+
|
| 7 |
+
@st.cache(allow_output_mutation=True, ttl=48*3600)
|
| 8 |
+
def dwnld_lib():
|
| 9 |
+
nltk.download('punkt')
|
| 10 |
+
|
| 11 |
+
dwnld_lib()
|
| 12 |
+
|
| 13 |
+
def text_summary(text):
|
| 14 |
+
para = " ".join(text)
|
| 15 |
+
# Create a plaintext parser and tokenizer
|
| 16 |
+
parser = PlaintextParser.from_string(para, Tokenizer("english"))
|
| 17 |
+
# Create a LexRank summarizer
|
| 18 |
+
summarizer = LexRankSummarizer()
|
| 19 |
+
# Summarize the text and print the results
|
| 20 |
+
summ = []
|
| 21 |
+
for sentence in summarizer(parser.document, 4):
|
| 22 |
+
summy = str(sentence).capitalize()
|
| 23 |
+
summ.append(summy)
|
| 24 |
+
return summ
|