ndhanvina committed on
Commit ce4294b · verified · 1 Parent(s): 3c890f1

Upload 3 files

Files changed (3)
  1. README.md +87 -20
  2. app.py +165 -0
  3. requirements.txt +9 -3
README.md CHANGED
@@ -1,20 +1,87 @@
- ---
- title: LangChain VideoWeb Summarizer
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
- pinned: false
- short_description: Streamlit template space
- license: mit
- ---
-
- # Welcome to Streamlit!
-
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
+ # AI Content Summarizer
+
+ This Streamlit application uses AI to summarize web articles and YouTube videos. Users enter a URL, and the app returns a concise summary of the content.
+ It loads YouTube transcripts or website content with **LangChain's document loaders** and summarizes them with **LangChain** and **Google's Gemini API**.
+ The model is accessed through `ChatGoogleGenerativeAI` with the `gemini-1.5-flash` model.
+
+ ## Features
+
+ - **Web Article Summarization**: Enter the URL of any online article to get a summary.
+ - **YouTube Video Summarization**: Provide a YouTube video URL to receive a summary of its transcript.
+ - **Secure API Key Handling**: The API key is entered in a password-type field in Streamlit's sidebar, so it is not displayed in the main interface.
+ - **User-Friendly Interface**: Simple and intuitive design for ease of use.
+
+ ## Technologies Used
+
+ - **Streamlit**: For creating the web application interface.
+ - **LangChain**: To handle the summarization chain and document processing.
+ - **LangChain Community**: For the `YoutubeLoader` and `UnstructuredURLLoader` document loaders.
+ - **[LangChain Google GenAI](https://python.langchain.com/docs/integrations/llms/google_generative_ai)**: For LLM integration.
+ - **Validators**: To ensure the validity of input URLs.
+ - **youtube_transcript_api**: For fetching transcripts from YouTube videos.
+ - **pytube**: To extract video information from YouTube.
+ - **Unstructured**: For parsing and extracting content from HTML (web articles).
+
+ ## Setup and Installation
+
+ 1. **Clone the repository:**
+     ```bash
+     git clone https://github.com/your-username/ai-content-summarizer.git
+     cd ai-content-summarizer
+     ```
+
+ 2. **Create a virtual environment and activate it:**
+     ```bash
+     python -m venv venv
+     source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
+     ```
+
+ 3. **Install the required dependencies:**
+     ```bash
+     pip install -r requirements.txt
+     ```
+
+ 4. **Set your Google API Key:**
+     - Obtain an API key from [Google AI Studio](https://aistudio.google.com/app/apikey).
+     - You can set it as an environment variable:
+       ```bash
+       export GOOGLE_API_KEY='your_google_api_key_here'
+       ```
+     - Alternatively, you can enter it directly in the application's sidebar when prompted.
+
+ ## How to Run
+
+ 1. **Ensure your virtual environment is activated and dependencies are installed.**
+ 2. **Run the Streamlit application:**
+     ```bash
+     streamlit run app.py
+     ```
+ 3. Open your web browser and navigate to the local URL provided by Streamlit (usually `http://localhost:8501`).
+ 4. Enter your **Google API Key** in the sidebar.
+ 5. Paste the URL of a web article or YouTube video into the input field and click "Summarize Content".
+
+ ## Example Usage
+
+ 1. **Find an interesting article online or a YouTube video.**
+ 2. **Copy its URL.**
+ 3. **Paste the URL into the app's input field.**
+ 4. **Click "Summarize Content" and wait for the AI to generate the summary.**
+
+ ## Contributing
+
+ Contributions are welcome! If you have suggestions for improvements or new features, please feel free to:
+
+ 1. Fork the repository.
+ 2. Create a new branch (`git checkout -b feature/your-feature-name`).
+ 3. Make your changes.
+ 4. Commit your changes (`git commit -m 'Add some feature'`).
+ 5. Push to the branch (`git push origin feature/your-feature-name`).
+ 6. Open a Pull Request.
+
+ ## License
+
+ This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details, if one has been added.
+
+ ## Notes
+ - You must have a **valid Google API Key** to use this app.
+ - The quality of the summary depends on the clarity and structure of the source content.
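The README offers two ways to supply the key (an environment variable or the sidebar), while `app.py` in this commit reads it only from the sidebar field. A minimal sketch of how the two sources could be combined is shown below. `resolve_api_key` is a hypothetical helper that is not part of the commit, and the precedence (sidebar first, then `GOOGLE_API_KEY`) is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(sidebar_value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the sidebar value if non-empty, else GOOGLE_API_KEY from the environment.

    Hypothetical helper for illustration; not part of this commit.
    Returns an empty string when neither source provides a key.
    """
    # Allow a mapping to be injected so the function can be tested without
    # touching the real process environment.
    env = os.environ if env is None else env
    key = (sidebar_value or "").strip()
    if key:
        return key
    return env.get("GOOGLE_API_KEY", "").strip()
```

In `app.py`, the value returned by `st.sidebar.text_input(...)` could be passed through this helper before the existing `if not google_api_key:` check, so that an exported key is picked up when the sidebar field is left empty.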
app.py ADDED
@@ -0,0 +1,165 @@
+ import streamlit as st
+ from langchain_google_genai import ChatGoogleGenerativeAI
+ from langchain.chains.summarize import load_summarize_chain
+ from langchain.docstore.document import Document
+ from langchain_core.prompts import PromptTemplate
+ import validators
+ # from youtube_transcript_api import YouTubeTranscriptApi # Replaced by YoutubeLoader
+ # from pytube import YouTube # Replaced by YoutubeLoader
+ # from unstructured.partition.html import partition_html # Replaced by UnstructuredURLLoader
+ # import requests # Replaced by UnstructuredURLLoader
+ from langchain_community.document_loaders import YoutubeLoader, UnstructuredURLLoader
+ from youtube_transcript_api import TranscriptsDisabled, NoTranscriptFound
+
+
+ # Set page config
+ st.set_page_config(page_title="AI Content Summarizer", page_icon="🚀", layout="wide")
+
+ # Custom CSS for styling
+ st.markdown("""
+ <style>
+ .main-header {
+     font-size: 36px !important;
+     color: #4CAF50;
+     text-align: center;
+     margin-bottom: 30px;
+ }
+ .sub-header {
+     font-size: 24px !important;
+     color: #FF6347;
+     margin-top: 20px;
+     margin-bottom: 10px;
+ }
+ .text-input {
+     width: 100%;
+     padding: 10px;
+     border-radius: 5px;
+     border: 1px solid #ddd;
+     margin-bottom: 20px;
+ }
+ .submit-button {
+     background-color: #4CAF50;
+     color: white;
+     padding: 10px 20px;
+     border: none;
+     border-radius: 5px;
+     cursor: pointer;
+     font-size: 16px;
+ }
+ .submit-button:hover {
+     background-color: #45a049;
+ }
+ .summary-output {
+     background-color: #f9f9f9;
+     padding: 20px;
+     border-radius: 5px;
+     border: 1px solid #eee;
+     margin-top: 20px;
+     color: #333333; /* Added for text visibility */
+ }
+ .error-message {
+     color: red;
+     font-weight: bold;
+ }
+ </style>
+ """, unsafe_allow_html=True)
+
+ # API Key Input
+ st.sidebar.title("API Key Configuration")
+ google_api_key = st.sidebar.text_input("🔑 Google API Key", type="password")
+
+ # --- Helper Functions ---
+ def get_llm(api_key: str):
+     """Initializes and returns the ChatGoogleGenerativeAI instance."""
+     return ChatGoogleGenerativeAI(model="gemini-1.5-flash", google_api_key=api_key, temperature=0)
+
+ def load_documents(url: str) -> list[Document]:
+     """Loads documents from a URL (YouTube or web article)."""
+     docs = []
+     try:
+         if "youtube.com" in url or "youtu.be" in url:
+             st.info("Processing YouTube URL...")
+             try:
+                 loader = YoutubeLoader.from_youtube_url(
+                     url,
+                     add_video_info=False, # Keep this as False
+                     language=['en']
+                 )
+                 with st.spinner("Fetching and parsing YouTube content..."):
+                     docs = loader.load()
+             except TranscriptsDisabled:
+                 st.error(f"Transcripts are disabled for the YouTube video: {url}")
+                 return []
+             except NoTranscriptFound:
+                 st.error(f"No English transcripts found for the YouTube video: {url}. The video might not have transcripts, or they might not be in English.")
+                 return []
+             except Exception as e:
+                 st.error(f"Error loading YouTube content for {url}: {str(e)}. This could be due to parsing issues or video unavailability.")
+                 return []
+         else:
+             st.info("Processing web article URL...")
+             headers = {
+                 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
+             }
+             loader = UnstructuredURLLoader(urls=[url], headers=headers)
+             with st.spinner("Fetching and parsing web content..."):
+                 docs = loader.load()
+
+         if not docs:
+             st.warning("No content could be extracted from the URL. For YouTube, check if transcripts are available and in English. For websites, the page might be empty or structured in a way that's hard to parse.")
+             return []
+         return docs
+
+     except Exception as e: # This is the outermost catch-all
+         st.error(f"An unexpected error occurred during document loading: {str(e)}")
+         return []
+
+
+ prompt_template_str = """
+ Provide a simple, understandable summary of around 300 words for the following content:
+ Content: {text}
+ """
+ prompt = PromptTemplate(template=prompt_template_str, input_variables=["text"])
+
+ def generate_summary(llm, docs: list[Document]):
+     """Generates a summary using the LLM and loaded documents."""
+     if not docs:
+         return "No content to summarize."
+     chain = load_summarize_chain(llm, chain_type="stuff", prompt=prompt)
+     with st.spinner("AI is summarizing the content..."):
+         summary = chain.invoke({"input_documents": docs})
+     return summary["output_text"]
+
+ # --- Main Application ---
+ st.markdown("<div class='main-header'>AI Content Summarizer 🚀</div>", unsafe_allow_html=True)
+ st.write("This app summarizes web articles and YouTube videos using AI. Enter a URL below to get started.")
+
+ # Input URL
+ url_input = st.text_input("Enter URL (Article or YouTube):", key="url_input_main", help="Paste the URL of the article or YouTube video you want to summarize.")
+
+ # Submit button
+ if st.button("Summarize Content", key="submit_button_main"):
+     if not google_api_key:
+         st.error("🚫 Please enter your Google API Key in the sidebar.")
+     elif not url_input:
+         st.warning("⚠️ Please enter a URL.")
+     elif not validators.url(url_input):
+         st.error("🚫 Invalid URL. Please enter a valid URL.")
+     else:
+         try:
+             st.markdown("<div class='sub-header'>Processing...</div>", unsafe_allow_html=True)
+
+             llm = get_llm(api_key=google_api_key)
+
+             docs = load_documents(url=url_input)
+
+             if docs:
+                 summary_result = generate_summary(llm=llm, docs=docs)
+                 st.markdown("<div class='sub-header'>Summary:</div>", unsafe_allow_html=True)
+                 st.success("Summary generated successfully!")
+                 st.markdown(f"<div class='summary-output'>{summary_result}</div>", unsafe_allow_html=True)
+             # Error handling for empty docs is done within load_documents
+
+         except Exception as e:
+             st.error(f"An unexpected error occurred: {e}")
+             st.markdown(f"<div class='error-message'>Details: {str(e)}</div>", unsafe_allow_html=True)
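Two details in `app.py` may be worth a second look. `load_documents` picks the YouTube branch with a substring test, which would also match an unrelated URL that merely contains `youtube.com` in its query string. The summary is also interpolated into HTML that is rendered with `unsafe_allow_html=True`. The sketch below shows one possible way to tighten both, using only the standard library. The helper names `is_youtube_url` and `summary_to_html` and the `YOUTUBE_HOSTS` set are hypothetical and not part of the commit.

```python
import html
from urllib.parse import urlparse

# Hostnames treated as YouTube. This set is an assumption; extend it as needed.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True only when the URL's hostname is a known YouTube host."""
    # urlparse(...).hostname is None for strings without a network location.
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def summary_to_html(summary: str) -> str:
    """Escape the summary text and wrap it in the `summary-output` div."""
    # html.escape converts <, >, & and quotes so model output cannot inject markup.
    return f"<div class='summary-output'>{html.escape(summary)}</div>"
```

With these helpers, the branch condition could become `if is_youtube_url(url):`, and the result could be rendered with `st.markdown(summary_to_html(summary_result), unsafe_allow_html=True)`. The trade-off is that any HTML the model emits is shown as literal text.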
requirements.txt CHANGED
@@ -1,3 +1,9 @@
- altair
- pandas
- streamlit
+ streamlit
+ langchain
+ langchain-google-genai
+ validators
+ youtube_transcript_api>=0.6.2
+ pytube>=15.0.0
+ unstructured
+ pytest
+ pytest-mock
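`app.py` imports from `langchain_community`, which `requirements.txt` does not list explicitly. Whether it gets installed transitively may depend on the `langchain` version that pip resolves. A quick check after installation could look like the sketch below. `missing_modules` is a hypothetical helper, and `APP_MODULES` is simply the set of top-level modules imported by `app.py` in this commit.

```python
from importlib.util import find_spec

# Top-level modules imported by app.py in this commit.
APP_MODULES = [
    "streamlit",
    "langchain",
    "langchain_core",
    "langchain_community",
    "langchain_google_genai",
    "validators",
    "youtube_transcript_api",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed;
    # it does not import the module itself.
    return [name for name in names if find_spec(name) is None]
```

Running `print(missing_modules(APP_MODULES))` inside the activated virtual environment lists anything that still needs to be added to `requirements.txt`. An empty list means every import in `app.py` should resolve.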