tjl8 committed on
Commit 29a3914 · verified · 1 Parent(s): 1177439

Update app.py

Files changed (1)
  1. app.py +6 -6
app.py CHANGED
@@ -213,24 +213,24 @@ if st.checkbox("Show Raw Data"):
 
  # Summary
  st.markdown('<div class="plot-container">', unsafe_allow_html=True)
- st.markdown("### Project Summary")
+ #st.markdown("### Project Summary")
  #st.markdown("This project used dataset from the r/AskReddit subreddit. It includes various fields such as post scores, the number of comments, upvote ratios, and timestamps, providing a comprehensive view of post performance and user engagement. The dataset was explored through several interactive visualizations designed to reveal key insights effectively. A line graph was used to display the number of posts by the hour for each day of the week, with a dropdown menu allowing users to select specific days, making it easy to identify and compare posting patterns such as peak activity hours. A scatter plot illustrated the correlation between post scores and other engagement metrics like the number of comments and crossposts, with interactive radio buttons enabling users to switch the x-axis variable and examine relationships influencing post success. A word cloud incorporating sentiment analysis enriched the exploration by categorizing post titles as positive, negative, or neutral using VADER sentiment analysis. Words were color-coded based on their sentiment, and users could toggle between categories and explore different color palettes, making the visualization both informative and visually appealing. Lastly, a bubble chart provided a unique perspective on individual author performance, with bubbles representing authors, their size reflecting the upvote ratio, and their position showing the number of comments and post scores, offering an engaging way to analyze user contributions.")
  st.markdown("## Project Details")
  st.markdown("""
- ### Data Source
+ 1) Data Source
  The data for this project was collected using the Reddit API through the `praw` library. The subreddit **r/AskReddit** was selected as it is one of the most popular discussion forums on Reddit. Posts were fetched from the 'hot' section, which highlights trending discussions. Key fields such as post titles, scores, number of comments, upvote ratios, and timestamps were extracted for analysis.
 
- ### Data Preprocessing and Handling Missing Values
+ 2) Data Preprocessing and Handling Missing Values
  - **Data Cleaning**: Since the data was directly fetched from the Reddit API, minimal preprocessing was needed. Missing or null values were not explicitly handled in this project as they did not significantly impact the visualizations or analysis.
  - **Sentiment Analysis**: Titles were analyzed using the VADER Sentiment Analyzer from the `nltk` library. Each post was categorized as positive, negative, or neutral based on its compound sentiment score.
  - **Time Conversion**: Timestamps were converted to human-readable formats to extract features like the day of the week and hour of the day.
 
- ### Challenges Faced
+ 3) Challenges Faced
  - **API Rate Limits**: Fetching a large volume of posts was restricted by Reddit's API rate limits, which required careful management of API requests.
  - **Sentiment Analysis**: Interpreting short or ambiguous post titles posed challenges for sentiment classification, leading to potential misclassification of sentiments.
  - **Interactive Elements**: Designing an intuitive and responsive interface with multiple interactivity options required iterative testing and debugging.
 
- ### Visualizations
+ 4) Visualizations
  1. **Number of Posts vs. Time of Day**:
  - A line chart displays the number of posts for a selected day of the week, allowing users to identify peak activity hours.
  - **Interactive Feature**: Users can select the day of the week using a dropdown.
@@ -247,7 +247,7 @@ The data for this project was collected using the Reddit API through the `praw`
  - A bubble chart represents individual authors' contributions, where the bubble size reflects the upvote ratio, and the position shows the number of comments and scores.
  - **Interactive Feature**: Users can hover over bubbles to view detailed tooltips about the author.
 
- ### Summary
+ 5) Summary of the project
  This project provides an interactive dashboard for exploring trends and insights from the r/AskReddit subreddit. Through a combination of sentiment analysis, temporal patterns, and user engagement metrics, the visualizations aim to uncover key factors that contribute to the popularity and engagement of posts.
  """)
 
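The Data Source paragraph in this diff says posts were pulled from the 'hot' listing of r/AskReddit with `praw`, keeping titles, scores, comment counts, upvote ratios, and timestamps. The following is a minimal sketch of that step, not the code in app.py: the helper names `submission_to_record` and `fetch_hot_posts` are hypothetical, the credentials are placeholders you would supply, and `limit=100` is an assumed value because the commit does not say how many posts were fetched. The author field is included because the bubble chart groups posts by author.

```python
def submission_to_record(submission) -> dict:
    """Pull the fields named in the Data Source text from one PRAW Submission."""
    author = submission.author  # a Redditor, or None for deleted accounts
    return {
        "title": submission.title,
        "score": submission.score,
        "num_comments": submission.num_comments,
        "upvote_ratio": submission.upvote_ratio,
        "created_utc": submission.created_utc,  # Unix time in seconds, UTC
        "author": str(author) if author is not None else "[deleted]",
    }


def fetch_hot_posts(client_id: str, client_secret: str, user_agent: str,
                    limit: int = 100) -> list[dict]:
    """Fetch r/AskReddit 'hot' posts; limit=100 is an assumed value."""
    import praw  # lazy import: only needed when actually calling the API

    reddit = praw.Reddit(client_id=client_id, client_secret=client_secret,
                         user_agent=user_agent)
    return [submission_to_record(s)
            for s in reddit.subreddit("AskReddit").hot(limit=limit)]
```

Keeping the field extraction separate from the network call means the record logic can be checked with a stand-in object, without Reddit credentials or `praw` installed.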
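The Sentiment Analysis bullet says each title was labeled positive, negative, or neutral from its VADER compound score, but the cutoffs are not stated. This sketch assumes the ±0.05 thresholds commonly recommended in the VADER documentation; `categorize_compound` and `categorize_titles` are hypothetical names, not functions from app.py. `SentimentIntensityAnalyzer.polarity_scores` is the real `nltk` call and returns a dict that includes a `"compound"` key.

```python
def categorize_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment label (assumed cutoffs)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def categorize_titles(titles: list[str]) -> list[str]:
    """Label each title; needs nltk plus a one-time nltk.download("vader_lexicon")."""
    # Lazy import so categorize_compound can be used without nltk installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return [categorize_compound(analyzer.polarity_scores(t)["compound"])
            for t in titles]
```

Scores that land in the narrow band around zero fall into "neutral", which is one reason short or ambiguous titles (the challenge listed under Challenges Faced) tend to be hard to classify.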
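The Time Conversion bullet and the "Number of Posts vs. Time of Day" chart describe deriving the weekday and hour from each timestamp, then counting posts per hour for the day chosen in the dropdown. A standard-library sketch of that aggregation, assuming the timestamps are Reddit `created_utc` values bucketed in UTC (the app itself may use local time or pandas); `time_features`, `posts_per_hour`, and `DAYS` are hypothetical names.

```python
from collections import Counter
from datetime import datetime, timezone

# Fixed English names so the output does not depend on the system locale.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def time_features(created_utc: float) -> tuple[str, int]:
    """Return (weekday name, hour 0-23) for a Unix timestamp, interpreted in UTC."""
    dt = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return DAYS[dt.weekday()], dt.hour


def posts_per_hour(timestamps: list[float], day: str) -> list[int]:
    """Count posts in each hour of the selected weekday (the dropdown value)."""
    counts = Counter()
    for ts in timestamps:
        weekday, hour = time_features(ts)
        if weekday == day:
            counts[hour] += 1
    # Dense 24-element list so hours with no posts still plot as zero.
    return [counts[h] for h in range(24)]
```

For example, with `[0, 1800, 18000, 345600]` (two posts in the first hour of Thursday 1 January 1970 UTC, one at 05:00 that day, and one on the following Monday), `posts_per_hour(..., "Thursday")` has 2 at index 0 and 1 at index 5, which is the series a line chart would draw for that day.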