Mpavan45 committed on
Commit cfe8118 · verified · 1 parent: 628c74b

Update pages/4_EDA( Exploratory Data Analysis).py

pages/4_EDA( Exploratory Data Analysis).py CHANGED
@@ -8,6 +8,62 @@ import sys
 
  st.markdown("<h1 style='text-align:center; color:blue;'>EDA(Exploratory Data Analysis)</h1>",unsafe_allow_html=True)
 
+ import streamlit as st
+
+ # Title of the Streamlit app
+ st.title("Exploratory Data Analysis (EDA) on Agoda Hotel Dataset")
+
+ # Introduction and Aim
+ st.header("Aim of the EDA")
+ st.write("""
+ The main objective of this EDA is to analyze Agoda's hotel dataset to identify key factors influencing hotel pricing strategies and customer booking preferences.
+ The analysis will focus on uncovering patterns, trends, and relationships in hotel ratings, pricing structures, discounts, and free services.
+ By leveraging these insights, Agoda can optimize its pricing strategy, predict booking preferences, and enhance revenue generation while maintaining customer satisfaction.
+ """)
+
+ # Description of the Data
+ st.header("Description of the Data")
+ st.write("""
+ **Overall Summary:** We are analyzing the Agoda dataset by performing EDA and Statistical Tests on the data that has already been cleaned through data wrangling to address any messiness or missing information.
+
+ **Table - Agoda_df:** The cleaned dataset consists of over 3,500 hotel listings, which will be used as test subjects for the hotel pricing period.
+
+ **Dataset Details:**
+ The dataset contains information about 3,219 hotel room listings with 12 features, each detailing aspects of the listing. Below is the description of each column:
+
+ | Column Name | Description |
+ |-----------------|---------------------------------------------------------------------------|
+ | hotel_name | Name of the hotel. |
+ | rating | Average customer rating of the hotel (float, range 1-5). |
+ | location | Address or locality of the hotel. |
+ | review_text | Customer feedback or comments about the hotel. |
+ | reviews | Total number of customer reviews for the hotel. |
+ | cashback | Cashback amount offered for the booking. |
+ | discount | Discount percentage applied to the room price. |
+ | free_services | Free services provided (e.g., breakfast, Wi-Fi). |
+ | cancellation | Cancellation policy for the booking (e.g., free, non-refundable). |
+ | price | Price of the room after discounts and cashback (float). |
+ | state | The state where the hotel is located. |
+ | category | Target variable representing the room type or category (e.g., budget, luxury). |
+ """)
+
+ # Table-wise EDA & Necessary Tests
+ st.header("Table-wise EDA and Necessary Statistical Tests")
+ st.write("""
+ **Agoda_df:** Cleaned dataset with hotel details and key features like ratings, price, reviews, cashback, discounts, and free services.
+
+ The EDA will involve the following steps:
+ - **Summary Statistics:** Analyze the central tendency, spread, and shape of the distribution of each feature.
+ - **Data Distribution:** Visualize the distribution of key features like price, ratings, reviews, cashback, etc.
+ - **Correlation Analysis:** Analyze relationships between numeric features like price, ratings, reviews, cashback, etc.
+ - **Categorical Data Analysis:** Explore categorical variables like hotel category, cancellation policy, state, and location using frequency tables and visualizations.
+ - **Missing Value Analysis:** Ensure no missing values remain, and check the need for imputations.
+ - **Outlier Detection:** Identify any outliers that may skew the analysis or predictions.
+ - **Statistical Tests:** Apply appropriate statistical tests to identify significant differences or relationships (e.g., t-tests for comparing means, chi-squared for categorical variables).
+ """)
+
+ # Placeholder for further detailed code or visualizations
+ st.write("Further steps will include generating visualizations and statistical tests to explore relationships between features in more detail.")
 
  # Access dataset from session state
  data= st.session_state.get("dataset")
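
The text added in this commit lists EDA steps (summary statistics, outlier detection, correlation analysis) and leaves their code as a placeholder. A minimal sketch of three of those steps might look like the following. It assumes nothing about the real dataset: the sample values are invented for illustration, the helper names `summarize`, `iqr_outliers`, and `pearson` are hypothetical, and only the standard library is used so the example runs on its own. Tukey's 1.5 × IQR rule is one common outlier criterion, not necessarily the one the author intends.

```python
import math
import statistics

# Invented sample values for two of the numeric columns described in the
# commit (price, rating). These are NOT taken from the Agoda dataset.
prices = [1200.0, 1500.0, 1800.0, 2100.0, 2400.0, 9500.0]
ratings = [3.1, 3.5, 3.9, 4.2, 4.4, 4.9]


def summarize(values):
    """Return central tendency and spread for one numeric column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(summarize(prices))
    print(iqr_outliers(prices))
    print(pearson(prices, ratings))
```

On the invented sample, `iqr_outliers(prices)` flags only the 9500.0 listing. Inside the Streamlit page, and if the object stored under `st.session_state["dataset"]` is a pandas DataFrame, the same checks would more likely be done with pandas' `describe()` and `corr()` and shown with `st.write`.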