Spaces: Mpavan45/Hotel_Data_Analysis

Mpavan45 committed on Jan 9, 2025

Commit cfe8118 (verified)

1 Parent(s): 628c74b

Update pages/4_EDA( Exploratory Data Analysis).py


Files changed (1)

pages/4_EDA( Exploratory Data Analysis).py +56 -0

pages/4_EDA( Exploratory Data Analysis).py

@@ -8,6 +8,62 @@ import sys
 st.markdown("<h1 style='text-align:center; color:blue;'>EDA(Exploratory Data Analysis)</h1>",unsafe_allow_html=True)
 # Access dataset from session state
 data= st.session_state.get("dataset")

 st.markdown("<h1 style='text-align:center; color:blue;'>EDA(Exploratory Data Analysis)</h1>",unsafe_allow_html=True)
+import streamlit as st
+# Title of the Streamlit app
+st.title("Exploratory Data Analysis (EDA) on Agoda Hotel Dataset")
+# Introduction and Aim
+st.header("Aim of the EDA")
+st.write("""
+    The main objective of this EDA is to analyze Agoda's hotel dataset to identify key factors influencing hotel pricing strategies and customer booking preferences.
+    The analysis will focus on uncovering patterns, trends, and relationships in hotel ratings, pricing structures, discounts, and free services.
+    By leveraging these insights, Agoda can optimize its pricing strategy, predict booking preferences, and enhance revenue generation while maintaining customer satisfaction.
+""")
+# Description of the Data
+st.header("Description of the Data")
+st.write("""
+    **Overall Summary:** We are analyzing the Agoda dataset by performing EDA and Statistical Tests on the data that has already been cleaned through data wrangling to address any messiness or missing information.
+    **Table - Agoda_df:** The cleaned dataset consists of over 3,500 hotel listings, which serve as the sample for the hotel pricing analysis.
+    **Dataset Details:**
+    The dataset contains information about 3,219 hotel room listings with 12 features, each detailing aspects of the listing. Below is the description of each column:
+    | Column Name     | Description                                                               |
+    |-----------------|---------------------------------------------------------------------------|
+    | hotel_name      | Name of the hotel.                                                        |
+    | rating          | Average customer rating of the hotel (float, range 1-5).                   |
+    | location        | Address or locality of the hotel.                                          |
+    | review_text     | Customer feedback or comments about the hotel.                             |
+    | reviews         | Total number of customer reviews for the hotel.                            |
+    | cashback        | Cashback amount offered for the booking.                                  |
+    | discount        | Discount percentage applied to the room price.                             |
+    | free_services   | Free services provided (e.g., breakfast, Wi-Fi).                          |
+    | cancellation    | Cancellation policy for the booking (e.g., free, non-refundable).         |
+    | price           | Price of the room after discounts and cashback (float).                    |
+    | state           | The state where the hotel is located.                                      |
+    | category        | Target variable representing the room type or category (e.g., budget, luxury). |
+""")
+# Table-wise EDA & Necessary Tests
+st.header("Table-wise EDA and Necessary Statistical Tests")
+st.write("""
+    **Agoda_df:** Cleaned dataset with hotel details and key features like ratings, price, reviews, cashback, discounts, and free services.
+    The EDA will involve the following steps:
+    - **Summary Statistics:** Analyze the central tendency, spread, and shape of the distribution of each feature.
+    - **Data Distribution:** Visualize the distribution of key features like price, ratings, reviews, cashback, etc.
+    - **Correlation Analysis:** Analyze relationships between numeric features like price, ratings, reviews, cashback, etc.
+    - **Categorical Data Analysis:** Explore categorical variables like hotel category, cancellation policy, state, and location using frequency tables and visualizations.
+    - **Missing Value Analysis:** Ensure no missing values remain, and check the need for imputations.
+    - **Outlier Detection:** Identify any outliers that may skew the analysis or predictions.
+    - **Statistical Tests:** Apply appropriate statistical tests to identify significant differences or relationships (e.g., t-tests for comparing means, chi-squared for categorical variables).
+""")
+# Placeholder for further detailed code or visualizations
+st.write("Further steps will include generating visualizations and statistical tests to explore relationships between features in more detail.")
 # Access dataset from session state
 data= st.session_state.get("dataset")
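
The EDA steps listed in the added text (summary statistics, missing-value check, outlier detection, correlation) could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code used in the Space. The sample rows, their values, and the helper names (`column`, `missing_count`, `summarize`, `iqr_outliers`, `pearson`) are hypothetical; only the column names `rating` and `price` are taken from the table in the diff. The 1.5 x IQR rule for outliers is one common choice, assumed here because the commit does not name a method.

```python
import math
import statistics

# Hypothetical sample rows: column names follow the table in the diff,
# the values are invented purely for illustration.
rows = [
    {"hotel_name": "A", "rating": 4.2, "price": 2500.0},
    {"hotel_name": "B", "rating": 3.8, "price": 1800.0},
    {"hotel_name": "C", "rating": 4.6, "price": 5200.0},
    {"hotel_name": "D", "rating": None, "price": 3100.0},
    {"hotel_name": "E", "rating": 4.9, "price": 48000.0},
    {"hotel_name": "F", "rating": 4.0, "price": 2900.0},
]


def column(rows, name):
    """Return the non-missing values of one column."""
    return [r[name] for r in rows if r.get(name) is not None]


def missing_count(rows, name):
    """Count rows where the column is absent or None."""
    return sum(1 for r in rows if r.get(name) is None)


def summarize(values):
    """Central tendency and spread for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


prices = column(rows, "price")
print("missing ratings:", missing_count(rows, "rating"))
print("price summary:", summarize(prices))
print("price outliers:", iqr_outliers(prices))

# Correlation needs rows where both columns are present.
pairs = [(r["rating"], r["price"]) for r in rows
         if r["rating"] is not None and r["price"] is not None]
ratings_p, prices_p = zip(*pairs)
print("rating/price correlation:", pearson(ratings_p, prices_p))
```

In the app itself, the same checks would more naturally be run on whatever object `st.session_state.get("dataset")` returns. Note that `.get` returns `None` when the key has not been set, so the page may need to handle that case before analysing `data`.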