import streamlit as st
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

st.set_page_config(
    page_title="Air Quality Analysis",
    page_icon="🌍",
    layout="wide"  # Set layout to wide
)

# Add custom CSS for background color
page_bg_color = """ """
st.markdown(page_bg_color, unsafe_allow_html=True)


# Load your AQI data
@st.cache_data
def load_data():
    # Replace with your data loading logic
    return pd.read_csv(r"random.csv")


# Load data
data = load_data()

# Dashboard Title
st.markdown(
    """
# **Comprehensive Analysis of Air Quality and Pollutant Dynamics Across Indian Cities**
### **Completed On:** February 2025
### **Author:** Kaustubh Yewale

## **Introduction**
Air pollution is one of the most critical challenges in urban environments, affecting public health, ecosystems, and overall quality of life.
This project provides an in-depth analysis of air quality across Indian cities, focusing on pollutants like PM2.5, PM10, CO, SO₂, NO₂, and O₃.
Through **advanced visualizations and data-driven insights**, the project uncovers pollution patterns, identifies high-risk areas, and explores the relationship between **weather conditions and pollution levels**.
The ultimate goal is to provide **actionable recommendations** for mitigating pollution and supporting **policymakers in making informed decisions**.
""",
    unsafe_allow_html=True
)

st.image(r"Picture1.png")

st.markdown(
    """
## **Problem Statement**
Urban air pollution has emerged as a severe environmental and public health issue.
Understanding air quality trends and their correlation with weather conditions is crucial for:
- Identifying pollution hotspots
- Evaluating health risks associated with high pollution levels
- Formulating strategic mitigation policies

This project **analyzes air quality trends** across Indian cities, using **data science techniques** to extract meaningful insights from pollutant and weather data.
""",
    unsafe_allow_html=True
)

st.write('''
## **Objectives**
1. **Retrieve and preprocess air quality data** from sources like `aqi.in` and **merge with weather data**.
2. **Perform Exploratory Data Analysis (EDA)** to identify pollution trends across different cities and regions.
3. **Analyze the correlation** between AQI and weather conditions (e.g., temperature, humidity, wind speed).
4. **Develop machine learning models** to predict AQI levels based on weather conditions and pollutants.
5. **Provide visual insights** using maps, correlation heatmaps, and distribution plots.
6. **Propose recommendations** for improving air quality and highlight policy implications.
''')

st.markdown(
    """
## **📊 Data Collection Approach**
##### 1. 🌍 Latitude and Longitude Retrieval:
Obtained latitude and longitude of various areas using **ChatGPT** for location-based analysis.
##### 2. 🌫️ Air Quality Data Scraping:
Scraped data for key pollutants (**PM2.5, PM10, CO, SO₂, NO₂, O₃**) and AQI from **aqi.in** using Python libraries.
##### 3. ☁️ Weather Data Integration:
Retrieved additional weather features (**humidity, pressure, wind speed**) using the **Weatherstack API**.
##### 4. 🔗 Data Consolidation:
Merged pollutant data with weather data for each location to create a comprehensive dataset.
""",
    unsafe_allow_html=True
)

st.markdown('

AQI Dashboard

', unsafe_allow_html=True)
st.markdown("Analyze Air Quality Index data and trends.")

# Filter options (rendered on the main page, not in the sidebar)
st.subheader("Filter Options")
cities = st.multiselect("Select Cities", options=data["areas"].unique())
pollutants = st.multiselect(
    "Select Pollutants",
    options=['pm2.5', 'pm10', 'co', 'so2', 'no2', 'o3']
)

# Filter Data
filtered_data = data.copy()
if cities:
    filtered_data = filtered_data[filtered_data["areas"].isin(cities)]
if pollutants:
    filtered_data = filtered_data[pollutants + ["areas", "AQI"]]

# Display Filtered Data
st.subheader("Filtered Data")
st.dataframe(filtered_data)

# Summary statistics for the filtered data
st.subheader("Statistics")
if not filtered_data.empty:
    st.write(filtered_data.describe())
else:
    st.write("No data available for statistics.")

st.markdown('

Outlier Detection for Various Pollutants

', unsafe_allow_html=True)

# Create a subplot grid (titles are assigned row by row, so their order
# must match the row/col positions of the traces added below)
fig = make_subplots(
    rows=3, cols=2,
    subplot_titles=[
        'Outlier Detection: PM2.5 and PM10',
        'Outlier Detection: SO2, NO2, O3',
        'Outlier Detection: CO and Total Pollution',
        'Outlier Detection: Humidity and Temperature',
        'Outlier Detection: Pressure',
        'Outlier Detection: CO'
    ]
)

# Add box plots for PM2.5 and PM10
fig.add_trace(go.Box(y=data['pm2.5'], name='PM2.5'), row=1, col=1)
fig.add_trace(go.Box(y=data['pm10'], name='PM10'), row=1, col=1)

# Add box plots for SO2, NO2, and O3
fig.add_trace(go.Box(y=data['so2'], name='SO2'), row=1, col=2)
fig.add_trace(go.Box(y=data['no2'], name='NO2'), row=1, col=2)
fig.add_trace(go.Box(y=data['o3'], name='O3'), row=1, col=2)

# Add box plots for CO and Total Pollution
fig.add_trace(go.Box(y=data['co'], name='CO'), row=2, col=1)
fig.add_trace(go.Box(y=data['Total_Pollution'], name='Total Pollution'), row=2, col=1)

# Add box plots for Humidity and Temperature
fig.add_trace(go.Box(y=data['humdity'], name='Humidity'), row=2, col=2)
fig.add_trace(go.Box(y=data['temperature'], name='Temperature'), row=2, col=2)

# Add box plots for Pressure (row 3, col 1) and CO on its own (row 3, col 2)
fig.add_trace(go.Box(y=data['pressure'], name='Pressure'), row=3, col=1)
fig.add_trace(go.Box(y=data['co'], name='CO'), row=3, col=2)

# Update layout
fig.update_layout(
    height=1000,
    title_text="Outlier Detection for Various Pollutants",
    showlegend=False
)

# Display the boxplot in Streamlit
st.plotly_chart(fig, use_container_width=True)

st.write('''### Insights and Action-Oriented Insights:
1. **CO exhibits the highest variability** with extreme outliers, reaching levels over **2000**, making it the most critical pollutant to address.
2. **PM10 shows noticeable outliers**, suggesting localized high emissions, likely from construction or dust-related activities.
3. **Other pollutants (PM2.5, SO₂, NO₂, O₃)** have relatively lower levels and variability but should still be monitored to avoid spikes.
4. Focus on reducing **CO emissions** through stricter vehicle and industrial emission regulations.
5. Address **PM10 outliers** by implementing dust control measures at construction sites and promoting green buffers in urban areas.
''')

st.markdown('

Air Quality Distribution by Category

', unsafe_allow_html=True)

category_counts = data['Air quality'].value_counts().reset_index()
category_counts.columns = ['Air quality', 'Count']

# Create the bar chart using Plotly
fig1 = px.bar(
    category_counts,
    x='Air quality',
    y='Count',
    color='Air quality',
    title='Air Quality Distribution by Category',
    barmode='stack',
    text='Count'
)
fig1.update_traces(textposition='outside')

# Display the Plotly chart in Streamlit
st.plotly_chart(fig1, use_container_width=True)

st.write('''### Insights:
1. Most areas fall under **"Moderately Polluted"** and **"Satisfactory"** categories, indicating manageable but concerning pollution levels.
2. Very few areas (40) have **"Good"** air quality, highlighting the need for improvement.
3. **"Poor" and "Very Poor"** areas require urgent intervention to address harmful pollution.

---

### Action-Oriented Insights:
1. Focus on **"Poor" and "Very Poor" regions** with stricter emission controls and localized mitigation strategies.
2. Promote **urban greening** and **renewable energy adoption** to improve air quality across all regions.
''')

st.markdown('

Pollution Category Distribution

', unsafe_allow_html=True)

# Calculate Pollution Category Counts
pollution_counts = data['Pollution_Category'].value_counts().reset_index()
pollution_counts.columns = ['Pollution Category', 'Count']

# Define pull values (explode only 'Considerable')
pollution_counts['pull'] = pollution_counts['Pollution Category'].apply(
    lambda x: 0.2 if x == 'Considerable' else 0
)

# Create the pie chart with explosion
fig = px.pie(
    pollution_counts,
    names='Pollution Category',
    values='Count',
    title='Distribution of Pollution Categories',
)
fig.update_traces(
    textinfo='percent+label',
    pull=pollution_counts['pull']
)
fig.update_layout(
    title_font_size=24,  # Increase title font size
    width=800,  # Set the chart width
    height=800,  # Set the chart height
    showlegend=True,  # Ensure the legend is visible
)

# Display the pie chart
st.plotly_chart(fig, use_container_width=True)

st.write('''### Combined Insights and Action-Oriented Insights:
1. Over **50% of areas** fall into the **"Hazardous"** category, posing severe public health risks.
2. **Unsafe (25.7%)** and **Extremely Hazardous (21.4%)** areas demand immediate interventions to reduce pollutant levels.
3. Implement stricter emission controls and promote **renewable energy adoption** to tackle high pollution areas.
4. Deploy **real-time air monitoring systems** to track pollution trends and take proactive measures.
5. Raise public awareness to encourage sustainable practices like reducing vehicular emissions and waste burning.
''')

# Scatter Plot for AQI vs Temperature
st.markdown('

Pollution (AQI) vs Temperature

', unsafe_allow_html=True)

fig3 = px.scatter(
    data,
    x='temperature',
    y='AQI',
    color='Air quality',
    size='pm2.5',
    hover_name='areas',
    title='Pollution (AQI) vs Temperature',
    labels={'temperature': 'Temperature (°C)', 'AQI': 'Air Quality Index'}
)

# Display the scatter plot in Streamlit
st.plotly_chart(fig3, use_container_width=True)

st.write('''### Insights and Action-Oriented Insights:
1. AQI levels are highest between **10°C and 20°C**, with "Moderately Polluted" and "Satisfactory" air quality dominating most temperature ranges.
2. "Very Poor" air quality is primarily observed in regions with temperatures above **15°C**, indicating a potential correlation with warmer conditions.
3. Focus on reducing **PM2.5 emissions**, as larger bubbles highlight its significant contribution to poor AQI levels.
4. Investigate pollutant sources in regions with **10°C–20°C** temperatures and enforce stricter emission controls in warmer areas.
''')

st.markdown('

Total Pollution vs Temperature

', unsafe_allow_html=True)

# Create the scatter plot using Plotly
fig3 = px.scatter(
    data,
    x='temperature',
    y='Total_Pollution',
    color='Air quality',
    size='pm2.5',
    hover_name='areas',
    title='Total Pollution vs Temperature',
    labels={'temperature': 'Temperature (°C)', 'Total_Pollution': 'Total Pollution'}
)

# Display the scatter plot in Streamlit
st.plotly_chart(fig3, use_container_width=True)

st.write('''### **Insights and Action-Oriented Recommendations**
1. **Pollution Peaks at Moderate Temperatures**:
   - Total pollution is highest around 15°C, with "Very Poor" air quality dominating. Target pollution control efforts during these conditions by limiting industrial and vehicular emissions.
2. **Decline in Pollution at Higher Temperatures**:
   - Total pollution decreases as temperatures exceed 25°C. Encourage maintaining such trends through renewable energy initiatives and green urban planning.
3. **Satisfactory Air Quality Across Wider Ranges**:
   - Areas with "Satisfactory" air quality are evenly distributed across temperatures. Expand awareness campaigns to sustain and improve such conditions.
4. **Outliers with Extreme Pollution**:
   - Outlier points with Total Pollution > 2000 demand immediate investigation and action, particularly in "Very Poor" air quality zones.
5. **Seasonal Pollution Management**:
   - Develop temperature-based pollution mitigation strategies to address seasonal patterns, focusing on cooler months when pollution spikes.
''')

st.markdown('

Weather Conditions vs AQI

', unsafe_allow_html=True)

# Aggregate AQI by weather condition
weather_aqi = data.groupby('Weather_Condition')['AQI'].mean().reset_index()

# Create the bar chart using Plotly
fig5 = px.bar(
    weather_aqi,
    x='Weather_Condition',
    y='AQI',
    color='Weather_Condition',
    text='AQI',
    title='Weather Conditions vs AQI',
    color_discrete_sequence=px.colors.qualitative.Pastel
)
fig5.update_traces(textposition='outside')

# Display the chart in Streamlit
st.plotly_chart(fig5, use_container_width=True)

st.write('''### Insights and Action-Oriented Insights:
1. **Cool & Dry** and **Normal** weather conditions exhibit the highest AQI levels (**118.36** and **117.23** respectively), indicating significant pollution during these conditions.
2. **Hot & Humid** conditions show relatively lower AQI levels (**80.75**), possibly due to improved pollutant dispersion in warmer, humid climates.
3. Focus on **Cool & Dry** regions for pollution control measures, such as stricter emission regulations and increased monitoring during these conditions.
4. Implement **urban greenery initiatives** and promote **clean energy** solutions to mitigate AQI spikes in areas with "Cool & Dry" and "Normal" conditions.
5. Investigate local sources of pollution in these weather conditions to design targeted mitigation strategies.
''')

st.markdown('

Relationship Between Wind Speed and PM2.5 Levels

', unsafe_allow_html=True)

# Create scatter plot using Plotly
fig_wind_pm25 = px.scatter(
    data,
    x='wind_speed',
    y='pm2.5',
    color='Air quality',
    size='pm2.5',
    title='Relationship Between Wind Speed and PM2.5 Levels',
    labels={'wind_speed': 'Wind Speed (m/s)', 'pm2.5': 'PM2.5 Levels'},
    hover_name='areas'
)
fig_wind_pm25.update_layout(
    xaxis_title='Wind Speed (m/s)',
    yaxis_title='PM2.5 Levels',
    legend_title='Air Quality'
)

# Display the scatter plot in Streamlit
st.plotly_chart(fig_wind_pm25, use_container_width=True)

st.write('''### **Insights**:
🌟 Areas with **low wind speed (<10 m/s)** tend to have **higher PM2.5 levels**, indicating insufficient dispersion of particulate matter.

🌟 **Moderately polluted and very poor air quality categories** dominate in regions with low wind speed, emphasizing the impact of stagnant air conditions.

### **Action-Oriented Insights**:
✅ Promote **urban greening and wind corridors** in high-density areas to naturally improve air circulation.

✅ **Install pollution control technologies** near industrial and urban hotspots to mitigate the accumulation of PM2.5.
''')

st.markdown('

Relationship Between Humidity and AQI

', unsafe_allow_html=True)

# Create the plot
fig, ax = plt.subplots(figsize=(10, 6))
sns.regplot(
    x=data['humdity'],
    y=data['AQI'],
    scatter_kws={'alpha': 0.6},
    line_kws={'color': 'red'},
    ax=ax
)
ax.set_title('Relationship Between Humidity and AQI')
ax.set_xlabel('Humidity (%)')
ax.set_ylabel('Air Quality Index (AQI)')

# Display the plot in Streamlit
st.pyplot(fig)

st.write('''### Insights:
1. The scatterplot shows a **weak positive correlation** between humidity (%) and AQI, with higher humidity levels slightly associated with increased AQI values.
2. The **wide spread** of AQI across all humidity levels suggests that humidity alone is not a strong determinant of air quality.

### Action-Oriented Insights:
1. **Focus on humid regions**: Implement targeted air quality monitoring and mitigation strategies in areas with high humidity and poor air quality.
2. **Address other factors**: Investigate additional variables like wind speed, industrial emissions, and temperature to better understand AQI variability.
''')

st.markdown('

Top 10 Most Polluted Cities

', unsafe_allow_html=True)

# Get the top 10 most polluted cities
top_10_most_polluted = data.nlargest(10, 'AQI')

# Create bar chart using Plotly
fig6 = px.bar(
    top_10_most_polluted.sort_values('AQI', ascending=True),
    y='areas',
    x='AQI',
    title='Top 10 Most Polluted Cities',
    color='AQI',
    hover_data=['pm2.5', 'pm10', 'co', 'so2', 'no2', 'o3', 'Total_Pollution'],
    color_continuous_scale=px.colors.sequential.Oryel
)

# Display the chart in Streamlit
st.plotly_chart(fig6, use_container_width=True)

st.write('''### Insights from the Visualization:
**Most Polluted Cities**:
- Cities like **Serampore**, **Chinsurah**, and **Kalyani** rank highest in AQI, indicating critically poor air quality.
- These cities may require immediate interventions due to the significant health risks posed by air pollution.

---

### Recommendations for Action:
**For Most Polluted Cities**:
- Implement stricter industrial emission controls and traffic regulations.
- Promote renewable energy usage and reduce dependency on fossil fuels.
''')

st.markdown('

Top 10 Least Polluted Cities

', unsafe_allow_html=True)

# Get the top 10 least polluted cities
top_10_least_polluted = data.nsmallest(10, 'AQI')

# Create bar chart using Plotly
fig7 = px.bar(
    top_10_least_polluted.sort_values('AQI', ascending=True),
    y='areas',
    x='AQI',
    title='Top 10 Least Polluted Cities',
    color='AQI',
    hover_data=['pm2.5', 'co', 'pm10', 'Total_Pollution'],
    color_continuous_scale=px.colors.sequential.Blugrn
)

# Display the chart in Streamlit
st.plotly_chart(fig7, use_container_width=True)

st.write('''### Insights from the Visualization:
**Least Polluted Cities**:
- Cities such as **Pathanamthitta**, **Adoor**, and **Chengannur** showcase very low AQI values, indicating excellent air quality.
- These cities could serve as benchmarks for pollution control measures.

---

### Recommendations for Action:
**For Least Polluted Cities**:
- Ensure continuous monitoring to maintain the air quality.
- Encourage green practices such as afforestation and waste management.''')

st.markdown('

Correlation Heatmap of AQI and Pollutants

', unsafe_allow_html=True)

# Select relevant data
correlation_data = data[['pm2.5', 'pm10', 'co', 'so2', 'no2', 'o3', 'AQI']]

# Compute correlation matrix
corr_matrix = correlation_data.corr()

# Create heatmap using Plotly
fig = go.Figure(
    data=go.Heatmap(
        z=corr_matrix.values,  # Correlation values
        x=corr_matrix.columns,  # Column names
        y=corr_matrix.columns,  # Row names
        colorscale='Viridis',  # Color scheme
        zmin=-1,  # Minimum correlation value
        zmax=1,  # Maximum correlation value
        texttemplate="%{z:.2f}",  # Format correlation values
        textfont={"size": 12}
    )
)

# Add title and layout settings
fig.update_layout(
    title="Correlation Heatmap of AQI and Pollutants",
    xaxis=dict(title="Variables", tickangle=45),
    yaxis=dict(title="Variables"),
    autosize=True,
    template="plotly_white"
)

# Display the heatmap in Streamlit
st.plotly_chart(fig, use_container_width=True)

st.write('''### Insights and Recommendations:
1. **PM2.5 (0.92)** and **PM10 (0.88)** have the strongest correlation with AQI, making them the primary contributors to air quality degradation.
2. **CO (0.78)** also plays a significant role in worsening AQI, often coexisting with particulate matter from combustion sources.
3. **O₃ (-0.11)** shows a weak negative correlation with AQI, suggesting minimal or inverse influence under certain conditions.
4. Focus on reducing **PM2.5**, **PM10**, and **CO** through stricter emission controls and cleaner fuel adoption.
5. Mitigate combined emissions of particulate matter and CO by targeting industrial and vehicular pollution sources.''')

st.markdown('

Pollutant Levels Across Pollution Categories

', unsafe_allow_html=True)

# Create the bar chart using Plotly
fig11 = px.bar(
    data,
    x='Pollution_Category',
    y=['pm2.5', 'pm10', 'so2', 'no2', 'o3', 'co'],
    title='Pollutant Levels Across Pollution Categories',
    labels={'value': 'Pollutant Levels (µg/m³)', 'variable': 'Pollutants'},
    barmode='group',
    hover_name='state',
    color_discrete_sequence=px.colors.qualitative.Vivid_r  # Vibrant color palette
)

# Enhance hover details (<br> separates the lines of the hover label)
fig11.update_traces(hovertemplate='%{x}<br>%{y:.2f} µg/m³<br>Pollutant: %{legendgroup}')

# Customize layout
fig11.update_layout(
    title_font=dict(size=20, color='Grey', family='Arial Black'),
    xaxis_title_font=dict(size=16, color='black'),
    yaxis_title_font=dict(size=16, color='black'),
    legend_title=dict(font=dict(size=14)),
    plot_bgcolor='rgba(240, 240, 240, 0.9)'  # Light background
)

# Display the bar chart in Streamlit
st.plotly_chart(fig11, use_container_width=True)

st.write('''### **Insights**
1. **Carbon Monoxide Dominance**:
   - CO is the most significant pollutant across all pollution categories, with exceptionally high levels in the "Extremely Hazardous" and "Hazardous" categories, indicating the substantial contribution of vehicle emissions and industrial processes.
2. **PM2.5 and PM10 Contribution**:
   - Particulate matter (PM2.5 and PM10) shows a strong presence in "Hazardous" and "Unsafe" categories, reflecting pollution sources like construction, road dust, and combustion activities.
3. **Minor Contribution from Ozone (O₃) and Sulfur Dioxide (SO₂)**:
   - O₃ and SO₂ have relatively lower levels across all categories, suggesting lesser emissions from natural sources or effective control measures for these pollutants in some regions.
4. **Disproportionate Distribution**:
   - The pollutant levels significantly vary across pollution categories, with "Extremely Hazardous" and "Hazardous" contributing disproportionately higher levels compared to other categories like "Unsafe" or "Considerable."

---

### **Action-Oriented Recommendations**
1. **Target CO Emissions**:
   - Implement stricter vehicle emission standards and promote the transition to electric vehicles to reduce CO levels in the most polluted regions.
2. **Control PM2.5 and PM10 Sources**:
   - Enforce regulations to curb construction dust, road dust, and combustion activities, especially in "Hazardous" and "Unsafe" areas.
3. **Monitor and Mitigate Pollutants**:
   - Deploy real-time monitoring systems in "Extremely Hazardous" zones to identify and address localized sources of pollution effectively.
4. **Urban Greening Initiatives**:
   - Increase vegetation and green buffers in urban areas to naturally absorb pollutants like PM2.5 and PM10 and improve air quality.
5. **Public Awareness Campaigns**:
   - Educate communities on reducing activities that contribute to pollutants, such as waste burning, and encourage the adoption of public transport.
''')

st.markdown('

Interactive Map of AQI

', unsafe_allow_html=True)

# Create the interactive scatter geo map
fig9 = px.scatter_geo(
    data,
    lat='lat',
    lon='long',
    color='AQI',
    size='AQI',
    title='Interactive Map of AQI',
    hover_name='areas',
    hover_data=['wind_speed'],
    color_continuous_scale='RdYlGn_r'  # Red-Yellow-Green scale (reversed)
)

# Update layout for better visualization
fig9.update_layout(
    geo=dict(
        showland=True,
        landcolor="rgb(217, 217, 217)",
        showcoastlines=True,
        coastlinecolor="rgb(0, 0, 0)",
    ),
    margin=dict(l=10, r=10, t=50, b=10)
)

# Display the map in Streamlit
st.plotly_chart(fig9, use_container_width=True)
st.write(':red[This is an interactive map. Zoom in and hover over a point to see more details.]')

st.write('''### Insights and Action-Oriented Insights:
1. **Northern and eastern regions** exhibit the highest AQI levels, with hotspots visible, likely due to industrial activities and high population density.
2. **Central and southern regions** show relatively moderate AQI levels, indicating better air quality and manageable pollution levels.
3. Urban clusters with severe AQI levels require **localized pollution control measures**, including traffic management and stricter industrial regulations.
4. Deploy additional **air quality monitoring stations** in less covered areas to ensure comprehensive data collection for proactive interventions.
5. Focus on **reducing vehicular emissions** and promoting **green energy solutions** in high-AQI regions to improve air quality.
''')

st.markdown('

Conclusion

', unsafe_allow_html=True)

st.markdown('''
The AQI EDA project has provided a **comprehensive analysis of air quality** across different regions. Key insights include:

1. **Air Quality Trends**:
   - The project identified areas with critical pollution levels and categorized them into distinct AQI categories (*Good, Satisfactory, Moderate, Poor, Very Poor, Severe*).
   - This helps pinpoint regions requiring urgent intervention.
2. **Geographic Visualization**:
   - Interactive visualizations, such as scatter plots on a geographic map, offered an intuitive way to understand the **spatial distribution of pollution levels**, highlighting hotspots and relatively cleaner areas.
3. **Correlations and Patterns**:
   - Analysis of parameters like **PM2.5, PM10, CO, SO₂, NO₂, and O₃** provided valuable insights into their contributions to overall pollution.
   - Seasonal variations and weather conditions like humidity and wind speed were found to influence AQI.
4. **Health Impacts**:
   - The classification of AQI into categories (*e.g., Hazardous, Unsafe*) serves as a tool for raising **public awareness** about the health risks associated with air pollution.
''')
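
# ---------------------------------------------------------------------------
# Optional helper (illustrative sketch, not part of the original dashboard).
# The outlier section above shows outliers only visually, through box plots.
# A minimal sketch, assuming the common 1.5 x IQR rule, of how the number of
# outliers per column could be counted. The name `iqr_outlier_counts` and the
# parameter `k` are hypothetical; nothing in the dashboard calls this function.
# ---------------------------------------------------------------------------
import pandas as pd


def iqr_outlier_counts(df, columns, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each given column."""
    counts = {}
    for col in columns:
        values = df[col].dropna()
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        # A value is flagged when it lies strictly outside the fences
        counts[col] = int(((values < lower) | (values > upper)).sum())
    return pd.Series(counts, name="outlier_count")


# Possible usage, assuming the column names used elsewhere in this script:
# st.write(iqr_outlier_counts(data, ['pm2.5', 'pm10', 'co', 'so2', 'no2', 'o3']))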