Gumball2k5 committed on
Commit
47cc2b2
·
verified ·
1 Parent(s): 9e2a7e0

Update README.md

Files changed (1)
  1. README.md +43 -81
README.md CHANGED
@@ -13,92 +13,54 @@ short_description: Predict the weather of Saigon
13
  license: unknown
14
  ---
15
 
16
- 🌦️ Saigon Temperature Forecasting Application
17
- <p align="center"> <a href="https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME" target="_blank"> <img src="https://img.shields.io/badge/Hugging%20Face-Spaces-yellow" alt="Hugging Face Spaces"> </a> <img src="https://img.shields.io/badge/Streamlit-App-orange" alt="Streamlit"> <img src="https://img.shields.io/badge/Python-3.10+-blue.svg" alt="Python"> <img src="https://img.shields.io/badge/Models-Stacking%20%26%20LGBM-purple" alt="Models"> </p>
18
 
19
- An interactive web application built with Streamlit to forecast the weather in Ho Chi Minh City (Saigon). This app provides both 5-day (daily) forecasts using a Champion Stacking model and 24-hour (hourly) forecasts using 24 specialized LightGBM models.
20
 
21
- ➡️ View the Live Application Here (Remember to replace this with your actual Hugging Face Space URL)
 
22
 
23
- (Suggestion: Replace this line with a URL to a real screenshot of your app)
24
-
25
- Key Features
26
- The application is organized into four main tabs for a comprehensive user experience:
27
-
28
- 📑 Project Overview & Methodology
29
- Project Overview: Explains the project's goal and the 10-year data source (Visual Crossing).
30
-
31
- "Two-Stream" Strategy: Details the methodology for using two different model types:
32
-
33
- 1. Stacking Model (Daily): For the 5-day forecast, combining the strengths of multiple models.
34
-
35
- 2. Direct Model (Hourly): 24 specialized LGBM models for the 24-hour forecast.
36
-
37
- Model Leaderboard: Displays the top 10 models from our experiments, justifying the "Champion" model selection.
38
-
39
- 🌦️ Live 5-Day Forecast
40
- Date Selector: Allows users to select any date from the test set.
41
-
42
- 5-Day Metrics: Displays predicted vs. actual temperatures (if available) for the next five days.
43
-
44
- Forecast Insights (Why?): Dynamically generated insights based on input features (e.g., "💡 Insight: Yesterday was very hot (30.5°C). The model is using this strong 'persistence' signal...").
45
-
46
- Feature Inspector: A collapsible section to "look under the hood" at the key feature values the model used for its prediction.
47
-
48
- Interactive Visualizations:
49
-
50
- Training Set Overview: A plot of the entire training dataset with an interactive range slider.
51
-
52
- Historical Context: Compares 14 days of actual history against the 5-day forecast.
53
-
54
- Smart Forecast Plot: Intelligently plots the forecast (red line) against the actuals (blue line), hiding the actuals if they are not yet available (for future dates).
55
-
56
- 📊 Model Performance & Diagnostics
57
- Performance Degradation: Line charts showing how model error (RMSE) and accuracy (R²) degrade from Day 1 to Day 5.
58
-
59
- Interactive Slider: A powerful slider (1-5) that dynamically updates the "Forecast vs. Actual" scatter plot to inspect performance for that specific day's model.
60
-
61
- Champion Model Diagnostics: Deep-dive residual plots (Residuals vs. Time, Distribution) to prove model stability and lack of bias.
62
-
63
- ⏱️ Hourly Prediction
64
- Time Selector: Allows users to select a specific Date and Hour to start the 24-hour forecast.
65
-
66
- 24-Hour Metrics: Displays point forecasts (T+2h, T+3h, T+24h) and aggregate values (Average, Max) against their real-time actuals (if available).
67
-
68
- Hourly Historical Context: Plots the past 24 hours of actual data against the next 24 hours of forecasted data.
69
-
70
- Hourly Smart Plot: Compares the 24-hour forecast (red) against the 24-hour actuals (blue), hiding actuals if they are not yet available.
71
-
72
- Model Reliability: An RMSE line plot showing the model's error degradation from T+1h to T+24h.
73
-
74
- 🛠️ Tech Stack
75
- Frontend: Streamlit, Plotly
76
 
77
- Data Science: Pandas, NumPy
 
 
78
 
79
- Machine Learning: Scikit-learn (for Stacking), LightGBM
80
 
81
- Model Serving: Joblib
82
 
83
- Hosting: Hugging Face Spaces
 
 
84
 
85
- 📂 Project Structure
86
- /
87
- ├── app.py # Main Streamlit application script
88
- ├── requirements.txt # Required Python packages
89
- ├── README.md # This file
90
-
91
- ├── data/
92
- │ ├── final_dataset_tree.csv # Daily features/targets
93
- │ ├── final_hourly_feature_dataset.csv # Hourly features/targets
94
- │ ├── final_5_day_results_df.csv # Daily model performance (RMSE/R2)
95
- │ ├── hourly_120h_evaluation_results.csv # Hourly model performance (RMSE)
96
- │ └── results_df_all_tuned.csv # Model selection leaderboard
97
-
98
- ├── models/
99
- │ ├── champion_stacking_day1.pkl # ... (5 daily models)
100
- │ └── lgbm_model_target_temp_next_1h.pkl # ... (24 hourly models)
101
-
102
- └── src/
103
- ├── benchmark_utils.py # Utility for loading the leaderboard
104
- └── diagnostic_plots.py # Utility for plotting performance graphs
 
13
  license: unknown
14
  ---
15
 
16
+ # 🌦️ Saigon Temperature Forecasting Application
17
+ **An Interactive Dual-Model Forecasting Web App**
18
 
19
+ An interactive web application built with Streamlit to forecast the weather in Ho Chi Minh City (Saigon). This app provides both **5-day (daily) forecasts** using a Champion Stacking model and **24-hour (hourly) forecasts** using 24 specialized LightGBM models.
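Using one specialized model per forecast horizon is often called a "direct" multi-step strategy: each model is trained on its own target, shifted by a fixed number of steps. The sketch below shows only how such per-horizon targets could be built from a temperature series. It is an illustration under assumptions, not the project's actual pipeline; the function name `make_direct_targets` and the sample values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def make_direct_targets(
    temps: Sequence[float], horizons: Sequence[int]
) -> Dict[int, List[Tuple[int, float]]]:
    """For each horizon h, pair every time index t with the value observed at t + h.

    Hypothetical helper: a model dedicated to horizon h would be trained on
    the features available at time t and the target temps[t + h].
    """
    targets: Dict[int, List[Tuple[int, float]]] = {}
    for h in horizons:
        if h <= 0:
            raise ValueError("horizons must be positive integers")
        # The last h observations have no future value to use as a target.
        targets[h] = [(t, temps[t + h]) for t in range(len(temps) - h)]
    return targets


# Made-up hourly temperatures in degrees Celsius, for illustration only.
hourly = [27.0, 27.5, 28.1, 29.0, 30.2, 31.0]
targets = make_direct_targets(hourly, horizons=[1, 2, 3])
```

With 24 horizons, this construction yields 24 separate training sets, one for each hourly model.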
20
 
21
+ ![Screenshot](https://i.imgur.com/your-screenshot-url.png)
22
+ *(Suggestion: Replace this URL with a screenshot of your application)*
23
 
24
+ ---
25
+ ## 📋 Table of Contents
26
+ - [Project Goal](#-project-goal)
27
+ - [Features](#-features)
28
+ - [Tech Stack](#-tech-stack)
29
+ - [Project Structure](#-project-structure)
30
 
31
+ ---
32
+ ## 🎯 Project Goal
33
+ The primary objective of this project is to build an end-to-end machine learning application that forecasts the temperature in Ho Chi Minh City. The project covers the complete ML lifecycle, from data collection and feature engineering to model training, evaluation, and deployment as an interactive web app with two distinct forecasting systems (daily and hourly).
34
 
35
+ ---
36
+ ## ✨ Features
37
+ - **Dual Forecasting Modes**: Provides both 5-day (Daily) and 24-hour (Hourly) predictions.
38
+ - **Interactive Visualizations**:
39
+   - **Historical Context**: Displays actual temperatures for the 14 days (Daily) or 24 hours (Hourly) leading up to the forecast.
40
+   - **Smart Forecast vs. Actual**: Compares the forecast against real temperatures, automatically hiding "Actual" data if it's not yet available (for future dates).
41
+   - **Training Set Overview**: A full plot of the training data with an interactive range slider and fixed Y-axis for easy exploration.
42
+ - **Model Explainability (XAI)**:
43
+   - **Forecast Insights (Why?)**: Dynamically generated insights based on key input features (e.g., "Yesterday was very hot...").
44
+   - **Feature Inspector**: An expandable section detailing the exact feature values the model used for its prediction (e.g., `temp_lag_1`, `humidity`, `day_of_year`).
45
+ - **Performance Dashboard**:
46
+   - **Performance Degradation**: Visualizes how model error (RMSE) and accuracy (R²) degrade over the 5-day or 24-hour horizon.
47
+   - **Interactive Scatter Plot**: A slider allows users to dynamically inspect the "Forecast vs. Actual" performance for any specific horizon (Day 1-5 or Hour 1-24).
48
+   - **Model Diagnostics**: An expandable "Deep Dive" section with residual plots to assess model stability and check for bias.
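The performance degradation view depends on computing RMSE and R² separately for each forecast horizon. The following is a minimal sketch of that calculation in plain Python. The function names and the sample numbers are hypothetical, and the app itself may compute these metrics differently (for example, with Scikit-learn).

```python
import math
from typing import Dict, Sequence, Tuple


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r2(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are identical")
    return 1 - ss_res / ss_tot


def degradation_table(
    results: Dict[int, Tuple[Sequence[float], Sequence[float]]]
) -> Dict[int, Tuple[float, float]]:
    """Map each horizon to (RMSE, R²), given (actual, predicted) pairs per horizon."""
    return {h: (rmse(a, p), r2(a, p)) for h, (a, p) in sorted(results.items())}


# Made-up values for two horizons, for illustration only.
sample = {
    1: ([28.0, 29.0, 30.0], [28.1, 29.2, 29.9]),
    2: ([28.0, 29.0, 30.0], [28.6, 28.4, 30.8]),
}
table = degradation_table(sample)
```

Plotting the first element of each tuple against the horizon gives an error curve, and the second gives the matching R² curve.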
49
 
50
+ ---
51
+ ## 🛠️ Tech Stack
52
+ - **Backend & Modeling**:
53
+   - **Python**: Core programming language.
54
+   - **Pandas**: Data manipulation and analysis.
55
+   - **Scikit-learn**: For Stacking Regressor and model evaluation.
56
+   - **LightGBM**: The gradient boosting model used for hourly forecasting.
57
+   - **Joblib**: For model persistence (loading `.pkl` files).
58
+ - **Frontend & Visualization**:
59
+   - **Streamlit**: For building the interactive web application.
60
+   - **Plotly**: For creating interactive charts and visualizations.
61
 
62
+ ---
63
+ ## 📂 Project Structure
64
+ |-- data/
+ |   |-- final_dataset_tree.csv # Daily features/targets
+ |   |-- final_hourly_feature_dataset.csv # Hourly features/targets
+ |   |-- final_5_day_results_df.csv # Daily model performance (RMSE/R2)
+ |   |-- hourly_120h_evaluation_results.csv # Hourly model performance (RMSE)
+ |   |-- results_df_all_tuned.csv # Model selection leaderboard
+ |-- models/
+ |   |-- champion_stacking_day1.pkl # ... (5 daily models)
+ |   |-- lgbm_model_target_temp_next_1h.pkl # ... (24 hourly models)
+ |-- src/
+ |   |-- benchmark_utils.py # Utility for loading the leaderboard
+ |   |-- diagnostic_plots.py # Utility for plotting performance graphs
+ |-- app.py # The main Streamlit application script
+ |-- requirements.txt # A list of all required Python packages
+ |-- README.md # This file
65
 
66
+ ---