Update README.md
README.md
CHANGED
|
@@ -14,85 +14,91 @@ license: unknown
|
|
| 14 |
---
|
| 15 |
|
| 16 |
🌦️ Saigon Temperature Forecasting Application
|
| 17 |
-
|
| 18 |
|
| 19 |
-
|
| 20 |
|
| 21 |
-
|
| 22 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 23 |
|
| 24 |
📑 Project Overview & Methodology
|
| 25 |
-
|
| 26 |
|
| 27 |
-
|
| 28 |
|
| 29 |
-
|
| 30 |
|
| 31 |
-
|
| 32 |
|
| 33 |
-
|
| 34 |
|
| 35 |
-
🌦️ Live 5-Day Forecast
|
| 36 |
-
|
| 37 |
|
| 38 |
-
|
| 39 |
|
| 40 |
-
Forecast Insights (Why?):
|
| 41 |
|
| 42 |
-
Feature Inspector:
|
| 43 |
|
| 44 |
-
|
| 45 |
|
| 46 |
-
Training Set Overview:
|
| 47 |
|
| 48 |
-
Historical Context:
|
| 49 |
|
| 50 |
-
Forecast
|
| 51 |
|
| 52 |
-
📊 Model Performance & Diagnostics
|
| 53 |
-
Performance Degradation:
|
| 54 |
|
| 55 |
-
Interactive Slider:
|
| 56 |
|
| 57 |
-
Champion Model Diagnostics:
|
| 58 |
|
| 59 |
-
⏱️ Hourly Prediction
|
| 60 |
-
|
| 61 |
|
| 62 |
-
|
| 63 |
|
| 64 |
-
|
| 65 |
|
| 66 |
-
|
| 67 |
|
| 68 |
-
Model Reliability:
|
| 69 |
|
| 70 |
-
🛠️
|
| 71 |
Frontend: Streamlit, Plotly
|
| 72 |
|
| 73 |
-
|
|
|
|
|
|
|
| 74 |
|
| 75 |
-
|
| 76 |
|
| 77 |
-
|
| 78 |
|
| 79 |
-
📂
|
| 80 |
/
|
| 81 |
-
├── app.py #
|
| 82 |
-
├── requirements.txt #
|
| 83 |
-
├── README.md #
|
| 84 |
│
|
| 85 |
├── data/
|
| 86 |
-
│ ├── final_dataset_tree.csv #
|
| 87 |
-
│ ├── final_hourly_feature_dataset.csv #
|
| 88 |
-
│ ├── final_5_day_results_df.csv #
|
| 89 |
-
│ ├── hourly_120h_evaluation_results.csv #
|
| 90 |
-
│ └── results_df_all_tuned.csv #
|
| 91 |
│
|
| 92 |
├── models/
|
| 93 |
-
│ ├── champion_stacking_day1.pkl # ... (5
|
| 94 |
-
│ └── lgbm_model_target_temp_next_1h.pkl # ... (24
|
| 95 |
│
|
| 96 |
└── src/
|
| 97 |
-
├── benchmark_utils.py #
|
| 98 |
-
└── diagnostic_plots.py #
|
|
|
|
| 14 |
---
|
| 15 |
|
| 16 |
🌦️ Saigon Temperature Forecasting Application
|
| 17 |
+
<p align="center"> <a href="https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME" target="_blank"> <img src="https://img.shields.io/badge/Hugging%20Face-Spaces-yellow" alt="Hugging Face Spaces"> </a> <img src="https://img.shields.io/badge/Streamlit-App-orange" alt="Streamlit"> <img src="https://img.shields.io/badge/Python-3.10+-blue.svg" alt="Python"> <img src="https://img.shields.io/badge/Models-Stacking%20%26%20LGBM-purple" alt="Models"> </p>
|
| 18 |
|
| 19 |
+
An interactive web application, built with Streamlit, that forecasts temperature in Ho Chi Minh City (Saigon). It provides 5-day (daily) forecasts from a Champion Stacking model and 24-hour (hourly) forecasts from 24 specialized LightGBM models.
|
| 20 |
|
| 21 |
+
➡️ View the Live Application Here (Remember to replace this with your actual Hugging Face Space URL)
|
| 22 |
+
|
| 23 |
+
(Suggestion: Replace this line with a URL to a real screenshot of your app)
|
| 24 |
+
|
| 25 |
+
✨ Key Features
|
| 26 |
+
The application is organized into four main tabs for a comprehensive user experience:
|
| 27 |
|
| 28 |
📑 Project Overview & Methodology
|
| 29 |
+
Project Overview: Explains the project's goal and the 10-year data source (Visual Crossing).
|
| 30 |
|
| 31 |
+
"Two-Stream" Strategy: Details the methodology for using two different model types:
|
| 32 |
|
| 33 |
+
1. Stacking Model (Daily): For the 5-day forecast, combining the strengths of multiple models.
|
| 34 |
|
| 35 |
+
2. Direct Model (Hourly): 24 specialized LightGBM models for the 24-hour forecast, covering T+1h through T+24h.
|
| 36 |
|
| 37 |
+
Model Leaderboard: Displays the top 10 models from our experiments, justifying the "Champion" model selection.
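The "direct" approach in the hourly stream can be illustrated by how its training targets are built: each horizon gets its own target column, and a separate model is fitted per column. The sketch below is an assumption about the data preparation, not the project's actual code; the function name `make_direct_targets` and the sample temperatures are hypothetical.

```python
from typing import Dict, List, Tuple


def make_direct_targets(
    temps: List[float], horizons: range
) -> Dict[int, List[Tuple[int, float]]]:
    """For each horizon h, pair row index t with the temperature observed at t + h."""
    targets: Dict[int, List[Tuple[int, float]]] = {}
    for h in horizons:
        # The last h rows have no observation h steps ahead, so they are dropped.
        targets[h] = [(t, temps[t + h]) for t in range(len(temps) - h)]
    return targets


# Hypothetical hourly temperatures in °C.
temps = [27.0, 27.5, 28.1, 29.0, 30.2, 29.6]
targets = make_direct_targets(temps, range(1, 4))

# One model would then be trained per horizon, e.g. on targets[1], targets[2], targets[3].
print(targets[3])
```

Because each horizon has its own model, an error at T+1h is not fed forward into the T+2h prediction, at the cost of training and storing one model per horizon.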
|
| 38 |
|
| 39 |
+
🌦️ Live 5-Day Forecast
|
| 40 |
+
Date Selector: Allows users to select any date from the test set.
|
| 41 |
|
| 42 |
+
5-Day Metrics: Displays predicted vs. actual temperatures (if available) for the next five days.
|
| 43 |
|
| 44 |
+
Forecast Insights (Why?): Dynamically generated insights based on input features (e.g., "💡 Insight: Yesterday was very hot (30.5°C). The model is using this strong 'persistence' signal...").
|
| 45 |
|
| 46 |
+
Feature Inspector: A collapsible section to "look under the hood" at the key feature values the model used for its prediction.
|
| 47 |
|
| 48 |
+
Interactive Visualizations:
|
| 49 |
|
| 50 |
+
Training Set Overview: A plot of the entire training dataset with an interactive range slider.
|
| 51 |
|
| 52 |
+
Historical Context: Compares 14 days of actual history against the 5-day forecast.
|
| 53 |
|
| 54 |
+
Smart Forecast Plot: Plots the forecast (red line) against the actuals (blue line) and hides the actuals when they are not yet available (future dates).
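A rule-based insight like the one quoted above could be generated along the following lines. This is only a sketch: the function name `persistence_insight` and the 30.0°C cutoff are assumptions made for illustration, and the app may use different features and thresholds.

```python
from typing import Optional

# Hypothetical cutoff for "very hot"; the app's real threshold is not documented here.
HOT_THRESHOLD_C = 30.0


def persistence_insight(
    temp_yesterday_c: float, threshold_c: float = HOT_THRESHOLD_C
) -> Optional[str]:
    """Return an insight string when yesterday's temperature crosses the cutoff, else None."""
    if temp_yesterday_c >= threshold_c:
        return (
            f"💡 Insight: Yesterday was very hot ({temp_yesterday_c:.1f}°C). "
            "The model is using this strong 'persistence' signal..."
        )
    return None


message = persistence_insight(30.5)
if message is not None:
    print(message)
```

Returning `None` when no rule fires lets the caller decide whether to show a default message or nothing at all.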
|
| 55 |
|
| 56 |
+
📊 Model Performance & Diagnostics
|
| 57 |
+
Performance Degradation: Line charts showing how error (RMSE) and goodness of fit (R²) degrade from Day 1 to Day 5.
|
| 58 |
|
| 59 |
+
Interactive Slider: A slider (1-5) that selects the forecast day and updates the "Forecast vs. Actual" scatter plot to show performance for that day's model.
|
| 60 |
|
| 61 |
+
Champion Model Diagnostics: Residual plots (Residuals vs. Time, Distribution) for checking model stability and bias.
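For reference, the two metrics tracked in this tab can be computed as below. This is a standard-library sketch of the usual definitions, with made-up numbers chosen so the degradation from a nearer to a farther horizon is visible; the app itself most likely relies on Scikit-learn for these values.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: square root of the average squared residual."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r2(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical actuals and predictions in °C for two forecast days.
actual = [28.0, 30.0, 29.0, 31.0]
day1_pred = [28.5, 29.5, 29.5, 30.5]
day5_pred = [29.0, 29.0, 30.0, 30.0]

for label, pred in (("Day 1", day1_pred), ("Day 5", day5_pred)):
    print(label, round(rmse(actual, pred), 3), round(r2(actual, pred), 3))
```

In this toy data the residuals double from Day 1 to Day 5, so RMSE rises while R² falls, which is the pattern the degradation charts are meant to expose.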
|
| 62 |
|
| 63 |
+
⏱️ Hourly Prediction
|
| 64 |
+
Time Selector: Allows users to select a specific Date and Hour to start the 24-hour forecast.
|
| 65 |
|
| 66 |
+
24-Hour Metrics: Displays point forecasts (T+2h, T+3h, T+24h) and aggregate values (Average, Max) against their real-time actuals (if available).
|
| 67 |
|
| 68 |
+
Hourly Historical Context: Plots the past 24 hours of actual data against the next 24 hours of forecasted data.
|
| 69 |
|
| 70 |
+
Hourly Smart Plot: Compares the 24-hour forecast (red) against the 24-hour actuals (blue), hiding actuals if they are not yet available.
|
| 71 |
|
| 72 |
+
Model Reliability: An RMSE line plot showing the model's error degradation from T+1h to T+24h.
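Hiding actuals that are not yet available comes down to aligning observations with the forecast hours and leaving gaps where nothing has been recorded. The sketch below shows one way to do that; `align_actuals` and the sample readings are hypothetical and not taken from `app.py`.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def align_actuals(
    start: datetime, horizon_hours: int, actuals: Dict[datetime, float]
) -> List[Optional[float]]:
    """Return one value per forecast hour (T+1h..T+Nh): the observation, or None if missing."""
    return [actuals.get(start + timedelta(hours=h)) for h in range(1, horizon_hours + 1)]


# Hypothetical forecast start and the two observations recorded so far.
start = datetime(2024, 1, 1, 12)
observed = {
    start + timedelta(hours=1): 31.2,
    start + timedelta(hours=2): 30.8,
}

aligned = align_actuals(start, 4, observed)
print(aligned)
```

A plotting layer can then draw the forecast for every hour and the actuals only where a value is present.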
|
| 73 |
|
| 74 |
+
🛠️ Tech Stack
|
| 75 |
Frontend: Streamlit, Plotly
|
| 76 |
|
| 77 |
+
Data Science: Pandas, NumPy
|
| 78 |
+
|
| 79 |
+
Machine Learning: Scikit-learn (for Stacking), LightGBM
|
| 80 |
|
| 81 |
+
Model Serving: Joblib
|
| 82 |
|
| 83 |
+
Hosting: Hugging Face Spaces
|
| 84 |
|
| 85 |
+
📂 Project Structure
|
| 86 |
/
|
| 87 |
+
├── app.py # Main Streamlit application script
|
| 88 |
+
├── requirements.txt # Required Python packages
|
| 89 |
+
├── README.md # This file
|
| 90 |
│
|
| 91 |
├── data/
|
| 92 |
+
│ ├── final_dataset_tree.csv # Daily features/targets
|
| 93 |
+
│ ├── final_hourly_feature_dataset.csv # Hourly features/targets
|
| 94 |
+
│ ├── final_5_day_results_df.csv # Daily model performance (RMSE/R2)
|
| 95 |
+
│ ├── hourly_120h_evaluation_results.csv # Hourly model performance (RMSE)
|
| 96 |
+
│ └── results_df_all_tuned.csv # Model selection leaderboard
|
| 97 |
│
|
| 98 |
├── models/
|
| 99 |
+
│ ├── champion_stacking_day1.pkl # ... (5 daily models)
|
| 100 |
+
│ └── lgbm_model_target_temp_next_1h.pkl # ... (24 hourly models)
|
| 101 |
│
|
| 102 |
└── src/
|
| 103 |
+
├── benchmark_utils.py # Utility for loading the leaderboard
|
| 104 |
+
└── diagnostic_plots.py # Utility for plotting performance graphs
|