---
title: PCA Variance Puzzle Explorer
emoji: 🧩
colorFrom: blue
colorTo: yellow
sdk: streamlit
pinned: false
app_file: app.py
---

# PCA Variance Puzzle Explorer

An interactive educational dashboard designed to help students and researchers build an intuitive understanding of **Principal Component Analysis (PCA)** and the mechanics of dimensionality reduction.

This application serves as a gamified, hands-on exercise following a data science lecture on business analysis and footwear design optimization.

## 🌟 Concept & Learning Objectives

Instead of only looking at abstract mathematical formulas, users try to find the "First Principal Component (PC1)" by manually rotating a candidate axis line over a scatter plot of a correlated footwear dataset (foot length vs. foot width).

Through this hands-on exercise, users can grasp the core concepts of PCA:

- **Maximizing Variance:** Discovering that the angle capturing the widest spread of data yields the highest "Explained Variance Ratio".
- **Dimensionality Reduction:** Compressing the two measurements into a single 1D principal score while minimizing information loss.
- **Reconstruction Error:** Seeing how projecting data points onto the axis relates to minimizing residual errors: the variance the axis does not capture is the error left after projection.

## 🚀 How to Play / Use

1. **Adjust the Angle:** Use the slider in the sidebar to rotate the principal axis line (from -90.0 to 90.0 degrees).
2. **Maximize the Score:** Watch the "Explained Variance Ratio (%)" update in real time. Try to find the angle that captures the highest percentage of the variance.
3. **Visualize Projection:** Toggle the "Show projected data (after compression)" checkbox to see how the 2D data points collapse onto the 1D line as compressed data.
4. **Reveal the Answer:** Click the "Show Answer" button to compare your empirical guess with the exact PCA angle and maximum variance ratio computed mathematically.

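
The score that the slider drives can be sketched in a few lines. The following is a minimal illustration, not the code in `app.py`: the data points are made-up values, the function names are hypothetical, and it assumes the data is only centered (not standardized) before projection. It uses the standard library only, so it runs without the app's dependencies, and relies on the closed-form angle of the first principal axis for 2D data.

```python
import math

# Made-up (foot length, foot width) pairs in cm, for illustration only.
points = [
    (23.1, 9.0), (24.0, 9.4), (25.0, 10.1), (25.5, 9.9),
    (26.0, 10.3), (26.8, 10.8), (27.2, 10.4), (28.0, 11.0),
]


def centered_sums(points):
    """Return centered sums of squares and cross-products (sxx, syy, sxy)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxx, syy, sxy


def explained_variance_ratio(points, angle_deg):
    """Fraction of total variance captured by a line at angle_deg."""
    sxx, syy, sxy = centered_sums(points)
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    # Sum of squared 1D scores after projecting the centered points
    # onto the unit direction (cos(theta), sin(theta)).
    projected = sxx * c * c + 2 * sxy * c * s + syy * s * s
    return projected / (sxx + syy)


def exact_pc1_angle(points):
    """Angle (degrees) of the first principal axis, closed form for 2D data."""
    sxx, syy, sxy = centered_sums(points)
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


best_angle = exact_pc1_angle(points)
for guess in (-45.0, 0.0, 45.0, best_angle):
    ratio = explained_variance_ratio(points, guess)
    print(f"angle {guess:7.2f} deg -> explained variance {ratio:6.1%}")
```

Under these assumptions, one minus the ratio at a given angle is the share of variance lying along the perpendicular direction, which is the reconstruction error left after collapsing the points onto the line. Maximizing the ratio and minimizing that error are therefore the same task.
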
## 🛠️ Repository Structure

To deploy this successfully on Hugging Face Spaces, ensure your repository contains the following files:

- `app.py`: The main Streamlit application script.
- `requirements.txt`: Python dependencies (`streamlit`, `numpy`, `pandas`, `matplotlib`, `scikit-learn`).
- `README.md`: This configuration and documentation file.
- `NotoSansJP-Regular.ttf`: Japanese font file to prevent character rendering warnings on charts.

---

Developed for Data Science Education and Computational Analysis Workshops.