Spaces: XPMaster/clustering_ed

XPMaster committed on Aug 23, 2023

Commit 4795f03 (1 parent: 5d5343f)

Update app.py

Files changed (1)

app.py +31 -9

@@ -202,12 +202,12 @@ with tab2:
     ### Principal Component Analysis (PCA)
-    PCA is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional form, while retaining as much of the original variance as possible. It achieves this by identifying the 'directions' (or principal components) that maximize variance.
-    Mathematically, PCA seeks to find the eigenvectors and eigenvalues of the data's covariance matrix. These eigenvectors, ordered by their corresponding eigenvalues, form the new 'axes' of the reduced space.
-    Using PCA for visualization helps in projecting the data onto the first two principal components, making it easier to spot patterns and clusters.
     ### Let's Visualize!
     """)
@@ -224,15 +224,36 @@ with tab2:
         X_transformed = X[:, :2]  # Just use the first two features for visualization
         user_features_transformed = user_features[:2]
-    # Create a DataFrame for easier plotting with plotly
-    df_transformed = pd.DataFrame(X_transformed, columns=['Feature1', 'Feature2'])
     # K-Means Algorithm for Advanced Tab
     kmeans_advanced = KMeans(n_clusters=n_clusters_advanced)
     y_kmeans_advanced = kmeans_advanced.fit_predict(X_transformed)
     df_transformed['cluster'] = y_kmeans_advanced
-    # ... [rest of the visualization code]
     st.plotly_chart(fig_advanced)
@@ -240,7 +261,6 @@ with tab2:
     if st.button('Toggle PCA for Visualization'):
         st.session_state.use_pca = not st.session_state.use_pca
     st.write("""
     ### Interpretation
@@ -250,6 +270,8 @@ with tab2:
     **Feel free to adjust the number of clusters to see how data points get re-grouped!**
     """)
 with about:
     st.title("About")
     st.markdown("""
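
The PCA description in the first hunk above (eigenvectors of the covariance matrix, ordered by eigenvalue) can be illustrated with a short NumPy sketch. This is not code from app.py: the helper name `pca_project` and the synthetic `X_demo` array are assumptions made for illustration only, and a real app would more likely call a library implementation.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project X (n_samples, n_features) onto its top principal components.

    Hypothetical helper for illustration; not part of app.py.
    """
    # Center each feature so the covariance matrix describes spread around the mean
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (columns are variables)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so the direction with the largest variance comes first
    order = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, order]
    # Fraction of total variance captured by each kept component
    explained_ratio = eigenvalues[order] / eigenvalues.sum()
    return X_centered @ components, explained_ratio


# Synthetic data (an assumption): 200 points, 4 features with different spreads
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
X_2d, ratio = pca_project(X_demo)
print(X_2d.shape, ratio)
```

The two returned columns are uncorrelated, and the first carries at least as much variance as the second, which is why a 2D scatter of them tends to show cluster structure more clearly than two arbitrary raw features.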

     ### Principal Component Analysis (PCA)
+    PCA is a dimensionality reduction technique that identifies the axes (principal components) in the dataset that maximize variance. It's like finding the best angle to view data so that differences between data points are most apparent. Mathematically, PCA aims to find orthogonal vectors in the original feature space that capture the most variance in the data.
+    The first principal component captures the most variance, the second principal component (which is orthogonal to the first) captures the second most, and so on.
+    Using PCA can help in visualizing high-dimensional data in a 2D or 3D space, making patterns more discernible.
     ### Let's Visualize!
     """)
         X_transformed = X[:, :2]  # Just use the first two features for visualization
         user_features_transformed = user_features[:2]
     # K-Means Algorithm for Advanced Tab
     kmeans_advanced = KMeans(n_clusters=n_clusters_advanced)
     y_kmeans_advanced = kmeans_advanced.fit_predict(X_transformed)
+    # Create a DataFrame for easier plotting with plotly
+    df_transformed = pd.DataFrame(X_transformed, columns=['Feature1', 'Feature2'])
     df_transformed['cluster'] = y_kmeans_advanced
+    fig_advanced = px.scatter(df_transformed, x='Feature1', y='Feature2', color='cluster',
+                              title='K-Means Clustering for Advanced',
+                              color_continuous_scale=px.colors.qualitative.Set1)
+    # Remove the legend
+    fig_advanced.update_layout(showlegend=False)
+    # Increase the size of the plot
+    fig_advanced.update_layout(width=1200, height=500)
+    fig_advanced.update_coloraxes(showscale=False)
+    # Add user input as a star marker
+    fig_advanced.add_scatter(x=[user_features_transformed[0]], y=[user_features_transformed[1]], mode='markers', marker=dict(symbol='star', size=30, color='white'))
+    # Add centroids with group numbers
+    for i, coord in enumerate(kmeans_advanced.cluster_centers_):
+        fig_advanced.add_annotation(
+            x=coord[0],
+            y=coord[1],
+            text="Group "+str(i+1),
+            showarrow=True,
+            font=dict(color='white', size=25)
+        )
     st.plotly_chart(fig_advanced)
     if st.button('Toggle PCA for Visualization'):
         st.session_state.use_pca = not st.session_state.use_pca
     st.write("""
     ### Interpretation
     **Feel free to adjust the number of clusters to see how data points get re-grouped!**
     """)
 with about:
     st.title("About")
     st.markdown("""
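
To make the `fit_predict` call and the `cluster_centers_` annotations in the diff above more concrete, here is a minimal sketch of the basic K-Means loop (Lloyd's algorithm) in plain NumPy. It is an illustration under stated assumptions, not scikit-learn's implementation, which by default uses k-means++ initialization and further optimizations. The function name `lloyd_kmeans` and the two-blob data are hypothetical.

```python
import numpy as np


def assign_labels(X, centers):
    """Return the index of the nearest center for every row of X."""
    # Pairwise distances, shape (n_samples, n_clusters)
    distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)


def lloyd_kmeans(X, n_clusters, n_iter=100, seed=0):
    """Basic K-Means sketch: alternate assignment and centroid update.

    Hypothetical helper for illustration; not part of app.py.
    """
    rng = np.random.default_rng(seed)
    # Start from randomly chosen data points (plain random init, not k-means++)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign_labels(X, centers)
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centroids stopped moving
        centers = new_centers
    # Final assignment so labels always match the returned centers
    return assign_labels(X, centers), centers


# Two well-separated synthetic blobs (an assumption for the demo)
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=(0, 0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(10, 10), scale=0.5, size=(50, 2))
X_demo = np.vstack([blob_a, blob_b])
labels, centers = lloyd_kmeans(X_demo, n_clusters=2)
print(centers)
```

The `centers` array plays the role of `kmeans_advanced.cluster_centers_` in the diff: one row per cluster, which is what the "Group N" annotations are placed on. Cluster numbering depends on initialization, so group labels can change between runs unless the seed is fixed.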