Spaces: XPMaster / clustering_ed

XPMaster committed on Aug 23, 2023

Commit 5d5343f

1 Parent(s): 7888962

Update app.py

Files changed (1)

app.py +32 -46

app.py CHANGED

@@ -196,64 +196,50 @@ with tab2:
     & x \text{ is a data point in cluster } C_i.
     \end{align*}
     ''')
     st.write("""
     The K-Means algorithm tries to find the best centroids such that the \( \mathrm{WCSS} \) is minimized.
-    ### Let's Visualize!
-    Here, we've plotted the iris dataset using the first two features. You can adjust the number of clusters using the sidebar.
     """)
-    # Sidebar for Advanced
-    # st.sidebar.header('K-Means Parameters')
-    # n_clusters_advanced = st.sidebar.slider('Number of Clusters (K)', 1, 10, 3)
     # K-Means Algorithm for Advanced Tab
     kmeans_advanced = KMeans(n_clusters=n_clusters_advanced)
-    y_kmeans_advanced = kmeans_advanced.fit_predict(X)
-    # Create a DataFrame for easier plotting with plotly
-    df_advanced = pd.DataFrame(X, columns=iris.feature_names)
-    df_advanced['cluster'] = y_kmeans_advanced
-    fig_advanced = px.scatter(df_advanced, x=df_advanced.columns[0], y=df_advanced.columns[1], color='cluster',
-                              title='K-Means Clustering for Advanced',
-                              labels={df_advanced.columns[0]: 'Feature 1', df_advanced.columns[1]: 'Feature 2'},
-                              color_continuous_scale=px.colors.qualitative.Set1)
-    # Remove the legend
-    fig_advanced.update_layout(showlegend=False)
-    # Increase the size of the plot
-    fig_advanced.update_layout(width=1200, height=500)
-    fig_advanced.update_coloraxes(showscale=False)
-    # Add user input as a star marker
-    fig_advanced.add_scatter(x=[user_features[0]], y=[user_features[1]], mode='markers', marker=dict(symbol='star', size=30, color='white'))
-    # Add annotation for user input
-    fig_advanced.add_annotation(
-        x=user_features[0],
-        y=user_features[1],
-        xshift=10,
-        text="Your Flower",
-        font=dict(color='white', size=30),
-        arrowhead=2,
-        ax=10,
-        ay=-40
-    )
-    # Add centroids with group numbers
-    for i, coord in enumerate(kmeans.cluster_centers_):
-        fig_advanced.add_annotation(
-            x=coord[0],
-            y=coord[1],
-            text="Group "+str(i+1),
-            showarrow=True,
-            font=dict(color='white', size=25)
-        )
-    st.plotly_chart(fig_advanced)
     st.write("""
     ### Interpretation

     & x \text{ is a data point in cluster } C_i.
     \end{align*}
     ''')
     st.write("""
     The K-Means algorithm tries to find the best centroids such that the \( \mathrm{WCSS} \) is minimized.
+    ### Principal Component Analysis (PCA)
+    PCA is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional form, while retaining as much of the original variance as possible. It achieves this by identifying the 'directions' (or principal components) that maximize variance.
+    Mathematically, PCA seeks to find the eigenvectors and eigenvalues of the data's covariance matrix. These eigenvectors, ordered by their corresponding eigenvalues, form the new 'axes' of the reduced space.
+    Using PCA for visualization helps in projecting the data onto the first two principal components, making it easier to spot patterns and clusters.
+    ### Let's Visualize!
     """)
+    # Check if 'use_pca' is already in the session state
+    if 'use_pca' not in st.session_state:
+        st.session_state.use_pca = True
+    if st.session_state.use_pca:
+        # Apply PCA for dimensionality reduction
+        pca = PCA(n_components=2)
+        X_transformed = pca.fit_transform(X)
+        user_features_transformed = pca.transform([user_features])[0]
+    else:
+        X_transformed = X[:, :2]  # Just use the first two features for visualization
+        user_features_transformed = user_features[:2]
+    # Create a DataFrame for easier plotting with plotly
+    df_transformed = pd.DataFrame(X_transformed, columns=['Feature1', 'Feature2'])
     # K-Means Algorithm for Advanced Tab
     kmeans_advanced = KMeans(n_clusters=n_clusters_advanced)
+    y_kmeans_advanced = kmeans_advanced.fit_predict(X_transformed)
+    df_transformed['cluster'] = y_kmeans_advanced
+    # ... [rest of the visualization code]
+    st.plotly_chart(fig_advanced)
+    # Button to toggle PCA
+    if st.button('Toggle PCA for Visualization'):
+        st.session_state.use_pca = not st.session_state.use_pca
     st.write("""
     ### Interpretation
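The context lines say K-Means looks for the centroids that minimize the WCSS. The `align*` block is cut off in this hunk, so the standard definition is assumed here: the sum, over all clusters, of the squared distances from each point to its cluster's centroid. As a minimal sketch, not the app's code, the NumPy snippet below runs Lloyd's algorithm, the classic alternation of an assignment step and a centroid-update step, on made-up toy data and records the WCSS after each assignment so you can watch it fall. The helpers `wcss`, `assign`, `update` and `lloyd` are hypothetical names. The app itself calls scikit-learn's `KMeans`, which seeds with k-means++ by default and reports the final WCSS as `inertia_`.

```python
import numpy as np


def wcss(X, labels, centroids):
    """Within-cluster sum of squares: sum over clusters of ||x - mu_i||^2."""
    # centroids[labels] picks, for every point, the centroid of its own cluster
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))


def assign(X, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    # Pairwise squared distances, shape (n_points, k)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def update(X, labels, k):
    """Update step: move each centroid to the mean of its assigned points.

    Assumes no cluster ends up empty; a real implementation must handle that.
    """
    return np.array([X[labels == i].mean(axis=0) for i in range(k)])


def lloyd(X, centroids, n_iter=10):
    """Alternate assignment and update; neither step can increase the WCSS."""
    history = []
    for _ in range(n_iter):
        labels = assign(X, centroids)
        history.append(wcss(X, labels, centroids))  # WCSS right after assignment
        centroids = update(X, labels, len(centroids))
    labels = assign(X, centroids)
    history.append(wcss(X, labels, centroids))
    return labels, centroids, history


# Toy data: two obvious groups on a line, and deliberately poor starting centroids
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
labels, centroids, history = lloyd(X, np.array([[0.0, 0.0], [2.0, 0.0]]), n_iter=5)
print(labels)   # [0 0 1 1]
print(history)  # [164.0, 24.0, 4.0, 4.0, 4.0, 4.0]
```

Starting with both centroids on the left, the WCSS drops from 164 to 24 to 4 and then stays put once the assignments stop changing.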
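The added PCA paragraph describes projecting the data onto the eigenvectors of its covariance matrix that have the largest eigenvalues. The small NumPy sketch below follows that recipe literally: center the features, eigendecompose the sample covariance, sort the eigenpairs by decreasing eigenvalue, and keep the top two. The synthetic three-feature data and the names `pca_eig` and `project` are assumptions for illustration. scikit-learn's `PCA`, which the commit uses, computes the same axes through an SVD, so its output should agree with this sketch up to the sign of each axis.

```python
import numpy as np


def pca_eig(X, n_components=2):
    """PCA via eigendecomposition of the sample covariance matrix.

    Returns the feature means, the top principal axes (one per row) and
    all eigenvalues sorted from largest to smallest.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)    # (n_features, n_features), divides by n - 1
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix: real eigenpairs, ascending
    order = np.argsort(eigvals)[::-1]       # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return mean, eigvecs[:, :n_components].T, eigvals


def project(points, mean, components):
    """Map points into the reduced space using an already fitted mean and axes."""
    return (np.atleast_2d(np.asarray(points, dtype=float)) - mean) @ components.T


# Synthetic data (not the iris set): three features with very different spreads
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

mean, components, eigvals = pca_eig(X, n_components=2)
Z = project(X, mean, components)
print(Z.shape)                      # (200, 2)
print(eigvals[:2] / eigvals.sum())  # share of the variance kept by the first two axes
```

Each eigenvalue is the variance of the data along its axis, so `eigvals[:2] / eigvals.sum()` plays the role of scikit-learn's `explained_variance_ratio_` and indicates how faithful the two-dimensional picture is.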
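The new branch on `st.session_state.use_pca` has to place the dataset and the user's flower in the same 2-D space. With PCA on, the commit calls `pca.transform([user_features])` on the PCA already fitted to `X` rather than fitting a new one. The hypothetical NumPy helper `to_plot_space` below mirrors that if/else. It uses an SVD in place of scikit-learn's `PCA`, an assumption made to keep it dependency-free, and checks that a user input equal to a dataset row lands exactly on that row's dot.

```python
import numpy as np


def to_plot_space(X, user_features, use_pca):
    """Hypothetical helper mirroring the commit's if/else on use_pca.

    Returns 2-D coordinates for the dataset and for the user's point,
    computed in the same space so they can share one scatter plot.
    """
    X = np.asarray(X, dtype=float)
    u = np.asarray(user_features, dtype=float)
    if not use_pca:
        # Raw mode: keep the first two features of both data and user input
        return X[:, :2], u[:2]
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal axes,
    # ordered by decreasing singular value (i.e. decreasing variance)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    axes = Vt[:2]
    # The user's point reuses the DATA's mean and axes (fit once, transform both)
    return (X - mean) @ axes.T, (u - mean) @ axes.T


def centroids_of(Z, labels):
    """Cluster centres as the mean of each cluster's points."""
    return np.array([Z[labels == k].mean(axis=0) for k in np.unique(labels)])


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))   # stand-in for a 4-feature dataset
user = X[7]                     # pretend the user typed in row 7
labels = np.arange(30) % 3      # arbitrary labels, just for the demo

Z, zu = to_plot_space(X, user, use_pca=True)
print(np.allclose(zu, Z[7]))    # True: the star lands on row 7's dot
```

A design note: the commit fits K-Means on `X_transformed`, so the clusters themselves are recomputed in whichever 2-D space is active and may change when PCA is toggled. Fitting on the full `X` and only projecting for display would keep the grouping fixed. Because the projection is affine, the 4-D cluster centres can then be mapped with the same mean and axes and still land at the centre of each projected cluster. The removed annotation loop also read `kmeans.cluster_centers_` while the points were colored by `kmeans_advanced`, so labels and colors could come from different models. Finally, the "Toggle PCA for Visualization" button sits after `st.plotly_chart`. Streamlit reruns the script top to bottom on each click, so the chart is drawn before `st.session_state.use_pca` flips and shows the previous mode until the next rerun. Passing the toggle as the button's `on_click` callback, which runs before the rerun, is one common way to avoid that lag.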