Ramyamaheswari committed
Commit 2d345bb · verified · 1 Parent(s): d00375b

Update app.py

Files changed (1)
  1. app.py +181 -0
app.py CHANGED
@@ -0,0 +1,181 @@
+ import streamlit as st
+ import pandas as pd
+ import numpy as np
+ from sklearn.datasets import load_iris
+ from sklearn.model_selection import train_test_split
+ from sklearn.svm import SVC
+ from sklearn.linear_model import LogisticRegression
+ from sklearn.preprocessing import StandardScaler
+ from sklearn.metrics import classification_report, accuracy_score
+ import matplotlib.pyplot as plt
+ import seaborn as sns
+
+ # Page config
+ st.set_page_config(page_title="Explore SVM Algorithm", layout="wide")
+ st.title("🔍 Support Vector Machine (SVM) Classifier Explained")
+
+ # -----------------------------------
+ # Theory Section
+ # -----------------------------------
+ st.markdown("""
+ ## 🤖 What is a Support Vector Machine (SVM)?
+
+ SVM is a powerful supervised learning algorithm used for both classification and regression.
+ It works by finding the hyperplane that best separates the classes in feature space.
+
+ **Key Ideas:**
+ - Maximizes the margin between different classes
+ - Effective in high-dimensional spaces
+ - Can use **kernel tricks** to handle non-linear classification
+
+ ---
+ ## ⚙️ How SVM Works
+
+ 1. Find the optimal hyperplane that separates the classes.
+ 2. Identify the **support vectors**: the data points closest to the hyperplane.
+ 3. Maximize the margin between these support vectors.
+ 4. If the data isn't linearly separable, use **kernel functions** to map inputs into a higher-dimensional space.
+
+ **Kernel Types:**
+ - *Linear*: straight-line separation
+ - *RBF (Gaussian)*: flexible, curved boundaries; good for complex data
+ - *Polynomial*: curved boundaries of a chosen degree
+
+ ---
+ """)
+
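The kernel trick described above can be seen on a toy dataset that no straight line can separate. The sketch below (an illustration, not part of the app; the XOR-style data is made up) fits the same `SVC` with a linear and an RBF kernel:

```python
import numpy as np
from sklearn.svm import SVC

# Toy XOR-style data: the two classes sit in opposite quadrants,
# so no single straight line can separate them.
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0).astype(int)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

print(f"linear kernel accuracy: {linear.score(X, y):.2f}")  # near chance
print(f"rbf kernel accuracy:    {rbf.score(X, y):.2f}")     # much higher
```

The linear kernel stays near chance level, while the RBF kernel, working in an implicit higher-dimensional space, fits the curved boundary easily.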
+ # -----------------------------------
+ # Load Dataset
+ # -----------------------------------
+ st.subheader("🌼 Try SVM on the Iris Dataset")
+ iris = load_iris()
+ df = pd.DataFrame(iris.data, columns=iris.feature_names)
+ df['target'] = iris.target
+ df['species'] = df['target'].apply(lambda x: iris.target_names[x])
+ st.dataframe(df.head(), use_container_width=True)
+
+ # -----------------------------------
+ # Model Controls
+ # -----------------------------------
+ kernel = st.radio("Select SVM Kernel", ["linear", "rbf", "poly"])
+ C = st.slider("Select Regularization Parameter (C)", 0.01, 10.0, value=1.0)
+
+ # Prepare data
+ X = df.drop(columns=["target", "species"])
+ y = df['target']
+
+ scaler = StandardScaler()
+ X_scaled = scaler.fit_transform(X)
+
+ X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
+
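Standardizing before fitting matters here because SVMs are distance-based: a feature measured on a much larger scale would dominate the margin. A minimal sketch of what `StandardScaler` does (the numbers are made up, not the Iris data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales (e.g. centimetres vs. grams)
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0]])

# fit_transform subtracts each column's mean and divides by its std
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0))  # each column now has mean ~0
print(X_scaled.std(axis=0))   # and unit variance
```

After scaling, both features contribute comparably to the distances the SVM optimizes over.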
+ # Train SVM
+ svm_model = SVC(kernel=kernel, C=C, probability=True, random_state=42)
+ svm_model.fit(X_train, y_train)
+ svm_pred = svm_model.predict(X_test)
+
+ svm_acc = accuracy_score(y_test, svm_pred)
+ st.success(f"✅ SVM Accuracy: {svm_acc*100:.2f}%")
+
+ # -----------------------------------
+ # Classification Report
+ # -----------------------------------
+ svm_report = classification_report(y_test, svm_pred, target_names=iris.target_names)
+ st.markdown("### 📊 SVM Classification Report")
+ st.text(svm_report)
+
+ # -----------------------------------
+ # Compare with Logistic Regression
+ # -----------------------------------
+ st.markdown("### 🔁 Compare with Logistic Regression")
+
+ log_model = LogisticRegression(max_iter=200)
+ log_model.fit(X_train, y_train)
+ log_pred = log_model.predict(X_test)
+ log_acc = accuracy_score(y_test, log_pred)
+
+ st.info(f"📈 Logistic Regression Accuracy: {log_acc*100:.2f}%")
+
+ if log_acc > svm_acc:
+     st.warning("🤔 Logistic Regression outperformed SVM! Try tuning SVM parameters or switching kernels.")
+ else:
+     st.success("✅ SVM performed at least as well as Logistic Regression on this dataset.")
+
+ # -----------------------------------
+ # Visualize Decision Boundaries
+ # -----------------------------------
+ st.markdown("### 🌌 Visualizing Decision Boundaries (2 Features)")
+
+ feature_x = st.selectbox("Feature for X-axis", df.columns[:-2], index=0)
+ feature_y = st.selectbox("Feature for Y-axis", df.columns[:-2], index=1)
+
+ X_vis = df[[feature_x, feature_y]]
+ # Use a fresh scaler so the one fitted on all four features stays intact
+ X_vis_scaled = StandardScaler().fit_transform(X_vis)
+ X_train_v, X_test_v, y_train_v, y_test_v = train_test_split(X_vis_scaled, y, test_size=0.2, random_state=42)
+
+ model_vis = SVC(kernel=kernel, C=C)
+ model_vis.fit(X_train_v, y_train_v)
+
+ # Predict over a dense grid covering the 2-D feature space
+ h = 0.02  # mesh step size
+ x_min, x_max = X_vis_scaled[:, 0].min() - 1, X_vis_scaled[:, 0].max() + 1
+ y_min, y_max = X_vis_scaled[:, 1].min() - 1, X_vis_scaled[:, 1].max() + 1
+ xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
+ Z = model_vis.predict(np.c_[xx.ravel(), yy.ravel()])
+ Z = Z.reshape(xx.shape)
+
+ fig, ax = plt.subplots(figsize=(8, 6))
+ ax.contourf(xx, yy, Z, alpha=0.3)
+ sns.scatterplot(x=X_vis_scaled[:, 0], y=X_vis_scaled[:, 1], hue=df['species'], palette='deep', ax=ax)
+ ax.set_xlabel(feature_x)
+ ax.set_ylabel(feature_y)
+ ax.set_title("SVM Decision Boundaries")
+ st.pyplot(fig)
+
+ # -----------------------------------
+ # Downloadable Report
+ # -----------------------------------
+ st.markdown("### 📥 Download SVM Report")
+ st.download_button("📄 Download Classification Report", data=svm_report, file_name="svm_report.txt")
+
+ # -----------------------------------
+ # Summary
+ # -----------------------------------
+ st.markdown("""
+ ---
+ ## 💡 Highlights of SVM
+ - Works well for both linear and non-linear problems.
+ - Excellent performance on small to medium-sized datasets.
+ - Sensitive to outliers, but tunable via regularization.
+
+ ## 🔧 When to Use SVM?
+ Use SVMs when:
+ - There is a clear margin of separation between classes.
+ - You're dealing with high-dimensional data.
+ - You want flexibility via kernels.
+
+ ---
+ ### 🧠 Did You Know?
+
+ - SVMs are **robust to overfitting**, especially in high-dimensional spaces.
+ - The **`C` parameter** controls the trade-off between training error and margin size.
+ - The **kernel trick** lets SVMs operate implicitly in very high-dimensional (even infinite-dimensional) feature spaces.
+
+ ### 📌 Pros & Cons
+
+ | Pros | Cons |
+ |--------------------------------------|----------------------------------------|
+ | Works well on complex boundaries     | Slower on large datasets               |
+ | Effective in high-dimensional space  | Needs careful parameter tuning         |
+ | Can handle non-linear data           | Less interpretable than simpler models |
+
+ ---
+ ### 🌀 Kernel Choice Summary
+
+ | Kernel     | Use Case                             |
+ |------------|--------------------------------------|
+ | Linear     | Simple, linearly separable data      |
+ | RBF        | Most common; a good default          |
+ | Polynomial | Use if you suspect curved boundaries |
+
+ > 🎯 *Tip:* Start with linear, then try RBF if the data isn't linearly separable.
+ """)
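The `C` trade-off mentioned above can be observed directly: a small `C` softens the margin and typically recruits more support vectors, while a large `C` fits the training data more tightly with fewer. A quick sketch on the same Iris data (an illustration, not part of the app):

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Count support vectors across all classes for several values of C
counts = {C: SVC(kernel="rbf", C=C).fit(X, y).n_support_.sum()
          for C in (0.01, 1.0, 100.0)}
for C, n in counts.items():
    print(f"C={C:>6}: {n} support vectors")
```

Running this, the smallest `C` should use noticeably more support vectors than the largest, which is the soft-margin vs. tight-fit trade-off in action.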