sree4411 committed on
Commit bc67bfd · verified
1 Parent(s): 0ff8008

Update pages/Linear Regression.py

Files changed (1)
  1. pages/Linear Regression.py +88 -128
pages/Linear Regression.py CHANGED
@@ -1,142 +1,102 @@
  import streamlit as st
- import pandas as pd
- import numpy as np
- from sklearn.model_selection import train_test_split
- from sklearn.linear_model import LinearRegression
- from sklearn.metrics import mean_squared_error, r2_score
- import matplotlib.pyplot as plt
- import seaborn as sns

- st.set_page_config(page_title="Explore Linear Regression", layout="wide")
- st.title("📈 Linear Regression Explained")

- # Tabs

- with st.sidebar:
-     st.header("📊 Data Options")
-     uploaded_file = st.file_uploader("Upload your CSV file", type=["csv"])
-
-     if uploaded_file is None:
-         st.warning("Using default dataset (Boston Housing dataset replacement). Upload your own for custom results.")

- if uploaded_file:
-     df = pd.read_csv(uploaded_file)
- else:
-     from sklearn.datasets import fetch_california_housing
-     data = fetch_california_housing()
-     df = pd.DataFrame(data.data, columns=data.feature_names)
-     df['target'] = data.target
-
- # Tabs
-
- tab1, tab2, tab3 = st.tabs(["📖 About Linear Regression", "⚙️ Train Model", "📈 Visualize"])
-
- with tab1:
-
-     st.title("📈 Linear Regression - Intuition & Explanation")
-
-     st.markdown("""
-     Linear Regression is a **supervised machine learning algorithm** used to predict a continuous target variable based on one or more input features.
-
-     It tries to **fit a straight line** (or hyperplane) through the data that minimizes the error between actual and predicted values.
-     """)
-
-     st.subheader("🔹 Simple Linear Regression Formula")
-
-     st.latex(r'''
-     y = \beta_0 + \beta_1 x + \epsilon
-     ''')
-
-     st.markdown("""
-     Where:
-     - \( y \): Predicted value
-     - \( x \): Input feature
-     - \( \beta_0 \): Intercept
-     - \( \beta_1 \): Slope of the line
-     - \( \epsilon \): Error term
-     """)
-
-     st.subheader("🔹 Multiple Linear Regression")
-
-     st.latex(r'''
-     y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon
-     ''')
-
      st.markdown("""
-     This is used when we have more than one independent variable.
      """)
-
-     st.subheader("🎯 Objective of Linear Regression")
-     st.markdown("To find the best-fit line by minimizing the **sum of squared errors (SSE)**.")
-
-     st.latex(r'''
-     SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
-     ''')
-
-     st.subheader("📘 Cost Function (Mean Squared Error)")
-
-     st.latex(r'''
-     J(\beta) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
-     ''')
-
      st.markdown("""
-     - The algorithm tries to find values of \( \beta \) (coefficients) that **minimize this cost function**.
      """)
-
-     st.subheader("📌 Assumptions of Linear Regression")
      st.markdown("""
-     - **Linearity**: Relationship between input and output is linear
-     - **Independence**: Observations are independent
-     - **Homoscedasticity**: Constant variance of errors
-     - **Normality of errors**
-     - **No multicollinearity** (for multiple regression)
      """)
-
-     st.subheader("💡 When to Use Linear Regression?")
      st.markdown("""
-     - To predict continuous numeric values (e.g., price, salary, marks)
-     - To analyze how inputs are related to output
-     - Easy to implement and interpret
      """)

-
- with tab2:
-     st.subheader("⚙️ Train Linear Regression Model")
-
-     target_col = st.selectbox("Select Target Variable", df.columns)
-     feature_cols = st.multiselect("Select Feature Columns", [col for col in df.columns if col != target_col])
-
-     if feature_cols and target_col:
-         X = df[feature_cols]
-         y = df[target_col]
-
-         X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
-
-         model = LinearRegression()
-         model.fit(X_train, y_train)
-         y_pred = model.predict(X_test)
-
-         st.success(f"Model Trained Successfully! ✅")
-         st.metric("R² Score", f"{r2_score(y_test, y_pred):.4f}")
-         st.metric("MSE", f"{mean_squared_error(y_test, y_pred):.4f}")
-
-         st.markdown("### Coefficients")
-         coef_df = pd.DataFrame({"Feature": feature_cols, "Coefficient": model.coef_})
-         st.dataframe(coef_df)
-
- with tab3:
-     st.subheader("📈 Actual vs Predicted Plot")
-
-     if feature_cols and target_col:
-         fig, ax = plt.subplots()
-         sns.scatterplot(x=y_test, y=y_pred, ax=ax)
-         ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
-         ax.set_xlabel("Actual")
-         ax.set_ylabel("Predicted")
-         ax.set_title("Actual vs Predicted")
-         st.pyplot(fig)
-
-     st.markdown("---")
-     st.markdown("### 💡 Tip:")
-     st.info("If predictions look scattered from the red line, try using non-linear models or transform your features.")
  import streamlit as st

+ st.set_page_config(page_title="Linear Regression", page_icon="📈", layout="wide")

+ # Page Title
+ st.markdown("<h1>📈 Linear Regression</h1>", unsafe_allow_html=True)

+ # Introduction
+ st.markdown("### 🧠 What is Linear Regression?")
+ st.markdown("""
+ Linear Regression is a **supervised learning** algorithm used for predicting a **continuous output**.
+ It models the relationship between one or more **independent variables (features)** and a **dependent variable (target)** by fitting a linear equation to the data.
+ """)

+ # How It Works
+ st.markdown("### ⚙️ How Linear Regression Works")
+ with st.expander("Mathematical Model"):
      st.markdown("""
+     Linear regression fits a line defined by the equation:
      """)
+     st.latex(r"y = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_nx_n + \epsilon")
      st.markdown("""
+     - $\\beta_0$ is the **intercept**
+     - $\\beta_1, ..., \\beta_n$ are the **coefficients (weights)**
+     - $x_1, ..., x_n$ are the **feature values**
+     - $\\epsilon$ is the **error term**
      """)
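The model equation shown in the expander above can be sketched numerically. This is a hedged illustration, not part of the committed page; the coefficient values are made up:

```python
import numpy as np

# Hypothetical coefficients for y = b0 + b1*x1 + b2*x2 (illustrative values only)
beta_0 = 1.0                   # intercept
beta = np.array([2.0, -0.5])   # weights for x1, x2

x = np.array([3.0, 4.0])       # one sample's feature values

# The prediction omits the noise term epsilon, which is unobserved
y_hat = beta_0 + beta @ x
print(y_hat)  # 1.0 + 2*3 - 0.5*4 = 5.0
```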
+
+ with st.expander("Goal of the Algorithm"):
      st.markdown("""
+     Minimize the **residual sum of squares** between actual and predicted values.
      """)
+     st.latex(r"RSS = \sum_{i=1}^n (y_i - \hat{y}_i)^2")
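The RSS formula above is a straightforward sum; a minimal sketch (editorial, with made-up numbers, not part of the commit):

```python
import numpy as np

# Actual targets and hypothetical model predictions
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

# Residual sum of squares: squared differences, summed over all samples
rss = np.sum((y_true - y_pred) ** 2)
print(rss)  # 0.25 + 0.0 + 1.0 = 1.25
```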
+
+ with st.expander("Training Process"):
      st.markdown("""
+     - Find optimal weights (coefficients) using:
+         - **Ordinary Least Squares (OLS)**
+         - **Gradient Descent**
+     - OLS minimizes the squared differences between actual and predicted values.
      """)
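The OLS step described above has a closed form (the normal equation). A sketch on synthetic data, assuming the design matrix is full rank; this is an editorial aside, not code from the commit:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_beta = np.array([2.0, -1.0])
y = 3.0 + X @ true_beta + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so the intercept is estimated as a coefficient
Xb = np.column_stack([np.ones(len(X)), X])

# Normal equation: solve (X^T X) beta = X^T y instead of inverting explicitly
beta_hat = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(beta_hat)  # approximately [3.0, 2.0, -1.0]
```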

+ # Types
+ st.markdown("### 📚 Types of Linear Regression")
+ st.markdown("""
+ - **Simple Linear Regression**: One independent variable
+ - **Multiple Linear Regression**: More than one independent variable
+ - **Polynomial Regression**: Non-linear relationship modeled with polynomial terms
+ - **Ridge, Lasso Regression**: Regularized versions to prevent overfitting
+ """)
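Of the types listed above, polynomial regression is just linear regression on expanded features. A sketch with noise-free quadratic data (editorial illustration, not part of the commit):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Quadratic data: y = 1 + 2x + 3x^2 (no noise, so the fit recovers it exactly)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 1 + 2 * x.ravel() + 3 * x.ravel() ** 2

# Expand features to [x, x^2], then fit an ordinary linear model on them
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
model = LinearRegression().fit(X_poly, y)
print(model.intercept_, model.coef_)  # ~1.0, [~2.0, ~3.0]
```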
+
+ # Assumptions
+ st.markdown("### ✅ Assumptions of Linear Regression")
+ st.markdown("""
+ - **Linearity**: Relationship between input and output is linear
+ - **Independence**: Observations are independent
+ - **Homoscedasticity**: Constant variance of residuals
+ - **Normality**: Residuals are normally distributed
+ - **No multicollinearity**: Independent variables aren't too correlated
+ """)
+
+ # Metrics
+ st.markdown("### 📏 Evaluation Metrics")
+ st.markdown("""
+ - **Mean Absolute Error (MAE)**
+ - **Mean Squared Error (MSE)**
+ - **Root Mean Squared Error (RMSE)**
+ - **R² Score (Coefficient of Determination)**
+ """)
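All four metrics listed above are available in scikit-learn (which the page's earlier revision already imports). A minimal sketch with made-up values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)   # mean of |error|
mse = mean_squared_error(y_true, y_pred)    # mean of error^2
rmse = np.sqrt(mse)                         # RMSE is just the square root of MSE
r2 = r2_score(y_true, y_pred)               # 1 - SS_res / SS_tot
print(mae, mse, rmse, r2)  # 0.5, 0.375, ~0.612, ~0.949
```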
+
+ # Visualization and Prediction Explanation
+ st.markdown("### 📉 Visual Representation")
+ st.markdown("""
+ Linear regression fits a **line** (or hyperplane) to the data such that the **sum of squared errors** between actual and predicted values is minimized.
+ """)
+
+ # When to Use
+ st.markdown("### 🎯 When to Use Linear Regression")
+ st.markdown("""
+ - When the relationship between input and output is **approximately linear**
+ - When you want **interpretability** (coefficients show effect size)
+ - When **speed** is important and data is not too large
+ """)
+
+ # Regularization
+ st.markdown("### 🔒 Regularization Techniques")
+ st.markdown("""
+ To prevent overfitting, especially in multiple linear regression:
+ - **Ridge Regression** (L2 penalty): Shrinks coefficients
+ - **Lasso Regression** (L1 penalty): Can set some coefficients to 0
+ - **Elastic Net**: Combines Ridge and Lasso
+ """)
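The Ridge/Lasso behavior described above can be seen directly in scikit-learn. A sketch on synthetic data where only two of five features matter (editorial aside; the alpha values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
# Only the first two features drive the target; the other three are noise
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks all coefficients toward 0
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can set irrelevant coefficients to 0
print(ridge.coef_)
print(lasso.coef_)  # coefficients for the three noise features near 0
```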
+
+ # Final Note
+ st.markdown("### ✅ Summary")
+ st.markdown("""
+ Linear Regression is:
+ - Simple to implement and interpret
+ - Fast and effective on linearly related data
+ - Not ideal for complex or non-linear relationships
+ 👉 Use regularization and diagnostic plots to validate model quality.
+ """)