{% extends "layout.html" %}

{% block content %}
<script src="https://cdn.tailwindcss.com"></script>
<script id="MathJax-script" async
        src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js">
</script>

    <h1 class="text-3xl font-bold text-center text-gray-800 mb-4">Linear Regression Explained</h1>
    <p class="text-gray-600 text-center mb-6">
        Understand the fundamentals of Linear Regression, its computational flow, and how it makes predictions.
    </p>

    <div class="flex flex-col md:flex-row gap-8">
        <div class="flex-1">
            <div class="explanation-box">
                <h3 class="text-xl font-semibold text-gray-700 mb-3">What is Linear Regression?</h3>
                <p class="text-gray-600 mb-4">
                    Linear Regression is a fundamental supervised learning algorithm used for predicting a continuous outcome variable (dependent variable) based on one or more input features (independent variables). It models the relationship between the variables by fitting a linear equation to the observed data.
                </p>
                <p class="text-gray-600 mb-4">
                    For a simple linear regression with one input feature, the equation used by our model is:
                    <br> Predicted Score = (20 &times; Hours Studied) + 15
                </p>
                <ul class="list-disc list-inside text-gray-600 mb-4">
                    <li>Slope (m=20): Represents how much the predicted outcome changes for every one-unit increase in the input feature. Its sign gives the direction of the relationship and its magnitude gives the rate of change.</li>
                    <li>Intercept (b=15): Represents the predicted outcome when all input features are zero. It's the baseline value.</li>
                </ul>

                <h3 class="text-xl font-semibold text-gray-700 mb-3">Why Slope (m) is 20</h3>
                <p class="text-gray-600 mb-4">
                    The slope of 20 means each additional hour of studying adds 20 points to the predicted exam score. For example, going from 2 to 3 hours of study raises the prediction from 55 to 75.
                </p>

                <h3 class="text-xl font-semibold text-gray-700 mb-3">Why Intercept (b) is 15</h3>
                <p class="text-gray-600 mb-4">
                    The intercept of 15 represents points earned regardless of study time. This could account for:
                </p>
                <ul class="list-disc list-inside text-gray-600 mb-4">
                    <li>Class attendance and participation</li>
                    <li>Homework assignments</li>
                    <li>Quizzes and in-class activities</li>
                    <li>Base marks for attempting the exam</li>
                </ul>

                <h3 class="text-xl font-semibold text-gray-700 mb-3">Computational Flow (Input to Output):</h3>
                <p class="text-gray-600 mb-4">
                    The following steps illustrate how our Linear Regression model makes predictions:
                </p>
                <ol class="list-decimal list-inside text-gray-600 mb-4">
                    <li><strong>Input Data:</strong> You (the user) provide a value for 'Hours Studied'.</li>
                    <li><strong>Load Model:</strong> The Flask application loads the pre-trained <code>supervised_model.pkl</code>. This model contains the learned parameters: a slope (m) of 20 and an intercept (b) of 15.</li>
                    <li><strong>Calculate:</strong> The model computes the predicted score using its linear equation:
                        <p class="font-mono text-sm text-gray-700 my-2 pl-4">
                            <code>Predicted Score = (20 * Input Hours) + 15</code>
                        </p>
                        This is a simple multiplication and addition operation.
                    </li>
                    <li><strong>Result:</strong> The calculated 'Predicted Score' is returned by the model.</li>
                    <li><strong>Display:</strong> The Flask application then renders this predicted score on the web page for you to see.</li>
                </ol>
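                <p class="text-gray-600 mb-4">
                    As a worked example of the Calculate step, assume an input of 3.5 hours (the default value in the prediction form):
                </p>
                <p class="text-center my-2 text-base">
                    \[
                    \text{Predicted Score} = (20 \times 3.5) + 15 = 85
                    \]
                </p>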

                <h3 class="text-xl font-semibold text-gray-700 mb-3">Our Training Data:</h3>
                <p class="text-gray-600 mb-2">
                    The model was trained on the following data points to learn the relationship between 'Hours Studied' and 'Score'. Every point lies exactly on the line <code>Score = 20 * Hours + 15</code>, which is why training recovers a slope of 20 and an intercept of 15:
                </p>
                <div class="overflow-x-auto mb-4">
                    <table class="min-w-full bg-white rounded-lg shadow-md overflow-hidden text-gray-700">
                        <thead>
                            <tr class="bg-gray-100 border-b border-gray-200">
                                <th class="py-3 px-4 text-left font-semibold">Hours Studied (X)</th>
                                <th class="py-3 px-4 text-left font-semibold">Score (Y)</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr class="border-b border-gray-100">
                                <td class="py-3 px-4">1</td>
                                <td class="py-3 px-4">35</td>
                            </tr>
                            <tr class="border-b border-gray-100">
                                <td class="py-3 px-4">2</td>
                                <td class="py-3 px-4">55</td>
                            </tr>
                            <tr class="border-b border-gray-100">
                                <td class="py-3 px-4">3</td>
                                <td class="py-3 px-4">75</td>
                            </tr>
                            <tr class="border-b border-gray-100">
                                <td class="py-3 px-4">4</td>
                                <td class="py-3 px-4">95</td>
                            </tr>
                            <tr>
                                <td class="py-3 px-4">5</td>
                                <td class="py-3 px-4">115</td>
                            </tr>
                        </tbody>
                    </table>
                </div>

                <h3 class="text-xl font-semibold text-gray-700 mb-3">Cost Function: Quantifying Error</h3>
                <p class="text-gray-600 mb-4">
                    When a linear regression model is being trained, it doesn't just randomly draw a line. It evaluates how good its current line is by using a Cost Function. The goal of training is to find the line, i.e. the specific m and b values, that minimizes this cost.
                </p>
                <p class="text-gray-600 mb-4">
                    A common cost function for linear regression is the Mean Squared Error (MSE). It calculates the average of the squared differences between the actual observed values \( y_i \) and the values predicted by the model \( \hat{y}_i \).
                </p>
                <p class="text-center my-4 text-lg font-semibold text-gray-700">
                    Mean Squared Error (MSE) Formula:
                </p>
                <p class="text-center my-2 text-base">
                    \[
                    \text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
                    \]
                </p>
                <ul class="list-disc list-inside text-gray-600 mb-4">
                    <li><strong>N</strong>: The total number of data points.</li>
                    <li>\( y_i \): The actual score for data point <em>i</em>.</li>
                    <li>\( \hat{y}_i \): The predicted score for data point <em>i</em>, calculated as \( m \times x_i + b \).</li>
                </ul>
                <p class="text-gray-600 mb-4">
                    Squaring the differences ensures that no error is negative, so errors in opposite directions cannot cancel out, and it penalizes larger errors more heavily. During training, the model adjusts m and b to make this MSE value as small as possible.
                </p>
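                <p class="text-gray-600 mb-4">
                    As a worked example on the training data above, consider a hypothetical candidate line with m = 18 and b = 15 (values chosen only for illustration). Its predictions for 1 to 5 hours are 33, 51, 69, 87 and 105, so the errors against the actual scores are 2, 4, 6, 8 and 10:
                </p>
                <p class="text-center my-2 text-base">
                    \[
                    \text{MSE} = \frac{2^2 + 4^2 + 6^2 + 8^2 + 10^2}{5} = \frac{220}{5} = 44
                    \]
                </p>
                <p class="text-gray-600 mb-4">
                    For the line with m = 20 and b = 15, every prediction matches the table exactly, so the MSE is 0.
                </p>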

                <h3 class="text-xl font-semibold text-gray-700 mb-3">Gradient Descent: Learning the Best Line</h3>
                <p class="text-gray-600 mb-4">
                    Gradient Descent is an optimization algorithm used by linear regression and many other machine learning models to find the values of m and b that minimize a cost function such as MSE. Imagine the cost function as a landscape with hills and valleys, where the goal is to find the lowest point (the minimum cost).
                </p>
                <ol class="list-decimal list-inside text-gray-600 mb-4">
                    <li><strong>Start Randomly:</strong> The algorithm starts with some initial, often random, values for m and b.</li>
                    <li><strong>Calculate Gradient:</strong> It calculates the gradient of the cost function with respect to m and b. The gradient is a vector that points in the direction of steepest ascent on the cost landscape.</li>
                    <li><strong>Take a Step:</strong> To minimize the cost, the algorithm takes a small step in the opposite direction of the gradient (downhill). The size of this step is controlled by a parameter called the learning rate.</li>
                    <li><strong>Repeat:</strong> Steps 2 and 3 are repeated iteratively, with m and b being updated in each iteration. With each step, the model gets closer to the optimal m and b values that minimize the cost.</li>
                    <li><strong>Convergence:</strong> This process continues until the algorithm converges, meaning the cost function stops decreasing significantly, indicating it has found the minimum or a very good approximation of it.</li>
                </ol>
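                <p class="text-gray-600 mb-4">
                    The steps above can be written out for the MSE cost. This is a sketch rather than the exact procedure of any particular library; the learning rate is written as \( \alpha \), a symbol introduced here for illustration. The gradient and the update applied in each iteration are:
                </p>
                <p class="text-center my-2 text-base">
                    \[
                    \frac{\partial \text{MSE}}{\partial m} = -\frac{2}{N} \sum_{i=1}^{N} x_i (y_i - \hat{y}_i),
                    \qquad
                    \frac{\partial \text{MSE}}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)
                    \]
                </p>
                <p class="text-center my-2 text-base">
                    \[
                    m \leftarrow m - \alpha \frac{\partial \text{MSE}}{\partial m},
                    \qquad
                    b \leftarrow b - \alpha \frac{\partial \text{MSE}}{\partial b}
                    \]
                </p>
                <p class="text-gray-600 mb-4">
                    For example, starting from m = 0 and b = 0 with \( \alpha = 0.01 \) (all chosen only for illustration), every prediction on the training data is 0, so \( \sum x_i (y_i - \hat{y}_i) = 1325 \) and \( \sum (y_i - \hat{y}_i) = 375 \). The gradients are \( -\frac{2}{5} \times 1325 = -530 \) and \( -\frac{2}{5} \times 375 = -150 \), and the first step moves the parameters to m = 5.3 and b = 1.5.
                </p>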
                <p class="text-gray-600 mb-4">
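                    So, when <code>model.fit(X, y)</code> is called, the library finds the m and b that best fit your data by minimizing the prediction errors. Depending on the implementation, it does this either iteratively, with an optimization algorithm like Gradient Descent, or directly, by solving the least-squares problem in closed form (as scikit-learn's <code>LinearRegression</code> does).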
                </p>
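                <p class="text-gray-600 mb-4">
                    For a single input feature, the least-squares line can also be computed directly instead of iteratively. As a quick check on the training data above, where \( \bar{x} = 3 \) and \( \bar{y} = 75 \) are the means of the hours and the scores:
                </p>
                <p class="text-center my-2 text-base">
                    \[
                    m = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2} = \frac{200}{10} = 20,
                    \qquad
                    b = \bar{y} - m \, \bar{x} = 75 - 20 \times 3 = 15
                    \]
                </p>
                <p class="text-gray-600 mb-4">
                    These values match the slope and intercept used by the model.
                </p>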
            </div>
        </div>

        <div class="flex-1 flex flex-col gap-6">
            <div class="bg-white rounded-lg shadow-md p-6">
                <h3 class="text-xl font-semibold text-gray-700 mb-4">Visualizing the Regression Line</h3>
                <canvas id="regressionCanvas" width="400" height="300" class="border border-gray-300 rounded-md"></canvas>
                <p class="text-sm text-gray-600 mt-2">
                    Slope (m): <span id="slopeValue"></span>, Intercept (b): <span id="interceptValue"></span>
                </p>
            </div>

            <div class="p-6 bg-gray-50 rounded-xl shadow-inner">
                <h3 class="text-xl font-semibold text-gray-700 mb-4">Make a Prediction:</h3>
                <form method="POST" class="flex flex-col sm:flex-row items-center gap-4">
                    <label for="hoursInput" class="text-gray-700 font-medium">Hours Studied:</label>
                    <input type="number" id="hoursInput" name="hours" min="0" step="0.1"
                           value="{{ hours_studied_input if hours_studied_input is not none else '3.5' }}"
                           required class="flex-grow"
                           style="border: 1px solid #d1d5db; border-radius: 0.5rem; padding: 0.75rem 1rem; font-size: 1rem; width: 100%; max-width: 200px; transition: border-color 0.2s;">
                    <button type="submit" id="predictBtn"
                            style="background-color: #3b82f6; color: white; padding: 0.75rem 1.5rem; border-radius: 0.5rem; font-weight: 600; transition: background-color 0.2s, transform 0.1s; box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);">
                        Predict Score
                    </button>
                </form>
            </div>

            <div id="predictionOutput" class="prediction-box {% if prediction is none %}hidden{% endif %}"
                 style="background-color: #e0f2fe; border: 1px solid #93c5fd; border-radius: 0.75rem; padding: 1.5rem; text-align: center;">
                <h3 class="text-2xl font-bold text-blue-700 mb-2">Predicted Score:</h3>
                <p class="text-4xl font-extrabold text-blue-900" id="predictedScore">
                    {% if prediction is not none %}
                        {{ prediction | round(2) }}
                    {% else %}
                        --.--
                    {% endif %}
                </p>
                <p class="text-sm text-gray-600 mt-2">
                    This is the score predicted by the linear regression model for the hours you entered.
                </p>
            </div>
        </div>
    </div>

    <script>

        // Get canvas and context
        const canvas = document.getElementById('regressionCanvas');
        const ctx = canvas.getContext('2d');

        // Training data (X = hours studied, y = score); every point lies on y = 20x + 15
        const X_data = [1, 2, 3, 4, 5];
        const y_data = [35, 55, 75, 95, 115];

        // --- Slope (m) and intercept (b) ---
        // Hardcoded to match the trained model (m = 20, b = 15)
        const slope = 20;
        const intercept = 15;

        // Display slope and intercept values in the HTML
        document.getElementById('slopeValue').textContent = slope.toFixed(2);
        document.getElementById('interceptValue').textContent = intercept.toFixed(2);

        // Canvas dimensions and padding
        let canvasWidth, canvasHeight;
        const padding = 50;

        // Scale factors for drawing data onto the canvas
        let xScale, yScale;
        let xMin, xMax, yMin, yMax;

        // Prediction variables (updated when the user inputs hours)
        let predictedHours = null;
        let predictedScore = null;



        // Function to set up scaling based on data range and canvas size
        function setupScaling() {
            // canvas.width and canvas.height are in device pixels, while drawing happens in the
            // context's (possibly scaled) coordinate space, so divide by the current scale factors
            const transform = ctx.getTransform();
            canvasWidth = canvas.width / transform.a;
            canvasHeight = canvas.height / transform.d;

            // Determine data ranges for X and Y axes
            xMin = Math.min(...X_data, 0); // Always include 0 on the X-axis
            // Set xMax to at least 10 and ensure it covers any predicted hours
            xMax = Math.max(...X_data, predictedHours !== null ? predictedHours : 0, 10) + 1; // Extend x-axis slightly beyond the largest value

            yMin = Math.min(...y_data, 0); // Always include 0 on the Y-axis
            // Calculate the predicted score at xMax to ensure the y-axis covers the line
            const maxPredictedY = slope * xMax + intercept;
            yMax = Math.max(...y_data, predictedScore !== null ? predictedScore : 0, maxPredictedY) + 20; // Extend y-axis slightly beyond max needed

            // Calculate scaling factors to fit data within the canvas padding
            xScale = (canvasWidth - 2 * padding) / (xMax - xMin);
            yScale = (canvasHeight - 2 * padding) / (yMax - yMin);
        }



        // Convert data coordinates (e.g., hours, score) to canvas pixel coordinates
        function toCanvasX(x) {
            return padding + (x - xMin) * xScale;
        }

        // The canvas Y-axis points down, so larger scores map to smaller Y coordinates
        function toCanvasY(y) {
            return canvasHeight - padding - (y - yMin) * yScale;
        }



        // Function to draw the entire graph, including data points, regression line, and predictions
        function drawGraph() {
            ctx.clearRect(0, 0, canvasWidth, canvasHeight); // Clear the entire canvas

            // Draw axes
            ctx.beginPath();
            ctx.strokeStyle = '#64748b'; // Slate gray for axes
            ctx.lineWidth = 2;

            // X-axis (horizontal line)
            ctx.moveTo(padding, toCanvasY(yMin));
            ctx.lineTo(canvasWidth - padding, toCanvasY(yMin));
            // Y-axis (vertical line)
            ctx.moveTo(toCanvasX(xMin), padding);
            ctx.lineTo(toCanvasX(xMin), canvasHeight - padding);
            ctx.stroke();

            // Draw axis labels and ticks
            ctx.fillStyle = '#475569'; // Darker gray for labels
            ctx.font = '14px Inter';
            ctx.textAlign = 'center';
            ctx.textBaseline = 'top';

            // X-axis labels (Hours Studied)
            const xTickStep = 1; // Fixed step: one tick per hour
            for (let i = Math.ceil(xMin / xTickStep) * xTickStep; i <= Math.floor(xMax); i += xTickStep) {
                if (i >= 0) {
                    ctx.fillText(i + 'h', toCanvasX(i), canvasHeight - padding + 10);
                    ctx.beginPath();
                    ctx.moveTo(toCanvasX(i), canvasHeight - padding);
                    ctx.lineTo(toCanvasX(i), canvasHeight - padding - 5);
                    ctx.stroke();
                }
            }
            // X-axis title
            ctx.fillText('Hours Studied', canvasWidth / 2, canvasHeight - 20);

            ctx.textAlign = 'right';
            ctx.textBaseline = 'middle';
            // Y-axis labels (Score)
            // Tick step depends on the Y range: every 50 points for large ranges, otherwise every 20
            const yTickStep = (yMax - yMin) / 10 > 20 ? 50 : 20;
            for (let i = Math.ceil(yMin / yTickStep) * yTickStep; i <= Math.floor(yMax); i += yTickStep) {
                if (i >= 0) {
                    ctx.fillText(i.toFixed(0), padding - 10, toCanvasY(i));
                    ctx.beginPath();
                    ctx.moveTo(padding, toCanvasY(i));
                    ctx.lineTo(padding + 5, toCanvasY(i));
                    ctx.stroke();
                }
            }
            // Y-axis title (rotated)
            ctx.save();
            ctx.translate(20, canvasHeight / 2);
            ctx.rotate(-Math.PI / 2);
            ctx.textAlign = 'center';
            ctx.fillText('Score', 0, 0);
            ctx.restore();





            // Draw data points (blue circles)
            ctx.fillStyle = '#3b82f6'; // Blue for data points
            X_data.forEach((x, i) => {
                ctx.beginPath();
                ctx.arc(toCanvasX(x), toCanvasY(y_data[i]), 5, 0, Math.PI * 2); // Radius 5
                ctx.fill();
            });

            // Draw regression line (red line)
            ctx.beginPath();
            ctx.strokeStyle = '#ef4444'; // Red for regression line
            ctx.lineWidth = 3;
            // Draw line across the entire X-axis range based on the model equation
            ctx.moveTo(toCanvasX(xMin), toCanvasY(slope * xMin + intercept));
            ctx.lineTo(toCanvasX(xMax), toCanvasY(slope * xMax + intercept));
            ctx.stroke();

            // Draw predicted point and lines if available (green point and dashed lines)
            if (predictedHours !== null && predictedScore !== null) {
                const predX = toCanvasX(predictedHours);
                const predY = toCanvasY(predictedScore);

                // Predicted point
                ctx.fillStyle = '#22c55e'; // Green for predicted point
                ctx.beginPath();
                ctx.arc(predX, predY, 6, 0, Math.PI * 2); // Slightly larger radius
                ctx.fill();

                // Dashed lines to axes
                ctx.strokeStyle = '#22c55e'; // Green for dashed lines
                ctx.lineWidth = 1.5;
                ctx.setLineDash([5, 5]); // 5px dash, 5px gap

                // Line from predicted point to X-axis
                ctx.beginPath();
                ctx.moveTo(predX, predY);
                ctx.lineTo(predX, toCanvasY(yMin));
                ctx.stroke();

                // Line from predicted point to Y-axis
                ctx.beginPath();
                ctx.moveTo(predX, predY);
                ctx.lineTo(toCanvasX(xMin), predY);
                ctx.stroke();

                ctx.setLineDash([]); // Reset line dash to solid for subsequent drawings
            }
        }



        // Event listener for the "Predict Score" button click
        document.getElementById('predictBtn').addEventListener('click', () => {
            // Get the value from the input field and parse it as a floating-point number
            const hoursInput = parseFloat(document.getElementById('hoursInput').value);

            // Check if the input is a valid number
            if (!isNaN(hoursInput)) {
                // Update global prediction variables
                predictedHours = hoursInput;
                predictedScore = slope * predictedHours + intercept;

                // Display the predicted score in the HTML
                document.getElementById('predictedScore').textContent = predictedScore.toFixed(2);
                // Make the prediction output box visible
                document.getElementById('predictionOutput').classList.remove('hidden');

                // Recalculate scaling and redraw the graph to accommodate new prediction if it extends axes
                setupScaling();
                drawGraph();
            } else {
                // If input is invalid, show the message in place of the score.
                // Overwriting the box's innerHTML would remove #predictedScore and break later predictions.
                document.getElementById('predictedScore').textContent = 'Please enter a valid number for hours studied.';
                document.getElementById('predictionOutput').classList.remove('hidden');
            }
        });



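        // Function to handle canvas resizing and redraw the graph
        function resizeCanvas() {
            // Get the device pixel ratio for sharper rendering on high-DPI screens (1 if unavailable)
            const dpi = window.devicePixelRatio || 1;
            // Get the actual rendered size of the canvas element
            const rect = canvas.getBoundingClientRect();

            // Pin the displayed size. The canvas has no CSS size of its own, so without this,
            // enlarging the drawing buffer below would also enlarge the element on every call.
            canvas.style.boxSizing = 'border-box';
            canvas.style.width = rect.width + 'px';
            canvas.style.height = rect.height + 'px';

            // Set the internal drawing buffer size of the canvas (this also resets the context transform)
            canvas.width = rect.width * dpi;
            canvas.height = rect.height * dpi;

            // Scale the drawing context to match the DPI, ensuring crisp lines and text
            ctx.scale(dpi, dpi);

            // Re-setup scaling for data to canvas coordinates and redraw
            setupScaling();
            drawGraph();
        }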



        // Initial setup and draw when the window loads
        window.addEventListener('load', () => {
            resizeCanvas(); // Set initial canvas size and draw
            // Also trigger an initial prediction for the default value in the input field
            const initialHours = parseFloat(document.getElementById('hoursInput').value);
            if (!isNaN(initialHours)) {
                predictedHours = initialHours;
                predictedScore = slope * initialHours + intercept;
                document.getElementById('predictedScore').textContent = predictedScore.toFixed(2);
                document.getElementById('predictionOutput').classList.remove('hidden');
                setupScaling();
                drawGraph();
            }
        });

        // Redraw the graph whenever the window is resized
        window.addEventListener('resize', resizeCanvas);



        // Optional: Allow clicking on canvas to set hours input (for quick testing)
        canvas.addEventListener('click', (event) => {
            // Convert the click position from CSS pixels to drawing coordinates:
            // CSS pixels -> drawing-buffer pixels -> context (possibly scaled) coordinates
            const rect = canvas.getBoundingClientRect();
            const mouseX = (event.clientX - rect.left) * (canvas.width / rect.width) / ctx.getTransform().a;

            // Convert canvas X coordinate back to data X (hours studied), clamped at 0
            // because the input field does not accept negative hours
            const clickedHours = Math.max(0, xMin + (mouseX - padding) / xScale);
            // Update the input field with the clicked hours
            document.getElementById('hoursInput').value = clickedHours.toFixed(1);
            // Trigger the prediction immediately
            document.getElementById('predictBtn').click();
        });

    </script>
{% endblock %}