{% extends "layout.html" %}

{% block content %}
<script src="https://cdn.tailwindcss.com"></script>
<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script id="MathJax-script" async
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js">
</script>

<h1 class="text-3xl font-bold text-center text-gray-800 mb-4">Linear Regression Explained</h1>
<p class="text-gray-600 text-center mb-6">
Understand the fundamentals of Linear Regression, its computational flow, and how it makes predictions.
</p>

<div class="flex flex-col md:flex-row gap-8">
<div class="flex-1">
<div class="explanation-box">
<h3 class="text-xl font-semibold text-gray-700 mb-3">What is Linear Regression?</h3>
<p class="text-gray-600 mb-4">
Linear Regression is a fundamental supervised learning algorithm used for predicting a continuous outcome variable (dependent variable) based on one or more input features (independent variables). It models the relationship between the variables by fitting a linear equation to the observed data.
</p>
<p class="text-gray-600 mb-4">
For a simple linear regression with one input feature, the equation used by our model is:
<br> Predicted Score = (20 &times; Hours Studied) + 15
</p>
<ul class="list-disc list-inside text-gray-600 mb-4">
<li>Slope (m = 20): How much the predicted outcome changes for every one-unit increase in the input feature. Its sign gives the direction of the relationship and its magnitude gives the rate of change.</li>
<li>Intercept (b = 15): The predicted outcome when all input features are zero. It's the baseline value.</li>
</ul>

<h3 class="text-xl font-semibold text-gray-700 mb-3">Why Slope (m) is 20</h3>
<p class="text-gray-600 mb-4">
The slope of 20 means each additional hour of studying adds 20 points to the predicted exam score. For example, going from 2 to 3 hours of study raises the prediction from 55 to 75.
</p>

<h3 class="text-xl font-semibold text-gray-700 mb-3">Why Intercept (b) is 15</h3>
<p class="text-gray-600 mb-4">
The intercept of 15 represents points earned regardless of study time. This could account for:
</p>
<ul class="list-disc list-inside text-gray-600 mb-4">
<li>Class attendance and participation</li>
<li>Homework assignments</li>
<li>Quizzes and in-class activities</li>
<li>Base marks for attempting the exam</li>
</ul>

<h3 class="text-xl font-semibold text-gray-700 mb-3">Computational Flow (Input to Output):</h3>
<p class="text-gray-600 mb-4">
The following steps illustrate how our Linear Regression model makes predictions:
</p>
<ol class="list-decimal list-inside text-gray-600 mb-4">
<li><strong>Input Data:</strong> You (the user) provide a value for 'Hours Studied'.</li>
<li><strong>Load Model:</strong> The Flask application loads the pre-trained <code>supervised_model.pkl</code>. This model contains the learned parameters: a slope (m) of 20 and an intercept (b) of 15.</li>
<li><strong>Calculate:</strong> The model computes the predicted score using its linear equation:
<p class="font-mono text-sm text-gray-700 my-2 pl-4">
<code>Predicted Score = (20 * Input Hours) + 15</code>
</p>
This is a simple multiplication and addition operation.
</li>
<li><strong>Result:</strong> The calculated 'Predicted Score' is returned by the model.</li>
<li><strong>Display:</strong> The Flask application then renders this predicted score on the web page for you to see.</li>
</ol>
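<p class="text-gray-600 mb-4">
The calculation step above can be sketched in a few lines of JavaScript. This is only an illustration: <code>predictScore</code> is a hypothetical helper name, and the real application evaluates the model inside Flask rather than with this code.
</p>
<pre class="font-mono text-sm text-gray-700 my-2 pl-4">
```javascript
// Parameters quoted in the text above.
const SLOPE = 20;      // points gained per hour studied
const INTERCEPT = 15;  // baseline points at zero hours

// predictScore is a hypothetical helper name, not part of the application.
function predictScore(hours) {
  return SLOPE * hours + INTERCEPT;
}

console.log(predictScore(3.5)); // 85
```
</pre>
<p class="text-gray-600 mb-4">
With 3.5 hours, the model multiplies by 20 to get 70 and then adds the intercept of 15, giving 85.
</p>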

<h3 class="text-xl font-semibold text-gray-700 mb-3">Our Training Data:</h3>
<p class="text-gray-600 mb-2">
The model was trained on the following data points to learn the relationship between 'Hours Studied' and 'Score'. Every point lies exactly on the line <code>Score = 20 * Hours + 15</code>, which is why the learned slope and intercept are exactly 20 and 15:
</p>
<div class="overflow-x-auto mb-4">
<table class="min-w-full bg-white rounded-lg shadow-md overflow-hidden text-gray-700">
<thead>
<tr class="bg-gray-100 border-b border-gray-200">
<th class="py-3 px-4 text-left font-semibold">Hours Studied (X)</th>
<th class="py-3 px-4 text-left font-semibold">Score (Y)</th>
</tr>
</thead>
<tbody>
<tr class="border-b border-gray-100">
<td class="py-3 px-4">1</td>
<td class="py-3 px-4">35</td>
</tr>
<tr class="border-b border-gray-100">
<td class="py-3 px-4">2</td>
<td class="py-3 px-4">55</td>
</tr>
<tr class="border-b border-gray-100">
<td class="py-3 px-4">3</td>
<td class="py-3 px-4">75</td>
</tr>
<tr class="border-b border-gray-100">
<td class="py-3 px-4">4</td>
<td class="py-3 px-4">95</td>
</tr>
<tr>
<td class="py-3 px-4">5</td>
<td class="py-3 px-4">115</td>
</tr>
</tbody>
</table>
</div>
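<p class="text-gray-600 mb-4">
To see where the values 20 and 15 come from, the sketch below fits a line to the table using the standard least-squares formulas for a single feature. It is an illustration under assumptions, not the code that produced the saved model: <code>mean</code> and <code>fitLine</code> are hypothetical helper names.
</p>
<pre class="font-mono text-sm text-gray-700 my-2 pl-4">
```javascript
// Training data from the table above.
const hours = [1, 2, 3, 4, 5];
const scores = [35, 55, 75, 95, 115];

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Least squares for one feature (fitLine is a hypothetical helper):
//   m = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2)
//   b = meanY - m * meanX
function fitLine(xs, ys) {
  const meanX = mean(xs);
  const meanY = mean(ys);
  let sumXY = 0; // sum of (x - meanX) * (y - meanY)
  let sumXX = 0; // sum of (x - meanX)^2
  xs.forEach((x, i) => {
    sumXY += (x - meanX) * (ys[i] - meanY);
    sumXX += (x - meanX) ** 2;
  });
  const m = sumXY / sumXX;
  return { m, b: meanY - m * meanX };
}

const { m, b } = fitLine(hours, scores);
console.log(m, b); // 20 15
```
</pre>
<p class="text-gray-600 mb-4">
Because the five points are perfectly collinear, the fitted line passes through all of them and recovers the slope and intercept exactly.
</p>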

<h3 class="text-xl font-semibold text-gray-700 mb-3">Cost Function: Quantifying Error</h3>
<p class="text-gray-600 mb-4">
When a linear regression model is being trained, it doesn't just randomly draw a line. It evaluates how good its current line is by using a cost function. The goal of training is to find the line, i.e. the specific m and b values, that minimizes this cost.
</p>
<p class="text-gray-600 mb-4">
A common cost function for linear regression is the Mean Squared Error (MSE). It calculates the average of the squared differences between the actual observed values \( y_i \) and the values predicted by the model \( \hat{y}_i \).
</p>
<p class="text-center my-4 text-lg font-semibold text-gray-700">
Mean Squared Error (MSE) Formula:
</p>

<p class="text-center my-2 text-base">
\[
\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
\]
</p>
<ul class="list-disc list-inside text-gray-600 mb-4">
<li><strong>N</strong>: The total number of data points.</li>
<li>\( y_i \): The actual score for data point <em>i</em>.</li>
<li>\( \hat{y}_i \): The predicted score for data point <em>i</em>, calculated as \( m \times x_i + b \).</li>
</ul>
<p class="text-gray-600 mb-4">
Squaring the differences makes every error non-negative, so positive and negative errors cannot cancel out, and it penalizes larger errors more heavily. During training, m and b are adjusted to make this MSE value as small as possible.
</p>
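<p class="text-gray-600 mb-4">
As a concrete check of the formula, the sketch below computes the MSE on the training data for two candidate lines. <code>meanSquaredError</code> is a hypothetical helper name, and the second candidate (m = 18) is an arbitrary value chosen only for comparison.
</p>
<pre class="font-mono text-sm text-gray-700 my-2 pl-4">
```javascript
// Training data from the table above.
const hours = [1, 2, 3, 4, 5];
const scores = [35, 55, 75, 95, 115];

// Average of the squared differences between actual and predicted scores.
// meanSquaredError is a hypothetical helper name.
function meanSquaredError(m, b, xs, ys) {
  const total = xs.reduce((sum, x, i) => {
    const error = ys[i] - (m * x + b);
    return sum + error ** 2;
  }, 0);
  return total / xs.length;
}

console.log(meanSquaredError(20, 15, hours, scores)); // 0
console.log(meanSquaredError(18, 15, hours, scores)); // 44
```
</pre>
<p class="text-gray-600 mb-4">
With m = 18 the errors are 2, 4, 6, 8 and 10, whose squares sum to 220, so the MSE is 220 / 5 = 44. The line with m = 20 and b = 15 fits every point, so its MSE is 0.
</p>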

<h3 class="text-xl font-semibold text-gray-700 mb-3">Gradient Descent: Learning the Best Line</h3>
<p class="text-gray-600 mb-4">
Gradient Descent is an optimization algorithm used by many machine learning models, including linear regression, to find the values of m and b that minimize a cost function such as MSE. Imagine the cost function as a landscape with hills and valleys, where the goal is to find the lowest point (the minimum cost). For linear regression with MSE, this landscape is a single bowl, so there is only one lowest point to find.
</p>
<ol class="list-decimal list-inside text-gray-600 mb-4">
<li><strong>Start Randomly:</strong> The algorithm starts with some initial, often random, values for m and b.</li>
<li><strong>Calculate Gradient:</strong> It calculates the gradient of the cost function with respect to m and b. The gradient is a vector that points in the direction of steepest ascent on the cost landscape.</li>
<li><strong>Take a Step:</strong> To minimize the cost, the algorithm takes a small step in the opposite direction of the gradient (downhill). The size of this step is controlled by a parameter called the learning rate.</li>
<li><strong>Repeat:</strong> Steps 2 and 3 are repeated iteratively, with m and b being updated in each iteration. With each step, the model gets closer to the optimal m and b values that minimize the cost.</li>
<li><strong>Convergence:</strong> This process continues until the algorithm converges, meaning the cost function stops decreasing significantly, indicating it has found the minimum or a very good approximation of it.</li>
</ol>
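<p class="text-gray-600 mb-4">
The steps above can be sketched as a short loop. This is a minimal illustration under assumptions, not the training code behind the saved model: <code>gradientDescent</code> is a hypothetical helper, the learning rate (0.05) and iteration count (5000) are illustrative choices, and the search starts from m = 0 and b = 0 instead of random values so the run is repeatable.
</p>
<pre class="font-mono text-sm text-gray-700 my-2 pl-4">
```javascript
// Training data from the table above.
const hours = [1, 2, 3, 4, 5];
const scores = [35, 55, 75, 95, 115];

// Batch gradient descent on the MSE of the line y = m * x + b.
// gradientDescent is a hypothetical helper name.
function gradientDescent(xs, ys, learningRate, iterations) {
  let m = 0; // fixed start (an assumption) instead of a random one
  let b = 0;
  const n = xs.length;
  for (let remaining = iterations; remaining > 0; remaining -= 1) {
    // Partial derivatives of the MSE with respect to m and b.
    let gradM = 0;
    let gradB = 0;
    xs.forEach((x, i) => {
      const error = ys[i] - (m * x + b);
      gradM += (-2 / n) * x * error;
      gradB += (-2 / n) * error;
    });
    // Step against the gradient, scaled by the learning rate.
    m -= learningRate * gradM;
    b -= learningRate * gradB;
  }
  return { m, b };
}

const fit = gradientDescent(hours, scores, 0.05, 5000);
console.log(fit.m.toFixed(2), fit.b.toFixed(2)); // 20.00 15.00
```
</pre>
<p class="text-gray-600 mb-4">
A learning rate that is too large makes the updates overshoot and diverge, while one that is too small needs many more iterations to reach the same values.
</p>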
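<p class="text-gray-600 mb-4">
So, when <code>model.fit(X, y)</code> is called, an optimization routine works behind the scenes to find the m and b that best fit your data by minimizing the prediction errors. Gradient Descent is one such routine. Some libraries solve this least-squares problem directly instead of iterating: if the model is, for example, scikit-learn's <code>LinearRegression</code>, the fit uses a direct least-squares solver, which arrives at the same m and b.
</p>
</div>
</div>
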
<div class="flex-1 flex flex-col gap-6">
<div class="bg-white rounded-lg shadow-md p-6">
<h3 class="text-xl font-semibold text-gray-700 mb-4">Visualizing the Regression Line</h3>
<canvas id="regressionCanvas" width="400" height="300" class="border border-gray-300 rounded-md"></canvas>
<p class="text-sm text-gray-600 mt-2">
Slope (m): <span id="slopeValue"></span>, Intercept (b): <span id="interceptValue"></span>
</p>
</div>

<div class="p-6 bg-gray-50 rounded-xl shadow-inner">
<h3 class="text-xl font-semibold text-gray-700 mb-4">Make a Prediction:</h3>
<form method="POST" class="flex flex-col sm:flex-row items-center gap-4">
<label for="hoursInput" class="text-gray-700 font-medium">Hours Studied:</label>
<input type="number" id="hoursInput" name="hours" min="0" step="0.1"
value="{{ hours_studied_input if hours_studied_input is not none else '3.5' }}"
required class="flex-grow"
style="border: 1px solid #d1d5db; border-radius: 0.5rem; padding: 0.75rem 1rem; font-size: 1rem; width: 100%; max-width: 200px; transition: border-color 0.2s;">
<button type="submit" id="predictBtn"
style="background-color: #3b82f6; color: white; padding: 0.75rem 1.5rem; border-radius: 0.5rem; font-weight: 600; transition: background-color 0.2s, transform 0.1s; box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);">
Predict Score
</button>
</form>
</div>

<div id="predictionOutput" class="prediction-box {% if prediction is none %}hidden{% endif %}"
style="background-color: #e0f2fe; border: 1px solid #93c5fd; border-radius: 0.75rem; padding: 1.5rem; text-align: center;">
<h3 class="text-2xl font-bold text-blue-700 mb-2">Predicted Score:</h3>
<p class="text-4xl font-extrabold text-blue-900" id="predictedScore">
{% if prediction is not none %}
{{ prediction | round(2) }}
{% else %}
--.--
{% endif %}
</p>
<p class="text-sm text-gray-600 mt-2">
This is the score predicted by the linear regression model for the hours you entered.
</p>
</div>
</div>
</div>

<script>
// Canvas and 2D drawing context for the regression plot.
const canvas = document.getElementById('regressionCanvas');
const ctx = canvas.getContext('2d');

// Training data, matching the table on this page.
const X_data = [1, 2, 3, 4, 5];
const y_data = [35, 55, 75, 95, 115];

// Learned model parameters.
const slope = 20;
const intercept = 15;

document.getElementById('slopeValue').textContent = slope.toFixed(2);
document.getElementById('interceptValue').textContent = intercept.toFixed(2);

// Drawing-area size and the margin kept around the plot.
let canvasWidth, canvasHeight;
const padding = 50;

// Data-to-canvas scale factors and the data ranges they cover.
let xScale, yScale;
let xMin, xMax, yMin, yMax;

// Most recent prediction to highlight on the plot (null until one is made).
let predictedHours = null;
let predictedScore = null;

function setupScaling() {
    // resizeCanvas() scales the context by devicePixelRatio, so drawing
    // coordinates span the backing-store size divided by that ratio.
    const dpr = window.devicePixelRatio || 1;
    canvasWidth = canvas.width / dpr;
    canvasHeight = canvas.height / dpr;

    xMin = Math.min(...X_data, 0);
    xMax = Math.max(...X_data, predictedHours !== null ? predictedHours : 0, 10) + 1;

    yMin = Math.min(...y_data, 0);
    const maxPredictedY = slope * xMax + intercept;
    yMax = Math.max(...y_data, predictedScore !== null ? predictedScore : 0, maxPredictedY) + 20;

    xScale = (canvasWidth - 2 * padding) / (xMax - xMin);
    yScale = (canvasHeight - 2 * padding) / (yMax - yMin);
}

// Convert data coordinates to canvas coordinates (the canvas y axis points down).
function toCanvasX(x) {
    return padding + (x - xMin) * xScale;
}

function toCanvasY(y) {
    return canvasHeight - padding - (y - yMin) * yScale;
}

function drawGraph() {
    ctx.clearRect(0, 0, canvasWidth, canvasHeight);

    // Axes
    ctx.beginPath();
    ctx.strokeStyle = '#64748b';
    ctx.lineWidth = 2;

    ctx.moveTo(padding, toCanvasY(yMin));
    ctx.lineTo(canvasWidth - padding, toCanvasY(yMin));
    ctx.moveTo(toCanvasX(xMin), padding);
    ctx.lineTo(toCanvasX(xMin), canvasHeight - padding);
    ctx.stroke();

    // Axis labels and tick marks
    ctx.fillStyle = '#475569';
    ctx.font = '14px Inter';
    ctx.textAlign = 'center';
    ctx.textBaseline = 'top';

    const xTickStep = 1;
    for (let i = Math.ceil(xMin / xTickStep) * xTickStep; i <= Math.floor(xMax); i += xTickStep) {
        if (i >= 0) {
            ctx.fillText(i + 'h', toCanvasX(i), canvasHeight - padding + 10);
            ctx.beginPath();
            ctx.moveTo(toCanvasX(i), canvasHeight - padding);
            ctx.lineTo(toCanvasX(i), canvasHeight - padding - 5);
            ctx.stroke();
        }
    }
    ctx.fillText('Hours Studied', canvasWidth / 2, canvasHeight - 20);

    ctx.textAlign = 'right';
    ctx.textBaseline = 'middle';

    const yTickStep = (yMax - yMin) / 10 > 20 ? 50 : 20;
    for (let i = Math.ceil(yMin / yTickStep) * yTickStep; i <= Math.floor(yMax); i += yTickStep) {
        if (i >= 0) {
            ctx.fillText(i.toFixed(0), padding - 10, toCanvasY(i));
            ctx.beginPath();
            ctx.moveTo(padding, toCanvasY(i));
            ctx.lineTo(padding + 5, toCanvasY(i));
            ctx.stroke();
        }
    }

    // Rotated y-axis title
    ctx.save();
    ctx.translate(20, canvasHeight / 2);
    ctx.rotate(-Math.PI / 2);
    ctx.textAlign = 'center';
    ctx.fillText('Score', 0, 0);
    ctx.restore();

    // Training data points
    ctx.fillStyle = '#3b82f6';
    X_data.forEach((x, i) => {
        ctx.beginPath();
        ctx.arc(toCanvasX(x), toCanvasY(y_data[i]), 5, 0, Math.PI * 2);
        ctx.fill();
    });

    // Regression line
    ctx.beginPath();
    ctx.strokeStyle = '#ef4444';
    ctx.lineWidth = 3;
    ctx.moveTo(toCanvasX(xMin), toCanvasY(slope * xMin + intercept));
    ctx.lineTo(toCanvasX(xMax), toCanvasY(slope * xMax + intercept));
    ctx.stroke();

    // Current prediction, with dashed guide lines to both axes
    if (predictedHours !== null && predictedScore !== null) {
        const predX = toCanvasX(predictedHours);
        const predY = toCanvasY(predictedScore);

        ctx.fillStyle = '#22c55e';
        ctx.beginPath();
        ctx.arc(predX, predY, 6, 0, Math.PI * 2);
        ctx.fill();

        ctx.strokeStyle = '#22c55e';
        ctx.lineWidth = 1.5;
        ctx.setLineDash([5, 5]);

        ctx.beginPath();
        ctx.moveTo(predX, predY);
        ctx.lineTo(predX, toCanvasY(yMin));
        ctx.stroke();

        ctx.beginPath();
        ctx.moveTo(predX, predY);
        ctx.lineTo(toCanvasX(xMin), predY);
        ctx.stroke();
        ctx.setLineDash([]);
    }
}

// Recompute the prediction and redraw the plot when the button is clicked.
document.getElementById('predictBtn').addEventListener('click', () => {
    const hoursInput = parseFloat(document.getElementById('hoursInput').value);

    if (!isNaN(hoursInput)) {
        predictedHours = hoursInput;
        predictedScore = slope * predictedHours + intercept;

        document.getElementById('predictedScore').textContent = predictedScore.toFixed(2);
        document.getElementById('predictionOutput').classList.remove('hidden');

        setupScaling();
        drawGraph();
    } else {
        // Write the message into the existing score element. Replacing the
        // whole box's innerHTML would delete #predictedScore and make later
        // predictions fail.
        document.getElementById('predictedScore').textContent = 'Please enter a valid number for hours studied.';
        document.getElementById('predictionOutput').classList.remove('hidden');
    }
});

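function resizeCanvas() {
    const dpi = window.devicePixelRatio || 1;

    // Displayed (CSS) size of the drawing area, excluding the border.
    const cssWidth = canvas.clientWidth;
    const cssHeight = canvas.clientHeight;

    // Pin the displayed size. Without this the element's layout size follows
    // canvas.width/height, so it would grow on every call when dpi > 1.
    canvas.style.boxSizing = 'content-box';
    canvas.style.width = cssWidth + 'px';
    canvas.style.height = cssHeight + 'px';

    // Size the backing store in device pixels for a sharp image.
    canvas.width = Math.round(cssWidth * dpi);
    canvas.height = Math.round(cssHeight * dpi);

    // Assigning canvas.width resets the transform, so this scale does not accumulate.
    ctx.scale(dpi, dpi);

    setupScaling();
    drawGraph();
}
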
window.addEventListener('load', () => {
    resizeCanvas();

    // Plot the prediction for the value already in the input, if any.
    const initialHours = parseFloat(document.getElementById('hoursInput').value);
    if (!isNaN(initialHours)) {
        predictedHours = initialHours;
        predictedScore = slope * initialHours + intercept;
        document.getElementById('predictedScore').textContent = predictedScore.toFixed(2);
        document.getElementById('predictionOutput').classList.remove('hidden');
        setupScaling();
        drawGraph();
    }
});

window.addEventListener('resize', resizeCanvas);

// Clicking the plot copies the clicked x position into the hours input and predicts.
canvas.addEventListener('click', (event) => {
    const rect = canvas.getBoundingClientRect();
    const dpr = window.devicePixelRatio || 1;

    // CSS pixels inside the border -> backing-store pixels -> drawing units
    // (the context is scaled by devicePixelRatio in resizeCanvas()).
    const mouseX = (event.clientX - rect.left - canvas.clientLeft) * (canvas.width / canvas.clientWidth) / dpr;

    // Clamp at zero: the hours input does not accept negative values.
    const clickedHours = Math.max(0, xMin + (mouseX - padding) / xScale);
    document.getElementById('hoursInput').value = clickedHours.toFixed(1);
    document.getElementById('predictBtn').click();
});
</script>
{% endblock %}