{% extends "layout.html" %}
{% block content %}
<script src="https://cdn.tailwindcss.com"></script>
<script id="MathJax-script" async
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js">
</script>
<h1 class="text-3xl font-bold text-center text-gray-800 mb-4">Linear Regression Explained</h1>
<p class="text-gray-600 text-center mb-6">
Understand the fundamentals of Linear Regression, its computational flow, and how it makes predictions.
</p>
<div class="flex flex-col md:flex-row gap-8">
<div class="flex-1">
<div class="explanation-box">
<h3 class="text-xl font-semibold text-gray-700 mb-3">What is Linear Regression?</h3>
<p class="text-gray-600 mb-4">
Linear Regression is a fundamental supervised learning algorithm used for predicting a continuous outcome variable (dependent variable) based on one or more input features (independent variables). It models the relationship between the variables by fitting a linear equation to the observed data.
</p>
<p class="text-gray-600 mb-4">
For a simple linear regression with one input feature, the equation used by our model is:
<br> Predicted Score = (20 &times; Hours Studied) + 15
</p>
<ul class="list-disc list-inside text-gray-600 mb-4">
<li>Slope (m=20): Represents how much the predicted outcome changes for every one-unit increase in the input feature. Its sign gives the direction of the relationship, and its magnitude gives the rate of change.</li>
<li>Intercept (b=15): Represents the predicted outcome when all input features are zero. It's the baseline value.</li>
</ul>
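<p class="text-gray-600 mb-4">
As a quick worked example (the input of 3.5 hours is chosen here for illustration and matches the form's default value), substituting into the equation gives:
</p>
<p class="text-center my-2 text-base">
\[
\text{Predicted Score} = (20 \times 3.5) + 15 = 70 + 15 = 85
\]
</p>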
<h3 class="text-xl font-semibold text-gray-700 mb-3">Why Slope (m) is 20</h3>
<p class="text-gray-600 mb-4">
The slope of 20 means each additional hour of studying adds 20 points to the predicted exam score. For example, going from 2 to 3 hours of study raises the prediction from 55 to 75.
</p>
<h3 class="text-xl font-semibold text-gray-700 mb-3">Why Intercept (b) is 15</h3>
<p class="text-gray-600 mb-4">
The intercept of 15 is the predicted score at zero hours of study, that is, points earned regardless of study time. This could account for:
</p>
<ul class="list-disc list-inside text-gray-600 mb-4">
<li>Class attendance and participation</li>
<li>Homework assignments</li>
<li>Quizzes and in-class activities</li>
<li>Base marks for attempting the exam</li>
</ul>
<h3 class="text-xl font-semibold text-gray-700 mb-3">Computational Flow (Input to Output):</h3>
<p class="text-gray-600 mb-4">
The following steps illustrate how our Linear Regression model makes predictions:
</p>
<ol class="list-decimal list-inside text-gray-600 mb-4">
<li><strong>Input Data:</strong> You (the user) provide a value for 'Hours Studied'.</li>
<li><strong>Load Model:</strong> The Flask application loads the pre-trained <code>supervised_model.pkl</code>. This model contains the learned parameters: a slope (m) of 20 and an intercept (b) of 15.</li>
<li><strong>Calculate:</strong> The model computes the predicted score using its linear equation:
<p class="font-mono text-sm text-gray-700 my-2 pl-4">
<code>Predicted Score = (20 * Input Hours) + 15</code>
</p>
This is a simple multiplication and addition operation.
</li>
<li><strong>Result:</strong> The calculated 'Predicted Score' is returned by the model.</li>
<li><strong>Display:</strong> The Flask application then renders this predicted score on the web page for you to see.</li>
</ol>
<h3 class="text-xl font-semibold text-gray-700 mb-3">Our Training Data:</h3>
<p class="text-gray-600 mb-2">
The model was trained on the following data points to learn the relationship between 'Hours Studied' and 'Score'. Every point lies exactly on the line <code>Score = 20 * Hours + 15</code>, which is why training recovers a slope of 20 and an intercept of 15:
</p>
<div class="overflow-x-auto mb-4">
<table class="min-w-full bg-white rounded-lg shadow-md overflow-hidden text-gray-700">
<thead>
<tr class="bg-gray-100 border-b border-gray-200">
<th class="py-3 px-4 text-left font-semibold">Hours Studied (X)</th>
<th class="py-3 px-4 text-left font-semibold">Score (Y)</th>
</tr>
</thead>
<tbody>
<tr class="border-b border-gray-100">
<td class="py-3 px-4">1</td>
<td class="py-3 px-4">35</td>
</tr>
<tr class="border-b border-gray-100">
<td class="py-3 px-4">2</td>
<td class="py-3 px-4">55</td>
</tr>
<tr class="border-b border-gray-100">
<td class="py-3 px-4">3</td>
<td class="py-3 px-4">75</td>
</tr>
<tr class="border-b border-gray-100">
<td class="py-3 px-4">4</td>
<td class="py-3 px-4">95</td>
</tr>
<tr>
<td class="py-3 px-4">5</td>
<td class="py-3 px-4">115</td>
</tr>
</tbody>
</table>
</div>
<h3 class="text-xl font-semibold text-gray-700 mb-3">Cost Function: Quantifying Error</h3>
<p class="text-gray-600 mb-4">
When a linear regression model is being trained, it doesn't just draw a line at random. It evaluates how good its current line is using a cost function. The goal of training is to find the line, that is, the specific m and b values, that minimizes this cost.
</p>
<p class="text-gray-600 mb-4">
A common cost function for linear regression is the Mean Squared Error (MSE). It calculates the average of the squared differences between the actual observed values \( y_i \) and the values predicted by the model \( \hat{y}_i \).
</p>
<p class="text-center my-4 text-lg font-semibold text-gray-700">
Mean Squared Error (MSE) Formula:
</p>
<p class="text-center my-2 text-base">
\[
\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
\]
</p>
<ul class="list-disc list-inside text-gray-600 mb-4">
<li><strong>N</strong>: The total number of data points.</li>
<li>\( y_i \): The actual score for data point <em>i</em>.</li>
<li>\( \hat{y}_i \): The predicted score for data point <em>i</em>, calculated as \( m \times x_i + b \).</li>
</ul>
<p class="text-gray-600 mb-4">
Squaring the differences makes every error term non-negative, so positive and negative errors cannot cancel out, and it penalizes larger errors more heavily. During training, m and b are adjusted to make this MSE value as small as possible.
</p>
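<p class="text-gray-600 mb-4">
To make the formula concrete, here is a hand calculation on the five training points above. The candidate line with \( m = 18 \) and \( b = 15 \) is a hypothetical choice for illustration, not a value the model ever reports. Its predictions are 33, 51, 69, 87 and 105, so the errors \( y_i - \hat{y}_i \) are 2, 4, 6, 8 and 10:
</p>
<p class="text-center my-2 text-base">
\[
\text{MSE} = \frac{2^2 + 4^2 + 6^2 + 8^2 + 10^2}{5} = \frac{220}{5} = 44
\]
</p>
<p class="text-gray-600 mb-4">
For the line \( m = 20 \), \( b = 15 \), every prediction equals the actual score, so each error is 0 and the MSE is 0, the smallest value it can take.
</p>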
<h3 class="text-xl font-semibold text-gray-700 mb-3">Gradient Descent: Learning the Best Line</h3>
<p class="text-gray-600 mb-4">
Gradient Descent is an optimization algorithm used by many machine learning models, and it can be applied to linear regression to find the values of m and b that minimize a cost function such as MSE. Imagine the cost function as a landscape with hills and valleys, where the goal is to find the lowest point (the minimum cost). For linear regression with MSE, the landscape is a single bowl, so there is only one valley to find.
</p>
<ol class="list-decimal list-inside text-gray-600 mb-4">
<li><strong>Start Randomly:</strong> The algorithm starts with some initial, often random, values for m and b.</li>
<li><strong>Calculate Gradient:</strong> It calculates the gradient of the cost function with respect to m and b. The gradient is a vector that points in the direction of steepest ascent on the cost landscape.</li>
<li><strong>Take a Step:</strong> To minimize the cost, the algorithm takes a small step in the opposite direction of the gradient (downhill). The size of this step is controlled by a parameter called the learning rate.</li>
<li><strong>Repeat:</strong> Steps 2 and 3 are repeated iteratively, with m and b being updated in each iteration. With each step, the model gets closer to the optimal m and b values that minimize the cost.</li>
<li><strong>Convergence:</strong> This process continues until the algorithm converges, meaning the cost function stops decreasing significantly, indicating it has found the minimum or a very good approximation of it.</li>
</ol>
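<p class="text-gray-600 mb-4">
The steps above can be written out for the MSE cost as a rough sketch. The symbol \( \alpha \) for the learning rate, and the starting values and learning rate used in the worked step, are assumptions made here for illustration. Differentiating the MSE with respect to each parameter gives the gradient:
</p>
<p class="text-center my-2 text-base">
\[
\frac{\partial\, \text{MSE} }{\partial m} = -\frac{2}{N} \sum_{i=1}^{N} x_i \, (y_i - \hat{y}_i),
\qquad
\frac{\partial\, \text{MSE} }{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)
\]
</p>
<p class="text-gray-600 mb-4">
Each iteration then moves both parameters against the gradient:
</p>
<p class="text-center my-2 text-base">
\[
m \leftarrow m - \alpha \, \frac{\partial\, \text{MSE} }{\partial m},
\qquad
b \leftarrow b - \alpha \, \frac{\partial\, \text{MSE} }{\partial b}
\]
</p>
<p class="text-gray-600 mb-4">
For instance, starting from \( m = 0 \), \( b = 0 \) with \( \alpha = 0.01 \) on the training data above, every prediction is 0, so the errors equal the scores themselves. Then \( \sum x_i y_i = 1325 \) and \( \sum y_i = 375 \), giving gradients of \( -\frac{2}{5} \times 1325 = -530 \) and \( -\frac{2}{5} \times 375 = -150 \). One step updates the parameters to \( m = 5.3 \) and \( b = 1.5 \), already moving toward 20 and 15.
</p>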
<p class="text-gray-600 mb-4">
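So, when <code>model.fit(X, y)</code> is called, the library searches for the m and b that best fit your data by minimizing the prediction errors. Gradient Descent is one way to do this; some libraries (scikit-learn's <code>LinearRegression</code>, for example) instead solve the least-squares problem directly, which arrives at the same minimum-MSE line.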
</p>
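<p class="text-gray-600 mb-4">
For a single input feature, the m and b that minimize the MSE can also be computed directly, which gives a way to check the fitted parameters by hand. The sketch below uses the standard least-squares formulas; the notation \( \bar{x} \) and \( \bar{y} \) for the means of the hours and scores is introduced here and is not used elsewhere on this page:
</p>
<p class="text-center my-2 text-base">
\[
m = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m \, \bar{x}
\]
</p>
<p class="text-gray-600 mb-4">
With the training data above, \( \bar{x} = 3 \) and \( \bar{y} = 75 \). The numerator is \( (-2)(-40) + (-1)(-20) + (0)(0) + (1)(20) + (2)(40) = 200 \) and the denominator is \( 4 + 1 + 0 + 1 + 4 = 10 \), so \( m = 200 / 10 = 20 \) and \( b = 75 - 20 \times 3 = 15 \), matching the model's parameters.
</p>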
</div>
</div>
<div class="flex-1 flex flex-col gap-6">
<div class="bg-white rounded-lg shadow-md p-6">
<h3 class="text-xl font-semibold text-gray-700 mb-4">Visualizing the Regression Line</h3>
<canvas id="regressionCanvas" width="400" height="300" class="border border-gray-300 rounded-md"></canvas>
<p class="text-sm text-gray-600 mt-2">
Slope (m): <span id="slopeValue"></span>, Intercept (b): <span id="interceptValue"></span>
</p>
</div>
<div class="p-6 bg-gray-50 rounded-xl shadow-inner">
<h3 class="text-xl font-semibold text-gray-700 mb-4">Make a Prediction:</h3>
<form method="POST" class="flex flex-col sm:flex-row items-center gap-4">
<label for="hoursInput" class="text-gray-700 font-medium">Hours Studied:</label>
<input type="number" id="hoursInput" name="hours" min="0" step="0.1"
value="{{ hours_studied_input if hours_studied_input is not none else '3.5' }}"
required class="flex-grow"
style="border: 1px solid #d1d5db; border-radius: 0.5rem; padding: 0.75rem 1rem; font-size: 1rem; width: 100%; max-width: 200px; transition: border-color 0.2s;">
<button type="submit" id="predictBtn"
style="background-color: #3b82f6; color: white; padding: 0.75rem 1.5rem; border-radius: 0.5rem; font-weight: 600; transition: background-color 0.2s, transform 0.1s; box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);">
Predict Score
</button>
</form>
</div>
<div id="predictionOutput" class="prediction-box {% if prediction is none %}hidden{% endif %}"
style="background-color: #e0f2fe; border: 1px solid #93c5fd; border-radius: 0.75rem; padding: 1.5rem; text-align: center;">
<h3 class="text-2xl font-bold text-blue-700 mb-2">Predicted Score:</h3>
<p class="text-4xl font-extrabold text-blue-900" id="predictedScore">
{% if prediction is not none %}
{{ prediction | round(2) }}
{% else %}
--.--
{% endif %}
</p>
<p class="text-sm text-gray-600 mt-2">
This is the score predicted by the linear regression model for the hours you entered.
</p>
</div>
</div>
</div>
<script>
// Get canvas and context
const canvas = document.getElementById('regressionCanvas');
const ctx = canvas.getContext('2d');
// Training data (X, y), generated from y = 20x + 15
const X_data = [1, 2, 3, 4, 5];
const y_data = [35, 55, 75, 95, 115];
// --- Understanding Slope (m) and Intercept (b) ---
// Hardcoded to match the trained model's parameters (m=20, b=15)
const slope = 20;
const intercept = 15;
// Display slope and intercept values in the HTML
document.getElementById('slopeValue').textContent = slope.toFixed(2);
document.getElementById('interceptValue').textContent = intercept.toFixed(2);
// Canvas dimensions and padding
let canvasWidth, canvasHeight;
const padding = 50;
// Scale factors for drawing data onto the canvas
let xScale, yScale;
let xMin, xMax, yMin, yMax;
// Prediction variables (these will be updated when the user inputs hours)
let predictedHours = null;
let predictedScore = null;
// Function to set up scaling based on data range and canvas size
function setupScaling() {
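// Work in CSS pixels: resizeCanvas() scales the context by the device pixel ratio,
// so the drawable area is the buffer size divided by that ratio
const dpr = window.devicePixelRatio || 1;
canvasWidth = canvas.width / dpr;
canvasHeight = canvas.height / dpr;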
// Determine data ranges for X and Y axes
xMin = Math.min(...X_data, 0); // Always start X-axis at 0
// Set xMax to at least 10 and ensure it covers any predicted hours
xMax = Math.max(...X_data, predictedHours !== null ? predictedHours : 0, 10) + 1; // Extend x-axis slightly beyond 10
yMin = Math.min(...y_data, 0); // Always start Y-axis at 0
// Calculate the predicted score for the determined xMax to ensure the y-axis covers the line
const maxPredictedY = slope * xMax + intercept;
yMax = Math.max(...y_data, predictedScore !== null ? predictedScore : 0, maxPredictedY) + 20; // Extend y-axis slightly beyond max needed
// Calculate scaling factors to fit data within the canvas padding
xScale = (canvasWidth - 2 * padding) / (xMax - xMin);
yScale = (canvasHeight - 2 * padding) / (yMax - yMin);
}
// Convert data coordinates (e.g., hours, score) to canvas pixel coordinates
function toCanvasX(x) {
return padding + (x - xMin) * xScale;
}
function toCanvasY(y) {
return canvasHeight - padding - (y - yMin) * yScale;
}
// Function to draw the entire graph, including data points, regression line, and predictions
function drawGraph() {
ctx.clearRect(0, 0, canvasWidth, canvasHeight); // Clear the entire canvas
// Draw axes
ctx.beginPath();
ctx.strokeStyle = '#64748b'; // Slate gray for axes
ctx.lineWidth = 2;
// X-axis (horizontal line)
ctx.moveTo(padding, toCanvasY(yMin));
ctx.lineTo(canvasWidth - padding, toCanvasY(yMin));
// Y-axis (vertical line)
ctx.moveTo(toCanvasX(xMin), padding);
ctx.lineTo(toCanvasX(xMin), canvasHeight - padding);
ctx.stroke();
// Draw axis labels and ticks
ctx.fillStyle = '#475569'; // Darker gray for labels
ctx.font = '14px Inter';
ctx.textAlign = 'center';
ctx.textBaseline = 'top';
// X-axis labels (Hours Studied)
// Fixed tick step of one hour, which stays readable for the default 0-11 hour range
const xTickStep = 1;
for (let i = Math.ceil(xMin / xTickStep) * xTickStep; i <= Math.floor(xMax); i += xTickStep) {
if (i >= 0) {
ctx.fillText(i + 'h', toCanvasX(i), canvasHeight - padding + 10);
ctx.beginPath();
ctx.moveTo(toCanvasX(i), canvasHeight - padding);
ctx.lineTo(toCanvasX(i), canvasHeight - padding - 5);
ctx.stroke();
}
}
// X-axis title
ctx.fillText('Hours Studied', canvasWidth / 2, canvasHeight - 20);
ctx.textAlign = 'right';
ctx.textBaseline = 'middle';
// Y-axis labels (Score)
// Dynamic tick step for clarity on different scales
const yTickStep = (yMax - yMin) / 10 > 20 ? 50 : 20; // Example: every 20 or 50 points
for (let i = Math.ceil(yMin / yTickStep) * yTickStep; i <= Math.floor(yMax); i += yTickStep) {
if (i >= 0) {
ctx.fillText(i.toFixed(0), padding - 10, toCanvasY(i));
ctx.beginPath();
ctx.moveTo(padding, toCanvasY(i));
ctx.lineTo(padding + 5, toCanvasY(i));
ctx.stroke();
}
}
// Y-axis title (rotated)
ctx.save();
ctx.translate(20, canvasHeight / 2);
ctx.rotate(-Math.PI / 2);
ctx.textAlign = 'center';
ctx.fillText('Score', 0, 0);
ctx.restore();
// Draw data points (blue circles)
ctx.fillStyle = '#3b82f6'; // Blue for data points
X_data.forEach((x, i) => {
ctx.beginPath();
ctx.arc(toCanvasX(x), toCanvasY(y_data[i]), 5, 0, Math.PI * 2); // Radius 5
ctx.fill();
});
// Draw regression line (red line)
ctx.beginPath();
ctx.strokeStyle = '#ef4444'; // Red for regression line
ctx.lineWidth = 3;
// Draw line across the entire X-axis range based on the model equation
ctx.moveTo(toCanvasX(xMin), toCanvasY(slope * xMin + intercept));
ctx.lineTo(toCanvasX(xMax), toCanvasY(slope * xMax + intercept));
ctx.stroke();
// Draw predicted point and lines if available (green point and dashed lines)
if (predictedHours !== null && predictedScore !== null) {
const predX = toCanvasX(predictedHours);
const predY = toCanvasY(predictedScore);
// Predicted point
ctx.fillStyle = '#22c55e'; // Green for predicted point
ctx.beginPath();
ctx.arc(predX, predY, 6, 0, Math.PI * 2); // Slightly larger radius
ctx.fill();
// Dotted lines to axes
ctx.strokeStyle = '#22c55e'; // Green for dotted lines
ctx.lineWidth = 1.5;
ctx.setLineDash([5, 5]); // Dotted line style
// Line from predicted point to X-axis
ctx.beginPath();
ctx.moveTo(predX, predY);
ctx.lineTo(predX, toCanvasY(yMin));
ctx.stroke();
// Line from predicted point to Y-axis
ctx.beginPath();
ctx.moveTo(predX, predY);
ctx.lineTo(toCanvasX(xMin), predY);
ctx.stroke();
ctx.setLineDash([]); // Reset line dash to solid for subsequent drawings
}
}
// Event listener for the "Predict Score" button click
document.getElementById('predictBtn').addEventListener('click', () => {
// Get the value from the input field and parse it as a floating-point number
const hoursInput = parseFloat(document.getElementById('hoursInput').value);
// Check if the input is a valid number
if (!isNaN(hoursInput)) {
// Update global prediction variables
predictedHours = hoursInput;
predictedScore = slope * predictedHours + intercept;
// Display the predicted score in the HTML
document.getElementById('predictedScore').textContent = predictedScore.toFixed(2);
// Make the prediction output box visible
document.getElementById('predictionOutput').classList.remove('hidden');
// Recalculate scaling and redraw the graph to accommodate new prediction if it extends axes
setupScaling();
drawGraph();
} else {
// If input is invalid, show the message in the score element. Replacing the whole
// box's innerHTML would delete #predictedScore and break every later prediction.
document.getElementById('predictedScore').textContent = 'Please enter a valid number for hours studied.';
document.getElementById('predictionOutput').classList.remove('hidden');
}
});
// Function to handle canvas resizing and redraw the graph
function resizeCanvas() {
// Get the device pixel ratio for sharper rendering on high-DPI screens
const dpi = window.devicePixelRatio;
// Get the actual rendered size of the canvas element from its CSS styles
const rect = canvas.getBoundingClientRect();
// Set the internal drawing buffer size of the canvas
canvas.width = rect.width * dpi;
canvas.height = rect.height * dpi;
// Scale the drawing context to match the DPI, ensuring crisp lines and text
ctx.scale(dpi, dpi);
// Re-setup scaling for data to canvas coordinates and redraw
setupScaling();
drawGraph();
}
// Initial setup and draw when the window loads
window.addEventListener('load', () => {
resizeCanvas(); // Set initial canvas size and draw
// Also trigger an initial prediction for the default value in the input field
const initialHours = parseFloat(document.getElementById('hoursInput').value);
if (!isNaN(initialHours)) {
predictedHours = initialHours;
predictedScore = slope * initialHours + intercept;
document.getElementById('predictedScore').textContent = predictedScore.toFixed(2);
document.getElementById('predictionOutput').classList.remove('hidden');
setupScaling();
drawGraph();
}
});
// Redraw the graph whenever the window is resized
window.addEventListener('resize', resizeCanvas);
// Optional: Allow clicking on canvas to set hours input (for quick testing)
canvas.addEventListener('click', (event) => {
// Get the click position relative to the canvas, in CSS pixels
// (the context is scaled by the device pixel ratio, so CSS pixels match drawing coordinates)
const rect = canvas.getBoundingClientRect();
const mouseX = event.clientX - rect.left;
// Convert canvas X coordinate back to data X (hours studied), clamped to the input's minimum of 0
const clickedHours = Math.max(0, xMin + (mouseX - padding) / xScale);
// Update the input field with the clicked hours
document.getElementById('hoursInput').value = clickedHours.toFixed(1);
// Trigger the prediction immediately
document.getElementById('predictBtn').click();
});
</script>
{% endblock %}