{% extends "layout.html" %} {% block content %}

🔍 Study Guide: Linear Discriminant Analysis (LDA)


🔹 Core Concepts

Story-style intuition: The Smart Photographer

Imagine you have to take a single photo of two different groups of people, say a basketball team (tall, lean) and a group of sumo wrestlers (shorter, heavy). A regular photographer (like PCA) doesn't know who is in which group, so they might take the photo from an angle that just shows the biggest spread of people, perhaps from the side. But you are a smart photographer (using LDA). You already have the guest list and know who is a basketball player and who is a sumo wrestler. So, you find the one perfect camera angle that makes the two groups look as distinct as possible. This angle will likely be one that contrasts height against weight, making the two groups form separate, tight clusters in your photo. LDA is a supervised technique that uses these known labels to find the best "camera angles" (projections) to maximize the separation between groups.

Linear Discriminant Analysis (LDA) is a powerful technique used for both supervised classification and dimensionality reduction. Its primary goal is to find a new, lower-dimensional space to project the data onto, such that the separation (or discrimination) between the different classes is maximized. The new axes it finds are called linear discriminants.

🔹 Intuition Behind LDA

While PCA is unsupervised and only cares about finding axes that maximize the total variance (the spread of the entire dataset), LDA is supervised and has a much more specific goal. It uses the class labels to find a projection that simultaneously accomplishes two things:

  1. Maximize the distance between the centers (means) of the different classes.
  2. Minimize the spread (scatter) of the points within each individual class.

Projecting onto one of the original axes (as PCA might, if that is where most of the overall variance lies) can cause the classes to overlap. LDA instead finds a new, tilted axis that separates the centers of the clusters while keeping each cluster's projection tight.

🔹 Mathematical Foundation

To achieve its goals, LDA mathematically defines the two objectives and finds a projection that optimizes them. It calculates two key statistical measures:

  1. Within-Class Scatter Matrix ($$S_W$$): A matrix that measures the total scatter of data points around their respective class centers. Think of it as the "compactness" of all the individual groups added together. LDA wants this to be as small as possible.
  2. Between-Class Scatter Matrix ($$S_B$$): A matrix that measures the scatter of the class centers around the overall dataset's center. Think of it as how "spread out" the groups are from one another. LDA wants this to be as large as possible.

The perfect "camera angle" (projection matrix $$W$$) is the one that maximizes the ratio of between-class scatter to within-class scatter, $$\frac{W^T S_B W}{W^T S_W W}$$ (the Fisher criterion). This is a classic optimization problem solved via the generalized eigenvalue problem $$S_B w = \lambda S_W w$$: the eigenvectors with the largest eigenvalues form the best projection axes.
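A minimal NumPy sketch of these quantities, using small synthetic (hypothetical) two-class data: the scatter matrices are built exactly as defined above, and the generalized eigenvalue problem is solved here via the eigenvectors of $$S_W^{-1} S_B$$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic 2D classes (hypothetical data, purely for illustration):
# the means differ along x, while most of the raw variance lies along y.
X0 = rng.normal(loc=[0.0, 0.0], scale=[1.0, 3.0], size=(100, 2))
X1 = rng.normal(loc=[3.0, 0.0], scale=[1.0, 3.0], size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

overall_mean = X.mean(axis=0)
S_W = np.zeros((2, 2))  # within-class scatter: spread around each class mean
S_B = np.zeros((2, 2))  # between-class scatter: spread of the class means
for c in np.unique(y):
    Xc = X[y == c]
    mean_c = Xc.mean(axis=0)
    S_W += (Xc - mean_c).T @ (Xc - mean_c)
    d = (mean_c - overall_mean).reshape(-1, 1)
    S_B += len(Xc) * (d @ d.T)

# Generalized eigenvalue problem S_B w = lambda * S_W w,
# solved via the (ordinary) eigendecomposition of S_W^{-1} S_B.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
w = eigvecs[:, np.argmax(eigvals.real)].real  # best discriminant direction

# Projected onto w, the class means separate well relative to the
# within-class spread, even though the raw y-axis variance dominates.
proj0, proj1 = X0 @ w, X1 @ w
sep = abs(proj0.mean() - proj1.mean()) / np.sqrt(proj0.var() + proj1.var())
print(f"separation along w: {sep:.2f}")
```

Note that for k classes, $$S_B$$ has rank at most k-1, so only k-1 eigenvalues are nonzero; here, with two classes, exactly one useful direction exists.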

🔹 Geometric Interpretation

Geometrically, LDA rotates and projects the data to find the best view for class separation. The number of new dimensions (linear discriminants) it can create is limited by the number of classes. Specifically, for a problem with **k** classes, LDA can find at most **k-1** new axes.

Example: the Iris dataset (used in the code below) has **k** = 3 classes and 4 features, so LDA can project it down to at most 3 - 1 = 2 linear discriminants, which is exactly what makes a 2D scatter plot of its classes possible.

This makes LDA an excellent tool for visualizing the separability of multi-class datasets.
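The k-1 cap is easy to verify in scikit-learn; in this sketch, requesting more components than k-1 makes the library refuse to fit:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 3 classes, 4 features

# With k = 3 classes, LDA can yield at most k - 1 = 2 discriminants.
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
print(lda.transform(X).shape)  # (150, 2)

# Requesting 3 components exceeds the k - 1 limit, so scikit-learn refuses.
try:
    LinearDiscriminantAnalysis(n_components=3).fit(X, y)
except ValueError as err:
    print("ValueError:", err)
```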

🔹 Assumptions of LDA

LDA is a powerful tool, but it relies on a few key assumptions about the data. The model performs best when these are met:

  1. Normality: the features within each class follow an (approximately) Gaussian distribution.
  2. Equal covariance (homoscedasticity): all classes share roughly the same covariance matrix.
  3. Independence: the observations are sampled independently of one another.

🔹 Comparison with PCA

| Feature | LDA (Linear Discriminant Analysis) | PCA (Principal Component Analysis) |
| --- | --- | --- |
| Supervision | Supervised (it requires class labels to compute class separability). | Unsupervised (it only looks at the data's features, not the labels). |
| Goal | To find a projection that maximizes class separability. | To find a projection that maximizes total variance. |
| Application | Primarily used for classification or as a preprocessing step for classification. | Primarily used for general data representation, visualization, and compression. |
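The supervision and goal rows above can be seen directly in code. In this sketch, both methods project Iris to 2D; the silhouette score (against the true labels) is just one convenient way to quantify how well the classes separate in each embedding:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import silhouette_score

X, y = load_iris(return_X_y=True)

# PCA never sees the labels; LDA requires them in fit().
X_pca = PCA(n_components=2).fit_transform(X)
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# Higher silhouette = tighter, better-separated class clusters.
print("PCA silhouette:", silhouette_score(X_pca, y))
print("LDA silhouette:", silhouette_score(X_lda, y))
```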

🔹 Strengths & Weaknesses

Advantages:

  - Uses class labels, so the projection is tuned for separating classes, not just spreading out data.
  - Has a closed-form solution (an eigendecomposition): fast to train, with no iterative optimization.
  - Doubles as both a classifier and a dimensionality-reduction step.

Disadvantages:

  - Can produce at most k-1 components, which may be too few for some tasks.
  - Performance degrades when the normality or equal-covariance assumptions are badly violated.
  - Sensitive to outliers, which distort the class means and scatter matrices.
  - Struggles when class means nearly coincide, since its notion of separation is mean-driven.

🔹 When to Use LDA

  - You have labeled data and the goal is classification or class-aware dimensionality reduction.
  - You want to visualize how separable your classes are in 2D or 3D.
  - The classes are roughly Gaussian with similar covariance structure.
  - You need a fast, interpretable linear baseline before trying more complex models.

🔹 Python Implementation (Beginner Example with Iris Dataset)

Here, we use the Iris dataset, which has 3 classes of flowers and 4 features. Since there are 3 classes, LDA can reduce the data to a maximum of 2 components (3-1=2). We will use it first for dimensionality reduction and visualization, and then show how it can be used directly as a classifier.


import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# --- 1. Load and Scale the Data ---
iris = load_iris()
X, y = iris.data, iris.target

# Split data for later classification test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaling is a good practice for LDA.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)


# --- PART A: LDA for Dimensionality Reduction ---

# --- 2. Create and Apply LDA ---
# Since there are 3 classes, we can reduce to at most 2 components.
lda_dr = LinearDiscriminantAnalysis(n_components=2)

# Fit LDA and transform the training data. Note: .fit() needs both X and y.
X_train_lda = lda_dr.fit_transform(X_train_scaled, y_train)

# --- 3. Visualize the Results ---
plt.figure(figsize=(8, 6))
plt.scatter(X_train_lda[:, 0], X_train_lda[:, 1], c=y_train, cmap='viridis', edgecolor='k')
plt.title('LDA of Iris Dataset (4D -> 2D)')
plt.xlabel('Linear Discriminant 1')
plt.ylabel('Linear Discriminant 2')
plt.grid(True)
plt.show()


# --- PART B: LDA as a Classifier ---

# --- 4. Train LDA as a Classifier ---
# n_components is not needed for classification; LDA predicts directly
# from its fitted per-class model in the original feature space.
lda_clf = LinearDiscriminantAnalysis()
lda_clf.fit(X_train_scaled, y_train)

# --- 5. Make Predictions and Evaluate ---
y_pred = lda_clf.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of LDA as a classifier: {accuracy:.2%}")


🔹 Best Practices

  - Standardize features before fitting; LDA's scatter matrices are scale-sensitive.
  - Always pass the class labels to .fit() — LDA is supervised, unlike PCA.
  - With few samples relative to features, use shrinkage (solver='lsqr' or 'eigen' with shrinkage='auto') to stabilize the covariance estimate.
  - Check the assumptions (normality, equal covariance); if they fail badly, consider Quadratic Discriminant Analysis or a nonlinear method.
  - Compare against PCA: if labels are unreliable or absent, PCA may be the better choice.
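One of these practices sketched in code: scikit-learn's shrinkage option, which regularizes the covariance estimate. Iris is large enough that plain LDA also works fine; this is simply the pattern you would reuse on small-sample problems:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# shrinkage='auto' picks the shrinkage strength analytically (Ledoit-Wolf);
# shrinkage requires the 'lsqr' or 'eigen' solver, not the default 'svd'.
lda_shrunk = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
scores = cross_val_score(lda_shrunk, X, y, cv=5)
print(f"5-fold CV accuracy: {scores.mean():.3f}")
```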

🔹 Key Terminology Explained (LDA)

The Story: Decoding the Smart Photographer's Toolkit

  - Linear discriminant: one of the photographer's chosen camera angles — a new axis onto which the data is projected.
  - Within-class scatter ($$S_W$$): how tightly each group huddles together in the photo; smaller is better.
  - Between-class scatter ($$S_B$$): how far apart the groups stand from one another; larger is better.
  - Projection matrix ($$W$$): the full set of chosen angles, stacked into a single transformation.
  - Generalized eigenvalue problem: the math that picks the angles maximizing $$S_B$$ relative to $$S_W$$.

{% endblock %}