Machine Learning: A Beginner's Guide

What is Machine Learning?

Machine learning is a subset of artificial intelligence where systems learn patterns from data rather than being explicitly programmed. Instead of writing rules, you provide examples and let the algorithm discover the rules.

Types of Machine Learning

Supervised Learning

The algorithm learns from labeled examples.

Classification: Predicting categories

Email spam detection
Image recognition
Medical diagnosis

Regression: Predicting continuous values

House price prediction
Stock price forecasting
Temperature prediction

Common algorithms:

Linear Regression
Logistic Regression
Decision Trees
Random Forests
Support Vector Machines (SVM)
Neural Networks
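
As a concrete illustration of learning from labeled examples, here is a minimal sketch of one-feature linear regression fitted by ordinary least squares in plain Python. The helper name `fit_line` and the house-size data are made up for illustration; in practice a library such as scikit-learn would normally do this.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled examples: house size (square meters) -> price (thousands).
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 100 + intercept  # prediction for an unseen 100 m^2 house
```

The labels (prices) are what make this supervised: the fit is driven entirely by the gap between predictions and known answers.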

Unsupervised Learning

The algorithm finds patterns in unlabeled data.

Clustering: Grouping similar items

Customer segmentation
Document categorization
Anomaly detection

Dimensionality Reduction: Simplifying data

Feature extraction
Visualization
Noise reduction

Common algorithms:

K-Means Clustering
Hierarchical Clustering
Principal Component Analysis (PCA)
t-SNE
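
To make clustering concrete, the following is a rough sketch of K-Means (Lloyd's algorithm) on 2-D points in plain Python. The function name `kmeans`, the sample points, and the choice of starting centroids are assumptions for illustration; real implementations typically pick starting centroids more carefully and run several restarts.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical unlabeled data with two visible groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = kmeans(points, centroids=[(1, 1), (8, 8)])
```

No labels are supplied anywhere; the grouping emerges from distances alone.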

Reinforcement Learning

The algorithm learns through trial and error, receiving rewards or penalties.

Applications:

Game playing (AlphaGo, chess)
Robotics
Autonomous vehicles
Resource management
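
Trial-and-error learning can be sketched with a toy multi-armed bandit and an epsilon-greedy strategy. This is only an illustration: the function name `run_bandit`, the reward probabilities, and the parameter values are assumptions, and systems such as AlphaGo use far more elaborate methods.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit whose arms pay 1 with the given probabilities."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)    # how often each action was tried
    values = [0.0] * len(reward_probs)  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore: random action
        else:
            action = max(range(len(values)), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the average reward for the chosen action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


# Hypothetical environment: the third action pays off most often.
values, counts = run_bandit([0.2, 0.5, 0.8])
```

The agent is never told which action is best; it only sees rewards, and its estimates in `values` gradually steer it toward the better actions.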

The Machine Learning Pipeline

Data Collection: Gather relevant data
Data Cleaning: Handle missing values, outliers
Feature Engineering: Create useful features
Model Selection: Choose appropriate algorithm
Training: Fit model to training data
Evaluation: Test on held-out data
Deployment: Put model into production
Monitoring: Track performance over time
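
The data cleaning and feature engineering steps are often the least obvious to beginners, so here is a small sketch of one possible approach: fill missing values with the median, then standardize the column. The helper name `clean_and_scale` and the sample column are hypothetical, and median imputation is just one of several reasonable choices.

```python
from statistics import mean, median, pstdev


def clean_and_scale(values):
    """Impute missing entries (None) with the median, then standardize to z-scores."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    mu, sigma = mean(filled), pstdev(filled)
    return [(v - mu) / sigma for v in filled]


raw_ages = [22, None, 35, 41, None, 29]  # hypothetical column with gaps
scaled = clean_and_scale(raw_ages)
```

After this step the column has mean 0 and unit variance, which many algorithms handle better than raw values.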

Key Concepts

Overfitting vs Underfitting

Overfitting: Model memorizes training data, performs poorly on new data

Solution: More data, regularization, simpler model

Underfitting: Model too simple to capture patterns

Solution: More features, a more complex model, less regularization
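
A toy comparison can make overfitting visible. In this sketch a 1-nearest-neighbor "memorizer" is compared with a simple threshold rule on hand-made data that contains one noisy label. The data, the function names, and the hand-picked threshold of 5.0 are all assumptions for illustration.

```python
def one_nn_predict(train, x):
    """Return the label of the closest training point (memorizes the training set)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def threshold_predict(x, threshold=5.0):
    """A deliberately simple rule: class 1 above the threshold, else class 0."""
    return 1 if x > threshold else 0


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


# (x, label) pairs; the point (3, 1) is a deliberately noisy label.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
test = [(2.6, 0), (3.4, 0), (4.6, 0), (5.5, 1)]

memorizer_train = accuracy(lambda x: one_nn_predict(train, x), train)
memorizer_test = accuracy(lambda x: one_nn_predict(train, x), test)
simple_train = accuracy(threshold_predict, train)
simple_test = accuracy(threshold_predict, test)
```

The memorizer scores perfectly on the training data but worse on the new points because it has also learned the noisy label, while the simpler rule gives up a little training accuracy and generalizes better.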

Train/Test Split

Never evaluate a model on the data it was trained on, because the score will be overly optimistic. Common splits:

80% training, 20% testing
70% training, 15% validation, 15% testing
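
A shuffled 80/20 split can be sketched in a few lines of plain Python. The helper name `split_train_test` and its default arguments are assumptions for illustration; scikit-learn provides a ready-made equivalent.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for 100 data records
train_rows, test_rows = split_train_test(rows)
```

Shuffling before splitting matters when the data is stored in some order (for example by date or by class), since an unshuffled split could leave the test set unrepresentative.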

Cross-Validation

K-fold cross-validation provides a more robust evaluation than a single train/test split:

Split data into K folds
Train on K-1 folds, test on remaining fold
Repeat K times
Average the results
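
The four steps above can be sketched as follows. The generator name `k_fold_indices` is hypothetical, the "model" is just a baseline that predicts the training mean, and the sketch assumes the data has already been shuffled.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


ys = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0]  # hypothetical target values
fold_errors = []
for train_idx, test_idx in k_fold_indices(len(ys), k=3):
    # "Train": the baseline model simply predicts the mean of the training targets.
    baseline = sum(ys[i] for i in train_idx) / len(train_idx)
    # "Test": mean absolute error on the held-out fold.
    mae = sum(abs(ys[i] - baseline) for i in test_idx) / len(test_idx)
    fold_errors.append(mae)

cv_score = sum(fold_errors) / len(fold_errors)  # average over the K folds
```

Every sample is used for testing exactly once, which is why the averaged score tends to be less sensitive to one lucky or unlucky split.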

Bias-Variance Tradeoff

High Bias: Oversimplified model (underfitting)
High Variance: Overcomplicated model (overfitting)
Goal: Find the level of model complexity that balances the two and minimizes error on new data
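
For squared-error loss the tradeoff can be written down explicitly. The notation here is an assumption, since the guide itself introduces no symbols: the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}$ is the model learned from a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Making a model more flexible usually lowers the bias term and raises the variance term, while the noise term stays fixed regardless of the model.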

Evaluation Metrics

Classification

Accuracy: Correct predictions / Total predictions
Precision: True positives / Predicted positives
Recall: True positives / Actual positives
F1 Score: Harmonic mean of precision and recall
AUC-ROC: Area under the receiver operating characteristic (ROC) curve
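
The first four metrics can be computed directly from the counts of true/false positives and negatives, as in this sketch for binary labels. The function name `classification_metrics` and the sample labels are made up for illustration; AUC-ROC is left out because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical model output
metrics = classification_metrics(y_true, y_pred)
```

Precision and recall matter most when classes are imbalanced, since accuracy alone can look high for a model that always predicts the majority class.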

Regression

Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
R-squared (R²)
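
All four regression metrics follow from the prediction errors, as this sketch shows. The function name `regression_metrics` and the sample values are assumptions for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R-squared for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # R-squared compares the model's squared error with that of predicting the mean.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


y_true = [3.0, 5.0, 7.0, 9.0]  # hypothetical observed values
y_pred = [2.0, 5.0, 8.0, 9.0]  # hypothetical predictions
scores = regression_metrics(y_true, y_pred)
```

RMSE is in the same units as the target, which often makes it easier to interpret than MSE, while MAE is less sensitive to a few large errors.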

Getting Started

Learn Python and its core libraries (NumPy, pandas, scikit-learn)
Work through classic datasets (Iris, MNIST, Titanic)
Take online courses (Coursera, fast.ai)
Practice on Kaggle competitions
Build projects with real-world data

Remember: A common rule of thumb is that machine learning is roughly 80% data preparation and 20% modeling. Start with clean data and simple models before moving on to more complex ones.