Supervised machine learning

26/05/2025

Supervised machine learning techniques are algorithms that learn a mapping from input data (features) to a target variable (labels) based on labeled training data. They are widely used for classification and regression problems.

Here is a detailed overview of the most common supervised learning techniques:

1. Linear Regression

  • Type: Regression

  • Use Case: Predicting continuous values (e.g., housing prices).

  • How it works: Assumes a linear relationship between the input variables (X) and the output (y), as in the formula below; a short fitting sketch follows.

  • Formula:

    y = β₀ + β₁x₁ + β₂x₂ + ⋯ + βₙxₙ
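A minimal fitting sketch (assuming scikit-learn is available; the synthetic data and coefficient values are illustrative, not taken from any real dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data generated from y = 3 + 2*x1 + 0.5*x2 plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 + 2 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)
print(model.intercept_)  # estimate of β₀
print(model.coef_)       # estimates of β₁, β₂
```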

2. Logistic Regression

  • Type: Classification

  • Use Case: Binary or multi-class classification (e.g., spam detection).

  • How it works: Models the probability that a given input belongs to a particular class using the logistic (sigmoid) function, as sketched below.
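A minimal sketch with scikit-learn (the toy data stands in for real spam features and is purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary labels: class 1 when x1 + x2 > 0 (illustrative rule)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression().fit(X, y)
# predict_proba applies the sigmoid to return class probabilities
print(clf.predict_proba([[0.5, -0.2]]))
```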

3. Decision Trees

  • Type: Classification/Regression

  • Use Case: Fraud detection, credit scoring, etc.

  • How it works: Splits the data into subsets based on feature values using conditions (like yes/no questions), forming a tree structure; see the sketch below.
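A minimal sketch with scikit-learn's DecisionTreeClassifier on the bundled Iris dataset (chosen here only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text prints the learned yes/no splits as a readable tree
print(export_text(tree))
```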

4. Random Forest

  • Type: Classification/Regression

  • Use Case: High-accuracy tasks with large datasets.

  • How it works: An ensemble of decision trees, each trained on a random subset of the data and features; the final prediction is the majority vote (classification) or the average (regression). A sketch follows.
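A minimal sketch with scikit-learn (synthetic data; the 100-tree setting is just the library default made explicit):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees sees a bootstrap sample and a random feature
# subset at each split; the prediction is the majority vote across trees
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```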

5. Support Vector Machines (SVM)

  • Type: Classification/Regression

  • Use Case: Image classification, bioinformatics.

  • How it works: Finds the optimal hyperplane that separates classes with the maximum margin; non-linear data can be handled via the kernel trick (sketched below).
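A minimal sketch with scikit-learn's SVC; make_moons produces a non-linearly separable toy set, so the RBF kernel illustrates the kernel trick:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-circles: not separable by a straight line
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# The RBF kernel implicitly maps points into a space where a
# separating hyperplane exists (the kernel trick)
svm = SVC(kernel="rbf", C=1.0).fit(X, y)
print(svm.score(X, y))
```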

6. K-Nearest Neighbors (K-NN)

  • Type: Classification/Regression

  • Use Case: Pattern recognition, recommendation systems.

  • How it works: Stores all training data; a new input is assigned the majority class among its k nearest neighbors (or their average value, for regression). A sketch follows.
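A minimal sketch with scikit-learn (k = 5 is an arbitrary illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# A query point takes the majority class of its 5 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict([[5.1, 3.5, 1.4, 0.2]]))  # one Iris-like measurement
```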

7. Naïve Bayes

  • Type: Classification

  • Use Case: Text classification, spam filtering.

  • How it works: Applies Bayes' Theorem with strong (naïve) independence assumptions between features; see the sketch below.
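A minimal text-classification sketch with scikit-learn (the four-document corpus and its spam labels are made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win money now", "meeting at noon", "free prize win", "lunch with team"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (illustrative)

vec = CountVectorizer()
X = vec.fit_transform(texts)

# Word counts are treated as conditionally independent given the class
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["free money prize"])))
```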

8. Gradient Boosting Machines (GBM) / XGBoost / LightGBM / CatBoost

  • Type: Classification/Regression

  • Use Case: Kaggle competitions, production-level machine learning.

  • How it works: Sequentially builds models, each one correcting the errors of the previous ones, typically with decision trees as base learners (sketched below).
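A minimal sketch using scikit-learn's GradientBoostingClassifier; XGBoost, LightGBM, and CatBoost expose very similar fit/predict APIs, but only the scikit-learn version is shown to keep the example self-contained:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new tree is fit to the residual errors (loss gradients) left by
# the trees built so far, scaled by the learning rate
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=0)
gbm.fit(X_train, y_train)
print(gbm.score(X_test, y_test))
```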

9. Neural Networks (Shallow and Deep)

  • Type: Classification/Regression

  • Use Case: Complex problems like image recognition, natural language processing.

  • How it works: Composed of layers of neurons that transform the input through learned weights and activation functions; a minimal sketch follows.
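A minimal shallow-network sketch with scikit-learn's MLPClassifier (the two 64-unit hidden layers are an arbitrary illustrative architecture):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers of 64 ReLU units each; weights are learned by
# backpropagation with the Adam optimizer (the library default)
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), activation="relu",
                    max_iter=500, random_state=0)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))
```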