Building Decision Trees

17/05/2026

Understanding Decision Trees

Decision trees are intuitive models used to support decisions, classify data, or predict outcomes by following a series of simple rules. Each internal node represents a question, each branch represents a possible answer, and each leaf node represents a final decision or prediction. Because the logic is visual and easy to follow, decision trees are widely used in business, data science, and operations to explain complex choices in a transparent way.

They can handle both numerical and categorical data, work well as a first modeling approach, and help teams align on how decisions are actually made. Whether you are mapping customer journeys, evaluating risks, or building machine learning models, a clear decision tree can turn scattered information into a structured, repeatable process.

To build a decision tree, you start with a main question or goal, then repeatedly split your data or options based on the most informative criteria. At each step, you choose the question that best separates different outcomes, gradually forming a branching structure. This makes it easy to trace why a particular decision was reached, which is especially valuable in regulated or high‑stakes environments.

However, decision trees can become overly complex if not pruned or simplified, leading to overfitting in predictive models. Good practice includes limiting depth, combining similar branches, and regularly reviewing the tree with stakeholders. When designed thoughtfully, decision trees provide a powerful balance of clarity, flexibility, and analytical strength.

🌳 Decision Tree in Machine Learning

A Decision Tree is a supervised learning algorithm used for classification and regression.

  • Internal Node → Decision based on feature
  • Branch → Outcome of decision
  • Leaf Node → Final prediction

📊 Example Dataset

Age     Income   Student   Credit Rating   Buys Product
<30     High     No        Fair            No
<30     High     No        Excellent       No
31–40   High     No        Fair            Yes
>40     Medium   No        Fair            Yes
>40     Low      Yes       Fair            Yes
>40     Low      Yes       Excellent       No
31–40   Low      Yes       Excellent       Yes
<30     Medium   No        Fair            No
<30     Low      Yes       Fair            Yes
>40     Medium   Yes       Fair            Yes
<30     Medium   Yes       Excellent       Yes
31–40   Medium   No        Excellent       Yes
31–40   High     Yes       Fair            Yes
>40     Medium   No        Excellent       No

🧠 Step-by-Step Working

1. Root Node Selection

The root is the feature that best separates the classes, measured with one of:

  • Entropy & Information Gain
  • Gini Index

Assumption: Age is selected as the root node.

2. Splitting

  • Age < 30
  • Age 31–40
  • Age > 40

3. Further Decisions

Case 1: Age < 30

  • Student = Yes → Yes
  • Student = No → No

Case 2: Age 31–40

  • All outcomes → Yes

Case 3: Age > 40

  • Credit Rating = Fair → Yes
  • Credit Rating = Excellent → No

🌿 Final Decision Tree

Age
├── <30 → Student
│   ├── Yes → Yes
│   └── No → No
├── 31–40 → Yes
└── >40 → Credit Rating
    ├── Fair → Yes
    └── Excellent → No

🔍 Example Prediction

Input: Age = <30, Student = Yes
Prediction: Yes
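
The prediction path can be traced in code. The following is a minimal sketch, assuming the tree is stored as a nested dictionary; the names `TREE` and `predict` are illustrative and not part of any library.

```python
# Nested-dict encoding of the final tree: an internal node maps a feature
# name to {feature value: subtree}; a leaf is simply the predicted label.
TREE = {
    "Age": {
        "<30": {"Student": {"Yes": "Yes", "No": "No"}},
        "31–40": "Yes",
        ">40": {"Credit Rating": {"Fair": "Yes", "Excellent": "No"}},
    }
}


def predict(tree, record):
    """Follow the branch matching each feature value until a leaf is reached."""
    while isinstance(tree, dict):
        feature, branches = next(iter(tree.items()))
        tree = branches[record[feature]]
    return tree


print(predict(TREE, {"Age": "<30", "Student": "Yes"}))  # Yes
```

Only the features on the path are read, so a record for the 31–40 group needs no Student or Credit Rating value.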

📌 Key Concepts

Entropy = - Σ (p log₂ p)
Information Gain = Entropy(parent) - Weighted Entropy(children)
Gini = 1 - Σ (p²)

⚑ Advantages

  • Easy to understand
  • Handles categorical & numerical data
  • No feature scaling needed

⚠️ Disadvantages

  • Overfitting risk
  • Sensitive to data changes

🚀 Applications

  • Credit risk analysis
  • Medical diagnosis
  • Fraud detection
  • Customer segmentation

🌳 Root Node Selection using Gini Impurity

📌 Step 1: Gini Formula

Gini = 1 - Σ (p²)

Where p is the proportion of records in each class, summed over all classes.

---

📊 Step 2: Gini of Entire Dataset

  • Total Records = 14
  • Yes = 9
  • No = 5

p(Yes) = 9/14
p(No) = 5/14

Gini = 1 - (9/14)² - (5/14)²
     = 1 - (0.41 + 0.13)
     = 0.46

---

🔍 Step 3: Split by Features

✅ Feature 1: Age

Age < 30 → Yes=2, No=3

Gini = 1 - (2/5)² - (3/5)² = 0.48

Age 31–40 → Yes=4, No=0

Gini = 0

Age > 40 → Yes=3, No=2

Gini = 0.48

🎯 Weighted Gini (Age)

= (5/14)*0.48 + (4/14)*0 + (5/14)*0.48
= 0.343

---

✅ Feature 2: Student

Student = Yes → Yes=6, No=1

Gini = 0.245

Student = No → Yes=3, No=4

Gini = 0.49

🎯 Weighted Gini (Student)

= (7/14)*0.245 + (7/14)*0.49
= 0.367

---

✅ Feature 3: Credit Rating

Fair → Yes=6, No=2

Gini = 0.375

Excellent → Yes=3, No=3

Gini = 0.5

🎯 Weighted Gini (Credit Rating)

= (8/14)*0.375 + (6/14)*0.5
= 0.429

---

🏆 Step 4: Comparison

Feature         Weighted Gini
Age             0.343 ✅
Student         0.367
Credit Rating   0.429

---

🎯 Final Decision

Root Node = Age (Lowest Gini Impurity)

---

🧠 Key Insight

  • Lower Gini = Better split
  • Gini = 0 → Pure node
  • Decision Trees minimize impurity at each step

The algorithm selects the feature that makes the data most pure after splitting.
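
The comparison can be reproduced directly from the example dataset. This is a minimal sketch using only the standard library; the helper names `gini` and `weighted_gini` are illustrative. It also scores Income, which the walkthrough does not work through.

```python
from collections import Counter, defaultdict

# The 14-row example dataset: (Age, Income, Student, Credit Rating, Buys Product)
ROWS = [
    ("<30", "High", "No", "Fair", "No"),
    ("<30", "High", "No", "Excellent", "No"),
    ("31–40", "High", "No", "Fair", "Yes"),
    (">40", "Medium", "No", "Fair", "Yes"),
    (">40", "Low", "Yes", "Fair", "Yes"),
    (">40", "Low", "Yes", "Excellent", "No"),
    ("31–40", "Low", "Yes", "Excellent", "Yes"),
    ("<30", "Medium", "No", "Fair", "No"),
    ("<30", "Low", "Yes", "Fair", "Yes"),
    (">40", "Medium", "Yes", "Fair", "Yes"),
    ("<30", "Medium", "Yes", "Excellent", "Yes"),
    ("31–40", "Medium", "No", "Excellent", "Yes"),
    ("31–40", "High", "Yes", "Fair", "Yes"),
    (">40", "Medium", "No", "Excellent", "No"),
]
FEATURES = ["Age", "Income", "Student", "Credit Rating"]


def gini(labels):
    """Gini = 1 - sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, feature_index):
    """Average the Gini of each branch, weighted by the branch size."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature_index]].append(row[-1])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


scores = {name: weighted_gini(ROWS, i) for i, name in enumerate(FEATURES)}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:14s} {score:.3f}")

root = min(scores, key=scores.get)
print("Root:", root)
```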

🌳 Root Node Selection using Entropy & Information Gain

📌 Step 1: Entropy Formula

Entropy = - Σ (p log₂ p)

Where p is the proportion of records in each class, summed over all classes.


📊 Step 2: Entropy of Entire Dataset

  • Total Records = 14
  • Yes = 9
  • No = 5

p(Yes) = 9/14
p(No) = 5/14

Entropy(S) = -[(9/14) log₂ (9/14) + (5/14) log₂ (5/14)]
           ≈ 0.94

🔍 Step 3: Information Gain Calculation

✅ Feature 1: Age

Age Group   Yes   No   Entropy
<30         2     3    0.97
31–40       4     0    0
>40         3     2    0.97

🎯 Weighted Entropy (Age)

= (5/14)*0.97 + (4/14)*0 + (5/14)*0.97
= 0.693

📈 Information Gain (Age)

IG(Age) = 0.94 - 0.693 = 0.247

✅ Feature 2: Student

Student   Yes   No   Entropy
Yes       6     1    0.592
No        3     4    0.985

🎯 Weighted Entropy (Student)

= (7/14)*0.592 + (7/14)*0.985
= 0.788

📈 Information Gain (Student)

IG(Student) = 0.94 - 0.788 = 0.152

✅ Feature 3: Credit Rating

Credit Rating   Yes   No   Entropy
Fair            6     2    0.811
Excellent       3     3    1.00

🎯 Weighted Entropy (Credit Rating)

= (8/14)*0.811 + (6/14)*1.00
= 0.892

📈 Information Gain (Credit Rating)

IG(Credit) = 0.94 - 0.892 = 0.048

🏆 Step 4: Comparison

Feature         Information Gain
Age             0.247 ✅
Student         0.152
Credit Rating   0.048

🎯 Final Decision

Root Node = Age (Highest Information Gain)

🧠 Key Insights

  • Higher Information Gain = Better split
  • Entropy measures uncertainty
  • Information Gain measures reduction in uncertainty

The algorithm selects the feature that reduces uncertainty the most after splitting.
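
The same figures can be checked from the class counts alone. The sketch below assumes each branch is described by its (Yes, No) counts as listed in the tables above; `entropy` and `information_gain` are illustrative helper names.

```python
from math import log2


def entropy(counts):
    """Entropy = -sum(p * log2(p)) over the classes that actually occur."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted entropy of the child branches."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted


# (Yes, No) counts per branch for each candidate feature
SPLITS = {
    "Age": [(2, 3), (4, 0), (3, 2)],
    "Student": [(6, 1), (3, 4)],
    "Credit Rating": [(6, 2), (3, 3)],
}

gains = {name: information_gain((9, 5), children) for name, children in SPLITS.items()}
for name, gain in gains.items():
    print(f"IG({name}) = {gain:.3f}")
```

Because the code works with unrounded entropies, the last digit can differ slightly from a hand calculation that rounds at each step.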

⚑ Final Conclusion

Method             Selected Root
Gini Index         Age ✅
Information Gain   Age ✅

🔐 Cybersecurity Use Case: Intrusion Detection using Decision Trees

In cybersecurity, decision trees are widely used for threat detection, fraud analysis, and intrusion detection systems (IDS).

📊 Example Dataset (Network Activity)

Login Attempts   IP Reputation   Time of Access   Data Transfer   Threat
High             Unknown         Night            High            Yes
Low              Trusted         Day              Low             No
Medium           Unknown         Night            Medium          Yes
High             Blacklisted     Night            High            Yes
Low              Trusted         Day              Low             No
Medium           Trusted         Evening          Medium          No
High             Unknown         Night            High            Yes
Low              Unknown         Day              Low             No

🧠 Objective

Predict whether a network activity is a Threat (Yes/No).

---

🔍 Example Decision Logic

  • If IP Reputation = Blacklisted → Threat = Yes
  • If Login Attempts = High AND Time = Night → Threat = Yes
  • If IP Reputation = Trusted → Threat = No
---

🌿 Sample Decision Tree

IP Reputation
├── Blacklisted → Threat = Yes
├── Trusted → Threat = No
└── Unknown
    ├── Login Attempts = High/Medium → Yes
    └── Login Attempts = Low → No
---

📈 Why This Works in Cybersecurity

  • Identifies suspicious patterns quickly
  • Interpretable rules (important for audits & compliance)
  • Works well with categorical + behavioral data
  • Can be integrated into SIEM systems
---

⚠️ Real-World Considerations

  • Attackers may mimic normal behavior (evasion)
  • Requires continuous retraining with new threat data
  • Often combined with ensemble methods (Random Forest, XGBoost)
---

🚀 Advanced Insight (Expert Level)

In real-world cybersecurity systems, decision trees are rarely used alone. They are part of ensemble models and AI-driven SOC pipelines, where they help explain decisions made by complex models.

🔐 Cybersecurity Threat Detection Simulator

The simulator takes the network parameters as input and applies the rules below to flag potential threats.

Decision Logic:
  • If IP Reputation = Blacklisted → Threat
  • If Login Attempts = High AND Time = Night → Threat
  • If IP Reputation = Trusted → Safe
  • Otherwise → Safe
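
As a rough illustration, the decision logic can be written as a small function. This sketch assumes the rules are checked top to bottom in the order listed and that inputs arrive as the category strings used in the dataset; the function name `detect_threat` is hypothetical.

```python
def detect_threat(login_attempts, ip_reputation, time_of_access):
    """Apply the listed rules in order and return "Threat" or "Safe"."""
    if ip_reputation == "Blacklisted":
        return "Threat"
    if login_attempts == "High" and time_of_access == "Night":
        return "Threat"
    if ip_reputation == "Trusted":
        return "Safe"
    return "Safe"  # fallback rule: anything not matched above


print(detect_threat("High", "Unknown", "Night"))   # Threat
print(detect_threat("Low", "Trusted", "Day"))      # Safe
print(detect_threat("Medium", "Unknown", "Night")) # Safe
```

The last call shows a limit of hand-written rules: the example dataset labels a Medium / Unknown / Night event as a threat, but the fallback rule marks it Safe. A tree trained on the data would be expected to pick up that case.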

🌳 Regression Tree in Machine Learning

A Regression Tree is used when the output variable is continuous (numerical) instead of categorical.

---

🎯 Problem: Predict House Price

📊 Dataset

Size (sq ft)   Bedrooms   Age (years)   Price (₹ Lakhs)
800            2          10            40
900            2          8             45
1000           3          6             50
1200           3          5             60
1500           4          4             75
1700           4          3             85
2000           5          2             100
---

🧠 How Regression Tree Works

  • Splits data based on feature values
  • Predicts mean value at leaf nodes
  • Minimizes variance / Mean Squared Error (MSE)
---

📌 Step 1: First Split

Condition: Size < 1200 (threshold chosen for illustration)

Left Node (Size < 1200)

Prices: 40, 45, 50

Mean = 45

Right Node (Size ≥ 1200)

Prices: 60, 75, 85, 100

Mean = 80

---

📌 Step 2: Further Split

Condition: Bedrooms < 4 (applied to the right node)

Bedrooms < 4

Price: 60

Mean = 60

Bedrooms ≥ 4

Prices: 75, 85, 100

Mean = 86.7

---

🌿 Final Regression Tree

Size < 1200?
├── Yes → Predict Price = 45
└── No
    ├── Bedrooms < 4 → Predict = 60
    └── Bedrooms ≥ 4 → Predict = 86.7
---

🔍 Example Prediction

Input:

  • Size = 1600
  • Bedrooms = 4

Prediction Path:

  • Size ≥ 1200 → Right
  • Bedrooms ≥ 4 → Right

Predicted Price = ₹86.7 Lakhs

---

📌 Key Formula

MSE = (1/n) Σ (yᵢ - ȳ)²

Here yᵢ is each target value in a node, ȳ is the node mean, and n is the number of samples in the node.

Regression trees select splits that minimize prediction error.
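
To see how a split lowers the error, the sketch below compares the MSE of all prices with the size-weighted MSE of the two child nodes for the `Size < 1200` split. The helper names `mse` and `split_mse` are illustrative, and the code assumes the threshold leaves at least one house on each side.

```python
def mse(values):
    """Mean squared deviation of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def split_mse(sizes, prices, threshold):
    """Size-weighted MSE of the two nodes created by `size < threshold`."""
    left = [p for s, p in zip(sizes, prices) if s < threshold]
    right = [p for s, p in zip(sizes, prices) if s >= threshold]
    n = len(prices)
    return len(left) / n * mse(left) + len(right) / n * mse(right)


SIZES = [800, 900, 1000, 1200, 1500, 1700, 2000]
PRICES = [40, 45, 50, 60, 75, 85, 100]

print(f"MSE before split: {mse(PRICES):.2f}")
print(f"Weighted MSE after Size < 1200: {split_mse(SIZES, PRICES, 1200):.2f}")
```

A full regression tree would evaluate `split_mse` for every candidate threshold and feature and keep the lowest value, so on a given dataset it may settle on a different threshold than the one used in this walkthrough.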

---

⚑ Classification vs Regression Tree

Aspect      Classification Tree   Regression Tree
Output      Class (Yes/No)        Continuous value
Metric      Gini / Entropy        MSE / Variance
Leaf Node   Majority class        Mean value
---

🚀 Real-World Use Cases

  • House price prediction
  • Sales forecasting
  • Stock price estimation
  • Risk scoring in finance
  • Cyber risk severity prediction
---

🔐 Cybersecurity Example

Failed Logins   Data Transfer   Risk Score
5               Low             20
20              Medium          60
50              High            90

Regression trees can predict continuous risk scores instead of just Yes/No threats.

Used in cybersecurity to estimate:
  • Risk Score (0–100)
  • Expected Loss
  • Attack Severity

🌳 ID3 Algorithm (Machine Learning)

The ID3 (Iterative Dichotomiser 3) algorithm is a decision tree algorithm used for classification problems. It selects the feature with the highest Information Gain.

---

🧠 Core Idea

ID3 selects the attribute that reduces uncertainty the most.

  • Entropy → Measures impurity
  • Information Gain → Reduction in entropy
---

📌 Key Formulas

Entropy

Entropy = - Σ (p log₂ p)

Information Gain

IG = Entropy(parent) - Weighted Entropy(children)
---

⚙️ Step-by-Step Working

1. Calculate Entropy

Measure the impurity of the dataset.

2. Compute Information Gain

For each feature, calculate the weighted entropy after splitting on it and subtract it from the parent entropy.

3. Select Best Feature

The feature with the highest Information Gain becomes the root node.

4. Repeat Recursively

  • Continue splitting each subset on the remaining features
  • Stop when a subset is pure or no features remain
---

📊 Example: Play Tennis

Outlook    Temperature   Humidity   Wind     Play
Sunny      Hot           High       Weak     No
Sunny      Hot           High       Strong   No
Overcast   Hot           High       Weak     Yes
Rain       Mild          High       Weak     Yes

ID3 calculates entropy and selects the best feature (e.g., Outlook).
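
The four steps can be sketched as a short recursive function. This is a simplified illustration, not a reference implementation: rows are assumed to be dictionaries of categorical values, ties between features are broken by list order, and the names `entropy`, `information_gain`, and `id3` are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Parent entropy minus the weighted entropy of the subsets per feature value."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, features):
    if len(set(labels)) == 1:          # pure subset: return a leaf
        return labels[0]
    if not features:                   # no features left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    tree = {best: {}}
    for value in sorted(set(row[best] for row in rows)):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return tree


FEATURES = ["Outlook", "Temperature", "Humidity", "Wind"]
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
]
rows = [dict(zip(FEATURES, record[:-1])) for record in DATA]
labels = [record[-1] for record in DATA]

tree = id3(rows, labels, FEATURES)
print(tree)
```

On the four rows shown, Outlook separates the classes completely, so each branch ends in a leaf straight away.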

---

🌿 Resulting Tree (Concept)

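Outlook
├── Sunny → further split
├── Overcast → Yes
└── Rain → further split

With only the four rows shown above, every Outlook branch is already pure; the further splits would only be needed on a larger dataset.
---

🔍 Characteristics of ID3

✅ Advantages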

  • Simple and easy to understand
  • Fast computation
  • Works well on small datasets

⚠️ Limitations

  • Handles only categorical data
  • Prone to overfitting
  • No pruning mechanism
  • Biased toward features with many values
---

🔐 Cybersecurity Use Case

Suspicious Login Detection

IP Reputation   Login Attempts   Time    Threat
Trusted         Low              Day     No
Unknown         High             Night   Yes
Blacklisted     Medium           Night   Yes

ID3 helps identify patterns and build rule-based intrusion detection systems.

---

🧠 Key Insight

ID3 builds decision trees by selecting features that maximize reduction in uncertainty using Information Gain.

🌳 CART Algorithm (Machine Learning)

The CART (Classification and Regression Trees) algorithm is a decision tree algorithm used for both classification and regression problems.

CART always creates a binary tree (two branches at each split).
---

🧠 Core Idea

CART selects splits that minimize impurity (classification) or error (regression).

---

📌 Key Concepts

Classification → Gini Index

Gini = 1 - Σ (p²)

Lower Gini → Better split

Regression → Mean Squared Error (MSE)

MSE = (1/n) Σ (yᵢ - ȳ)²

Lower MSE → Better split

---

⚙️ Step-by-Step Working

1. Select Best Feature

Evaluate all features using Gini or MSE.

2. Find Best Split Point

Example: Size < 1200

3. Create Binary Split

  • Left → Condition true
  • Right → Condition false

4. Repeat Recursively

  • Continue splitting subsets
  • Stop when a node becomes pure or the minimum number of samples is reached

5. Apply Pruning

Remove unnecessary branches to reduce overfitting.

---

📊 Example (Classification)

Loan Approval

Income   Credit Score   Approved
High     Good           Yes
Low      Poor           No
Medium   Good           Yes
Low      Good           No

🌿 CART Tree

Income = Low?
├── Yes → Approved = No
└── No → Approved = Yes

Splitting on Credit Score = Good? would leave a mixed branch (2 Yes, 1 No; weighted Gini ≈ 0.33), whereas Income = Low? yields two pure branches (weighted Gini = 0).
---

📈 Example (Regression)

House Price Prediction

Size   Price
800    40
1200   60
1500   75

🌿 CART Regression Tree

Size < 1200?
├── Yes → Mean = 40
└── No → Mean = 67.5
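
The search for a split point on a numeric feature can be sketched as follows. This is a simplified illustration with assumed helper names (`sse`, `best_split`): candidate thresholds are the observed values themselves, to match the `Size < 1200` form used here, whereas many implementations test midpoints between neighbouring values instead.

```python
def sse(values):
    """Sum of squared deviations from the mean of the values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total SSE, left mean, right mean) for the best `x < threshold`.

    Minimizing the summed SSE of both sides is equivalent to minimizing
    the size-weighted MSE, because the total sample count is constant.
    """
    best = None
    for threshold in sorted(set(xs))[1:]:  # skip the smallest value: left side would be empty
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        error = sse(left) + sse(right)
        if best is None or error < best[1]:
            best = (threshold, error, sum(left) / len(left), sum(right) / len(right))
    return best


threshold, error, left_mean, right_mean = best_split([800, 1200, 1500], [40, 60, 75])
print(f"Size < {threshold}? Yes -> {left_mean}, No -> {right_mean}")
```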
---

🔍 Characteristics of CART

✅ Advantages

  • Works for both classification and regression
  • Handles numerical and categorical data
  • Supports pruning
  • More practical than ID3

⚠️ Limitations

  • Can overfit if not pruned
  • Sensitive to small data changes
  • Binary splits may increase tree depth
---

🔐 Cybersecurity Use Case

Intrusion Detection

Feature          Example
Login Attempts   High
IP Reputation    Blacklisted
Time             Night

CART can detect suspicious activity and predict attack probability.

---

🧠 ID3 vs CART

Feature      ID3                CART
Split Type   Multi-way          Binary
Metric       Entropy            Gini / MSE
Pruning      No                 Yes
Data Type    Categorical only   Both
---

💡 Key Insight

CART builds binary decision trees using Gini Index (classification) and MSE (regression), with pruning to improve generalization.

🌳 Pruning in Decision Trees

Pruning is the process of removing unnecessary branches from a decision tree to improve its performance on unseen data.

---

🧠 Why Pruning is Needed

  • Decision trees tend to overfit training data
  • They may learn noise and become too complex

Pruning improves generalization and reduces model complexity.
---

📌 Types of Pruning

1️⃣ Pre-Pruning (Early Stopping)

Stops tree growth early based on conditions:

  • Maximum depth reached
  • Minimum samples per node
  • Minimum information gain threshold

✅ Advantage

  • Faster training
  • Avoids overly complex trees

⚠️ Limitation

  • May underfit if stopped too early
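
A tree builder could check these conditions before every split. The sketch below is illustrative only: the function name `should_stop` and the default limits are hypothetical values chosen for the example, not settings taken from any library.

```python
def should_stop(depth, n_samples, best_gain, max_depth=3, min_samples=5, min_gain=0.01):
    """Return True when any pre-pruning condition says this node should become a leaf."""
    return (
        depth >= max_depth           # maximum depth reached
        or n_samples < min_samples   # too few samples to split further
        or best_gain < min_gain      # best available split barely helps
    )


print(should_stop(depth=3, n_samples=100, best_gain=0.5))   # True: depth limit
print(should_stop(depth=1, n_samples=100, best_gain=0.2))   # False: keep splitting
```

Tighter limits produce smaller trees, which is where the underfitting risk comes from.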
---

2️⃣ Post-Pruning (Backward Pruning)

Build full tree first, then remove unnecessary branches.

Process:

  • Grow full tree
  • Evaluate subtrees
  • Remove branches that do not improve performance

✅ Advantage

  • Often better accuracy than pre-pruning, because branches are judged after the full tree has been grown
---

📊 Example

🌿 Before Pruning (Overfitted Tree)

Age
├── <30
│   ├── Student=Yes → Yes
│   └── Student=No → No
├── 31–40 → Yes
└── >40
    ├── Credit=Fair → Yes
    └── Credit=Excellent
        ├── Income=High → No
        └── Income=Low → Yes

Too complex → captures noise

---

✂️ After Pruning

Age
├── <30 → split on Student
├── 31–40 → Yes
└── >40 → split on Credit

Simpler tree → better generalization
---

⚙️ Common Pruning Techniques

Cost Complexity Pruning (CART)

R_α(T) = R(T) + α|T|
  • R(T) → Error of tree T
  • |T| → Number of leaves
  • α → Complexity penalty

Balances accuracy and simplicity.
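
The effect of α can be seen by scoring a few candidate trees. This sketch assumes R(T) is the training misclassification rate and uses three candidates derived from the 14-row Buys Product dataset earlier in this guide: the full tree (5 leaves, 0 errors), a tree that splits on Age only with majority-class leaves (3 leaves, 4 errors), and a single leaf predicting Yes (1 leaf, 5 errors). The names `cost_complexity`, `CANDIDATES`, and `best_tree` are illustrative.

```python
def cost_complexity(error_rate, n_leaves, alpha):
    """R_alpha(T) = R(T) + alpha * |T|"""
    return error_rate + alpha * n_leaves


# name -> (training error rate, number of leaves)
CANDIDATES = {
    "full": (0 / 14, 5),
    "age_only": (4 / 14, 3),
    "single_leaf": (5 / 14, 1),
}


def best_tree(alpha):
    """Pick the candidate with the lowest penalized cost for this alpha."""
    return min(CANDIDATES, key=lambda name: cost_complexity(*CANDIDATES[name], alpha))


for alpha in (0.01, 0.2):
    print(f"alpha={alpha}: {best_tree(alpha)}")
```

A small α keeps the full tree because its zero error outweighs the leaf penalty; a large α favors the single leaf. In practice α is usually chosen with validation data rather than training error alone.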

---

Reduced Error Pruning

  • Remove node if validation accuracy does not decrease

Minimum Error Pruning

  • Replace subtree with leaf if error reduces
---

🔍 Key Benefits

  • Prevents overfitting
  • Improves interpretability
  • Reduces noise impact
  • Faster predictions
---

⚠️ Trade-Off

Too Much Pruning   Too Little Pruning
Underfitting       Overfitting
High bias          High variance
---

🔐 Cybersecurity Use Case

In intrusion detection systems:

  • Without pruning → Too many false positives
  • With pruning → Focus on real threats

Pruning helps reduce alert fatigue and improves decision-making in SOC environments.
---

🧠 Key Insight

Pruning simplifies decision trees by removing branches that do not improve predictive performance, reducing overfitting and improving generalization.

🌳 Decision Tree Classifier (Machine Learning)

Problem: Predict whether a login is a Threat or Safe based on features.

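📌 Python Program

from sklearn.tree import DecisionTreeClassifier

# Each sample is [login_attempts, ip_reputation], both encoded as integers:
#   login_attempts: 0 = Low, 1 = Medium, 2 = High
#   ip_reputation:  0 = Trusted, 1 = Unknown, 2 = Blacklisted

X = [
    [2, 2],  # High attempts, Blacklisted IP
    [1, 1],  # Medium attempts, Unknown IP
    [2, 1],  # High attempts, Unknown IP
    [0, 0],  # Low attempts, Trusted IP
    [1, 0]   # Medium attempts, Trusted IP
]

# Labels: 1 = Threat, 0 = Safe
y = [1, 0, 1, 0, 0]

# random_state makes the fitted tree reproducible between runs
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Test sample: High attempts from a Blacklisted IP
prediction = model.predict([[2, 2]])

if prediction[0] == 1:
    print("🚨 Threat Detected")
else:
    print("✅ Safe Activity")

📊 Expected Output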

🚨 Threat Detected

🧠 Explanation

  • We train a Decision Tree model using login behavior data.
  • Model learns patterns like high login attempts + bad IP = threat.
  • We test a new case: High attempts + Blacklisted IP.
  • The model predicts it as a Threat.

🌳 Interactive Quiz: Decision Trees

Q1. What is the main objective of a decision tree?
A. Maximize variance
B. Minimize impurity
C. Increase data size
D. Normalize features
✅ Answer: B
Decision trees aim to reduce impurity using Gini or Entropy.

Q2. Which metric is used in ID3?
A. Gini Index
B. Entropy
C. MSE
D. Accuracy
✅ Answer: B
ID3 uses Entropy and Information Gain.

Q3. What does a leaf node represent?
A. Feature
B. Split condition
C. Final prediction
D. Dataset
✅ Answer: C
Leaf node gives the final output.

Q4. Gini impurity of a pure node is:
A. 1
B. 0
C. 0.5
D. -1
✅ Answer: B
Pure node means all samples belong to one class.

Q5. Overfitting occurs when the tree is:
A. Small
B. Deep
C. Balanced
D. Normalized
✅ Answer: B
Deep trees memorize training data and overfit.

Q6. Which algorithm uses Gini Index?
A. ID3
B. CART
C. KNN
D. Naive Bayes
✅ Answer: B
CART uses Gini Index for splitting.

Q7. Decision trees can handle:
A. Only numerical data
B. Only categorical data
C. Both types
D. Only binary data
✅ Answer: C
Handles both categorical and numerical data.

Q8. What is pruning?
A. Adding nodes
B. Removing nodes
C. Scaling features
D. Encoding data
✅ Answer: B
Pruning removes unnecessary branches to reduce overfitting.

Q9. Entropy measures:
A. Accuracy
B. Impurity
C. Speed
D. Size
✅ Answer: B
Entropy measures randomness or uncertainty.

Q10. A decision tree is best described as:
A. Linear model
B. Rule-based model
C. Neural model
D. Probabilistic model
✅ Answer: B
Decision trees follow if-then rules.