AI Bias Uncovered
What is Bias in ML Data?
Bias in machine learning data means that the data used to train the model does not accurately or fairly represent the real world, or the specific population the model will be used on. When a model learns from biased data, it learns the unfair patterns and mistakes present in that data. This leads to a biased model that makes unfair, inaccurate, or discriminatory decisions when used in the real world.
Why Does Bias Matter?
Biased models can cause real harm. If a model is used to decide who gets a loan, who gets interviewed for a job, or even how a self-driving car should react, and that model is biased against certain groups of people, the outcomes will be unfair and unequal.
Common Types of Data Bias
Bias can sneak into the data in several ways. Here are three common types:
1. Historical Bias (or Systemic Bias)
This happens when the data reflects past unfair decisions or inequalities that already exist in the world.
- Example: If you train a model to predict who will be a good manager using 50 years of historical hiring data, and historically 90% of managers were men, the model will learn that being a man is a strong predictor of being a manager. It is simply reflecting the historical bias, even if the company wants to be fair now.
2. Selection Bias (or Sampling Bias)
This occurs when the data collected is not a random or representative sample of the population. Some groups are over-represented, and others are under-represented or completely missing.
- Example: A company creates a facial recognition system, but all the photos used to train the system are of people with light skin. When the system is used on people with dark skin, it performs poorly or fails completely because it was never trained on that group. The dataset was selected unfairly.
3. Measurement Bias
This happens when the way we collect or measure information is flawed or inconsistent across different groups.
- Example: Imagine a fitness tracker that measures heart rate. If the sensor works perfectly on thin wrists but struggles to get an accurate reading on very large wrists, the data collected for people with large wrists will be consistently wrong. The measurement itself is biased.
How to Address Data Bias
Fixing bias is a continuous process, but here are the main strategies:
- Data Auditing: Carefully examine the training data to look for imbalances, missing groups, or unfair historical patterns.
- Fair Data Collection: Actively seek out and include data from under-represented groups to ensure the sample is truly representative.
- Re-weighting: If you can't collect more data, you can mathematically adjust the importance of the existing data points so that the model pays more attention to the under-represented groups.
- Model Evaluation: Test the final model specifically on different groups (e.g., different genders, races, ages) to ensure it performs equally well for everyone, not just the majority group.
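Three of the strategies above (auditing, re-weighting, and per-group evaluation) can be sketched in a few lines of Python. This is a minimal illustration on a toy dataset; the group names, counts, and the always-predict-1 "model" are invented for demonstration, not taken from any real system:

```python
from collections import Counter

# Toy dataset of (group, true_label) pairs; group "B" is under-represented.
samples = [("A", 1)] * 80 + [("A", 0)] * 10 + [("B", 1)] * 2 + [("B", 0)] * 8

# 1. Data auditing: count how often each group appears.
group_counts = Counter(group for group, _ in samples)
print(dict(group_counts))  # {'A': 90, 'B': 10}

# 2. Re-weighting: weight each sample inversely to its group's frequency,
#    so every group contributes the same total weight to training.
n = len(samples)
weights = [n / (len(group_counts) * group_counts[g]) for g, _ in samples]
total_weight = {g: 0.0 for g in group_counts}
for (g, _), w in zip(samples, weights):
    total_weight[g] += w
print({g: round(w, 6) for g, w in total_weight.items()})  # {'A': 50.0, 'B': 50.0}

# 3. Per-group evaluation: a "model" that always predicts 1 scores 82%
#    overall, yet fails badly on group B (A is about 0.89, B is 0.2).
predictions = [1] * n
per_group_acc = {}
for g in group_counts:
    pairs = [(label, p) for (grp, label), p in zip(samples, predictions) if grp == g]
    per_group_acc[g] = sum(label == p for label, p in pairs) / len(pairs)
print(per_group_acc)
```

The key idea in step 2 is that each group's weights sum to the same total, so a weighted training procedure would no longer be dominated by the majority group.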
MCQs on Historical, Selection, and Measurement Bias (with Answers)
1. Historical Bias
Q1. Historical bias occurs when…
A. The model updates itself too frequently
B. The training data reflects past patterns that are no longer valid
C. Data is collected from too many sources
D. The sample size is too large
Answer: B
Q2. Which scenario best represents historical bias?
A. A dataset taken only from urban regions
B. A system trained on old data that reflects outdated behaviours
C. A measuring instrument producing inaccurate values
D. Missing values present in the dataset
Answer: B
Q3. Historical bias mainly arises from…
A. Data processing errors
B. Past societal or systemic inequalities embedded in data
C. Overfitting in the model
D. Incorrect feature scaling
Answer: B
Q4. Which statement is true regarding historical bias?
A. It can be fixed simply by adding more data
B. It persists even when the dataset is large
C. It occurs only in real-time data streams
D. It is caused by hardware issues
Answer: B
2. Selection Bias
Q5. Selection bias happens when…
A. Data contains incorrect measurements
B. The sample used for training is not representative of the target population
C. There are too many features in the dataset
D. Labels are assigned incorrectly
Answer: B
Q6. Which scenario shows selection bias?
A. Using only weekday data to train a model that predicts weekend sales
B. Using a faulty sensor to record values
C. Using old historical data
D. Using inconsistent labels
Answer: A
Q7. Selection bias can lead to…
A. Better generalization
B. Poor model performance on unseen population segments
C. Accurate predictions across all groups
D. Increased robustness
Answer: B
Q8. Which step helps reduce selection bias?
A. Increasing batch size during training
B. Ensuring balanced and diverse data collection
C. Using deeper neural networks
D. Dropping more features
Answer: B
3. Measurement Bias
Q9. Measurement bias occurs when…
A. Data collection tools record incorrect or inconsistent values
B. Only a small sample is used
C. Old data is used for training
D. Some groups are excluded from the dataset
Answer: A
Q10. Which is an example of measurement bias?
A. Only collecting data from one city
B. Capturing height using a faulty scale
C. Using outdated data
D. Class labels missing for some entries
Answer: B
Q11. Measurement bias often arises due to…
A. Sensor errors, inaccurate tools, or improper recording methods
B. Too many features in the dataset
C. Overfitting issues
D. Mislabeled categories
Answer: A
Q12. A consistent gap between actual and recorded values indicates…
A. Selection bias
B. Historical bias
C. Measurement bias
D. Sampling error
Answer: C
Combined Scenario-Based MCQs
Q13. An AI hiring model rejects candidates because past company records favored male applicants. This is…
A. Measurement bias
B. Selection bias
C. Historical bias
D. Random error
Answer: C
Q14. A survey model is trained only on responses from college students. This is…
A. Measurement bias
B. Historical bias
C. Selection bias
D. Data augmentation
Answer: C
Q15. A smartwatch records higher heart rate than actual due to a calibration issue. This is…
A. Historical bias
B. Measurement bias
C. Selection bias
D. Sampling error
Answer: B
Q16. A retail model uses only data from high-income neighborhoods. It will most likely suffer from…
A. Selection bias
B. Measurement bias
C. Historical bias
D. Drift
Answer: A
Q17. Old criminal records used to train an ML system reflect biased policing. This is an example of…
A. Selection bias
B. Measurement bias
C. Overfitting
D. Historical bias
Answer: D
Q18. A face-recognition system performs poorly on darker skin tones because the camera underexposes faces. This is…
A. Measurement bias
B. Selection bias
C. Historical bias
D. Reinforcement error
Answer: A
Q19. A sentiment analysis model is trained only on positive reviews from one platform. What bias appears?
A. Selection bias
B. Historical bias
C. Measurement bias
D. Response bias
Answer: A
Q20. A model trained on 2010–2012 data fails to adapt to today's customer behavior. This is…
A. Measurement bias
B. Selection bias
C. Historical bias
D. Overgeneralization
Answer: C
