Artificial Intelligence | VoidX Academy

13. AI Ethics and Governance

Module 13: Ethics

Building AI That Doesn't Cause Harm

The technical ability to build an AI system does not make it ethical, fair, or safe to deploy. AI systems can discriminate against protected groups, amplify existing societal biases, make opaque decisions that affect people's lives, and be weaponized for mass manipulation. Understanding AI ethics is not optional—it is a professional responsibility and increasingly a legal requirement. Engineers who ignore these dimensions build systems that cause real harm to real people.

⚖️ AI Bias and Fairness

AI systems learn from historical data, which reflects historical human decisions—many of which were biased, discriminatory, or unjust. A model trained on biased data learns and amplifies those biases at scale.

Real-World Bias Cases:

COMPAS (Criminal Recidivism): ProPublica's 2016 analysis of this widely used risk-assessment tool found that Black defendants who did not go on to reoffend were nearly twice as likely as white defendants to be labeled high risk. The tool's developer disputed the analysis, arguing that the scores were similarly calibrated across groups, which makes COMPAS a standard example of conflicting fairness definitions. The system encoded historical disparities in policing and incarceration as seemingly objective predictions.
Amazon's Recruiting Tool: Amazon trained a resume screening tool on about 10 years of past applications. Because most of those applicants were men, the model learned to penalize resumes containing the word "women's" (as in "women's chess club") and to downgrade graduates of all-women's colleges. Amazon discontinued the tool.
Facial Recognition: In the Gender Shades study, MIT researcher Joy Buolamwini and Timnit Gebru showed that commercial gender classification systems had much higher error rates for darker-skinned women (up to about 35%) than for lighter-skinned men (at most 0.8%). Facial analysis technology of this kind has been deployed in law enforcement contexts.

Types of Bias:

Historical Bias: The data itself reflects historical discrimination. Even a model that fits this data perfectly perpetuates the discrimination.
Representation Bias: Certain groups are underrepresented in training data, causing the model to perform poorly for those groups.
Measurement Bias: The features or labels used as proxies for the target are measured differently for different groups. Recorded crime is a proxy for actual crime, and it depends on police presence: more policing → more recorded crime, regardless of actual crime rates.
Label Bias: Human annotators bring their own biases to labeling tasks. Sentiment labels may reflect cultural assumptions, and toxicity classifiers trained on such labels may rate African American Vernacular English as more "toxic."
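
The measurement bias example above can be made concrete with a little arithmetic. The sketch below uses invented numbers and a hypothetical helper, `recorded_crimes`, and treats policing intensity as the probability that a crime ends up in the records. It is an illustration under those assumptions, not a model of any real dataset.

```python
# Hypothetical helper and invented numbers, for illustration only.
def recorded_crimes(population, true_crime_rate, detection_rate):
    """Expected number of crimes that end up in the records."""
    return population * true_crime_rate * detection_rate

# Two areas with the SAME underlying crime rate...
true_rate = 0.05

# ...but different policing intensity, modeled as the chance a crime is recorded.
area_a = recorded_crimes(population=10_000, true_crime_rate=true_rate, detection_rate=0.2)
area_b = recorded_crimes(population=10_000, true_crime_rate=true_rate, detection_rate=0.6)

# The recorded data suggests area B has several times more crime than area A,
# even though the true rates are identical.
ratio = area_b / area_a
print("Recorded crimes, area A:", area_a)
print("Recorded crimes, area B:", area_b)
print("Apparent ratio B/A:", ratio)
```

A model trained on the recorded counts would learn the policing pattern, not the underlying crime rate.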

Fairness Metrics — Understanding the Trade-offs:

Demographic Parity: Equal positive prediction rates across groups. A loan approval model with demographic parity approves the same proportion of applicants from each group.
Equalized Odds: Equal true positive rates and equal false positive rates across groups. Unlike demographic parity, it conditions on the true outcome, so a useful classifier generally cannot satisfy both criteria when base rates differ.
Individual Fairness: Similar individuals should receive similar predictions.
Important insight: When base rates differ between groups, common fairness criteria conflict. For example, an imperfect classifier cannot be calibrated within each group and also have equal false positive and false negative rates across groups. Fairness is not a single objective: it involves genuine value trade-offs that must be made explicitly by humans, not hidden in algorithmic choices.

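from fairlearn.metrics import (demographic_parity_difference,
                               equalized_odds_difference,
                               MetricFrame)
from sklearn.metrics import accuracy_score

# Assumed to exist already: y_test (true labels), y_pred (model predictions),
# and gender_column (the sensitive attribute for each test row).

# Compute accuracy overall and separately for each group.
mf = MetricFrame(
    metrics={'accuracy': accuracy_score},
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=gender_column
)

print("Overall accuracy:", mf.overall)
print("Accuracy by group:\n", mf.by_group)
# Largest gap in accuracy between any two groups.
print("Disparity (max-min):", mf.difference())

# 0 means every group receives positive predictions at the same rate.
print("\nDemographic parity difference:",
      demographic_parity_difference(y_test, y_pred, sensitive_features=gender_column))
# 0 means true positive and false positive rates match across groups.
print("Equalized odds difference:",
      equalized_odds_difference(y_test, y_pred, sensitive_features=gender_column))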
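
To see what these two numbers measure, and why they can pull in different directions, here is a minimal from-scratch sketch in NumPy. The helper `fairness_gaps` and the toy data are hypothetical, invented for illustration. It takes the equalized odds gap as the larger of the TPR gap and the FPR gap, which is how fairlearn's default is understood to behave, but treat that as an assumption and check the library documentation.

```python
import numpy as np

def fairness_gaps(y_true, y_pred, group):
    """Return (demographic parity gap, equalized odds gap) for binary labels.

    Assumes every group contains at least one positive and one negative example.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    selection, tpr, fpr = [], [], []
    for g in np.unique(group):
        mask = group == g
        yt, yp = y_true[mask], y_pred[mask]
        selection.append(yp.mean())       # share of positive predictions
        tpr.append(yp[yt == 1].mean())    # true positive rate
        fpr.append(yp[yt == 0].mean())    # false positive rate
    dp_gap = max(selection) - min(selection)
    eo_gap = max(max(tpr) - min(tpr), max(fpr) - min(fpr))
    return float(dp_gap), float(eo_gap)

# Toy data: group "a" has a base rate of 0.5, group "b" a base rate of 0.25.
y_true = np.array([1, 1, 0, 0, 1, 0, 0, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

# A perfect classifier satisfies equalized odds but not demographic parity.
dp_perfect, eo_perfect = fairness_gaps(y_true, y_true.copy(), group)

# Forcing equal selection rates (50% in each group) breaks equalized odds.
y_pred_parity = np.array([1, 1, 0, 0, 1, 1, 0, 0])
dp_parity, eo_parity = fairness_gaps(y_true, y_pred_parity, group)

print("Perfect classifier:   DP gap", dp_perfect, "EO gap", eo_perfect)
print("Parity-forced output: DP gap", dp_parity, "EO gap", eo_parity)
```

Because the base rates differ, closing one gap opens the other. Deciding which gap matters more is a value judgment, not a modeling detail.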

🔍 Explainable AI (XAI)

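High-stakes AI decisions, such as loan approvals, medical diagnoses, parole recommendations, and credit scoring, require explanations. "The model said no" is ethically insufficient and, in many jurisdictions, legally insufficient. Transparency obligations appear in regulations including GDPR (its rules on automated decision-making are often described as a "right to explanation") and the EU AI Act.

SHAP (SHapley Additive exPlanations): Based on cooperative game theory, SHAP assigns each feature a contribution score for a specific prediction. Answers: "How much did each feature push this prediction up or down from the baseline (the model's expected output)?" The contributions sum to the difference between the prediction and that baseline. The framework is consistent and theoretically grounded; KernelExplainer is model-agnostic, while faster explainers such as TreeExplainer are specific to a model family.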

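import shap

# Assumed to exist already: xgb_model (a trained XGBoost model), X_test (a NumPy
# feature matrix), and feature_names (one name per column).
shap.initjs()  # loads the JavaScript needed to render force plots in a notebook

# TreeExplainer is the SHAP explainer specialized for tree ensembles.
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_test)

# Global view: which features matter most across the whole test set.
shap.summary_plot(shap_values, X_test, feature_names=feature_names)

# Local view: how each feature pushed one prediction away from the expected value.
# If X_test is a pandas DataFrame, select the row with X_test.iloc[idx] instead.
idx = 42
shap.force_plot(explainer.expected_value, shap_values[idx], X_test[idx],
                feature_names=feature_names)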
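
The quantity SHAP estimates can be computed exactly for a tiny model by enumerating every feature subset. The sketch below is a plain-Python illustration, not the shap library's algorithm. The functions `model` and `value_fn`, the input `x`, and the all-zeros `baseline` are hypothetical. Replacing "absent" features with a single baseline value is a simplifying assumption; SHAP averages over background data instead.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that does not contain feature i.
            weight = factorial(k) * factorial(n_features - k - 1) / factorial(n_features)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

# Hypothetical model with one interaction term (x0 * x2).
def model(x):
    return 2.0 * x[0] + 3.0 * x[1] + x[0] * x[2]

x = [1.0, 2.0, 3.0]         # instance to explain
baseline = [0.0, 0.0, 0.0]  # stands in for "feature absent" (an assumption)

def value_fn(present):
    # Features in `present` take their real value; the rest take the baseline.
    z = [x[j] if j in present else baseline[j] for j in range(len(x))]
    return model(z)

phi = shapley_values(value_fn, len(x))
base_value = value_fn(frozenset())

print("Contributions:", phi)
print("Base value + contributions:", base_value + sum(phi))
print("Model output:", model(x))
```

The interaction term is split evenly between the two features involved, and the contributions add up to the gap between the prediction and the base value. That additivity is the property force plots visualize. Exact enumeration scales exponentially with the number of features, which is why practical explainers rely on approximations or model-specific shortcuts.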

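LIME (Local Interpretable Model-agnostic Explanations): Explains a single prediction by perturbing the input, querying the black-box model on the perturbed samples, and fitting a simple interpretable model (typically a sparse linear model) to those samples, weighted by their proximity to the original input. The surrogate approximates the black-box model only in that neighborhood. Works for any model and for tabular, text, and image inputs.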
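
The perturb, weight, and fit loop can be sketched in a few lines of NumPy. This is a simplified illustration of the idea, not the `lime` package's implementation. The function `lime_tabular_sketch`, its parameters, and the `black_box` model are hypothetical. It uses Gaussian perturbations and an ordinary weighted least-squares fit, with no feature selection or discretization.

```python
import numpy as np

def lime_tabular_sketch(predict_fn, x, n_samples=2000, scale=0.1,
                        kernel_width=0.5, seed=0):
    """Fit a proximity-weighted linear surrogate around the point x.

    Returns (intercept, slopes); the slopes are the local feature effects.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)

    # 1. Perturb: sample points in a neighborhood of x and query the black box.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = predict_fn(Z)

    # 2. Weight: samples closer to x count more (exponential kernel).
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / (kernel_width ** 2))

    # 3. Fit: weighted least squares on [1, Z - x], solved by scaling rows by sqrt(w).
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]

# Hypothetical black-box model: nonlinear in both features.
def black_box(Z):
    return np.sin(Z[:, 0]) + Z[:, 1] ** 2

intercept, slopes = lime_tabular_sketch(black_box, x=[0.0, 1.0])
print("Local intercept:", intercept)
print("Local slopes:", slopes)
```

Near the point (0, 1) the slopes should come out close to the true local gradient of the black box, roughly 1 for the first feature and 2 for the second. Changing `scale` or `kernel_width` changes what counts as "local", which is one reason LIME explanations can vary between runs and settings.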

🔒 Data Privacy and AI

Differential Privacy: A mathematical framework that adds calibrated noise to computations so that including or excluding any one individual's data changes the output distribution by only a bounded amount, controlled by the privacy parameter epsilon. For PyTorch training, the Opacus library implements differentially private SGD. Apple and Google have both deployed differential privacy to collect usage statistics from user devices.
Federated Learning: Train models on data that never leaves user devices. Only model updates (not raw data) are sent to a central server, which aggregates them. This reduces exposure of sensitive data (medical, financial), although updates can still leak information, so federated learning is often combined with secure aggregation or differential privacy. Used by Google for Gboard keyboard predictions.
Model Inversion Attacks: Adversaries can sometimes reconstruct training data, or features of it, from model outputs. A model trained on sensitive data without privacy guarantees can leak information about individuals even when attackers can only query its API.
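
Two of the ideas above fit in a short sketch: the Laplace mechanism, a basic building block of differential privacy, and the weighted averaging step at the heart of federated learning. The helpers `laplace_count` and `federated_average` and all of the data are hypothetical. This is an illustration under those assumptions, not how Opacus or any production system is implemented.

```python
import numpy as np

def laplace_count(values, predicate, epsilon, rng):
    """Differentially private count: true count plus Laplace noise.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so the noise scale is 1 / epsilon. Smaller epsilon means more noise.
    """
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

def federated_average(client_weights, client_sizes):
    """Average client model weights, weighted by each client's dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()

rng = np.random.default_rng(0)

# Invented data: how many people are 40 or older? The true answer is 3.
ages = [23, 35, 41, 29, 52, 61, 38]
noisy = laplace_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng)
print("Noisy count:", noisy)

# Three clients send locally trained weights; the raw data stays on the device.
clients = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
sizes = [100, 300, 100]
global_weights = federated_average(clients, sizes)
print("Aggregated weights:", global_weights)
```

The server in this sketch still sees each client's weights in the clear. Protecting those updates is a separate problem, typically addressed with secure aggregation or by adding noise to the updates.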

📜 AI Regulations and Governance

EU AI Act (2024): Widely described as the world's first comprehensive AI regulation. Categorizes AI systems by risk (unacceptable, high, limited, minimal). High-risk systems (for example in medical devices, justice, employment, and credit) require conformity assessments, human oversight, transparency, and registration. Bans certain uses outright, including social scoring and, with narrow exceptions, real-time remote biometric identification in public spaces by law enforcement.
GDPR (General Data Protection Regulation): European privacy regulation with direct AI implications: restrictions on solely automated decisions with legal or similarly significant effects (often described as a "right to explanation"), the right to erasure (which complicates model training), data minimization, and the need for a lawful basis, such as consent, to process personal data.
NIST AI Risk Management Framework: US framework for managing AI risks, organized around four functions: Govern, Map, Measure, Manage. Voluntary but increasingly referenced in government procurement requirements.
