5. Machine Learning Fundamentals
Teaching Machines to Learn
Machine learning is the practice of building systems that improve their performance on a task through experience—specifically, through exposure to data. Instead of programming every decision rule explicitly, you provide examples and let the algorithm discover the rules. This paradigm shift—from programming to learning—is what makes AI scale to problems too complex for hand-coded logic.
This module covers the three learning paradigms (supervised, unsupervised, reinforcement), the model evaluation framework that prevents you from building models that work in notebooks but fail in production, and the critical concepts of overfitting and underfitting that determine the health of every trained model.
🎓 The Three Learning Paradigms
Supervised Learning
The most widely used paradigm in production AI. The algorithm learns from labeled examples—pairs of (input, correct output). It learns to map inputs to outputs by minimizing prediction error on the training examples, then generalizes to make predictions on new, unseen inputs.
When to use it: Whenever you have historical data with known correct answers. Email spam detection (labeled emails: spam/not-spam), house price prediction (houses with known sale prices), medical diagnosis (cases with confirmed diagnoses), sentiment analysis (reviews with known ratings).
The key requirement: Labeled data. Labeling is often the most expensive and time-consuming part of supervised learning—it requires human expertise. This cost drives innovation in semi-supervised and self-supervised learning.
Task types: Classification (predict a category) and Regression (predict a continuous number).
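As a minimal sketch of both task types, the example below fits a regression line to labeled house prices and trains a nearest-centroid classifier on labeled emails. All data values, feature choices (link and exclamation-mark counts), and function names are invented for illustration, and nearest-centroid is only one of many possible classifiers.

```python
# Toy labeled data, invented purely for illustration.
sizes_m2 = [50, 80, 100, 120, 150]        # inputs: house size
prices_k = [150, 240, 300, 360, 450]      # labels: sale price in thousands

def fit_line(xs, ys):
    """Regression: closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def fit_centroids(features, labels):
    """Classification: average the feature vectors of each class."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def classify(centroids, vec):
    """Predict the class whose centroid is closest to vec."""
    def sq_dist(lab):
        return sum((a - b) ** 2 for a, b in zip(centroids[lab], vec))
    return min(centroids, key=sq_dist)

# Regression: predict a continuous number for an unseen input.
slope, intercept = fit_line(sizes_m2, prices_k)
predicted_price = slope * 110 + intercept

# Classification: predict a category for an unseen input.
# Hypothetical features: (number of links, number of exclamation marks).
emails = [(8, 5), (6, 7), (9, 6), (1, 0), (0, 1), (2, 1)]
email_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
centroids = fit_centroids(emails, email_labels)
new_email_label = classify(centroids, (7, 4))
```

In both cases the model sees only (input, correct output) pairs during fitting and is then asked about an input it has never seen.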
Unsupervised Learning
The algorithm learns structure from unlabeled data. There are no correct answers to compare against—the model must discover patterns, groupings, or representations on its own. This is powerful for exploration and compression of high-dimensional data.
Use cases: Customer segmentation (group similar customers without predefined categories), anomaly detection (find unusual patterns—fraud, equipment failure—without labeled anomalies), dimensionality reduction (compress high-dimensional features into fewer dimensions while preserving structure), and representation learning (learn useful feature representations for downstream tasks).
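The evaluation challenge: Without labels, there is no ground truth to score against. Internal metrics such as the silhouette score can measure how compact and well separated clusters are, but judging whether the discovered structure is actually useful still requires domain knowledge and human judgment. This is a significant practical challenge.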
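To make the customer segmentation use case concrete, here is a small sketch of k-means clustering (Lloyd's algorithm) in plain Python. The customer points, the choice of two clusters, and the starting centroids are all assumptions made for the example; k-means is one common clustering method, not the only option.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iters=100):
    """Lloyd's algorithm; `centroids` holds the starting guesses."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:     # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical customers: (monthly spend, monthly visits), rescaled to 0-10.
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
final_centroids, segments = kmeans(customers, [customers[0], customers[3]])
```

No labels are passed in anywhere: the two segments emerge only from distances between points, which is also why a human still has to decide whether the segments mean anything.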
Reinforcement Learning
An agent learns to make sequential decisions by interacting with an environment, receiving rewards for good actions and penalties for bad ones. The agent's goal is to maximize cumulative reward over time. No labeled data—learning comes from consequences.
Use cases: Game playing (AlphaGo, OpenAI Five, AlphaStar), robotics control, recommendation systems, autonomous driving, and fine-tuning LLMs with Reinforcement Learning from Human Feedback (RLHF), a key step in aligning ChatGPT.
The key challenge: The reward signal is often sparse (many games reward you only at the end) and delayed (actions have consequences far in the future), and the action space can be enormous. Training is sample-inefficient, often requiring millions or even billions of environment interactions.
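The agent, environment, and sparse reward described above can be sketched with tabular Q-learning on a tiny corridor. The environment, the hyperparameters, and the purely random exploration policy are assumptions chosen to keep the example short; Q-learning is off-policy, so it can learn the greedy policy while behaving randomly.

```python
import random

N_STATES = 5       # states 0..4; state 4 is the goal
ACTIONS = (0, 1)   # 0 = step left, 1 = step right

def step(state, action):
    """Deterministic corridor with a sparse reward: 1.0 only at the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)       # explore uniformly at random
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward now plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = q_learning()
# Greedy policy for the non-terminal states 0..3.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

Even in this five-state world the agent needs hundreds of episodes to propagate the single end-of-episode reward back to the start state, which hints at why larger problems are so sample-hungry.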
🔄 Model Training — The Learning Loop
Training a gradient-based machine learning model follows a consistent pattern regardless of model complexity:
- Initialize: Start with random or default parameter values. For deep learning, initialization matters significantly—poor initialization can prevent convergence.
- Forward Pass: Feed training examples through the model to produce predictions.
- Compute Loss: Calculate how wrong the predictions are using the loss function.
- Backward Pass (Backpropagation for neural nets): Compute gradients of the loss with respect to each parameter.
- Update Parameters: Move each parameter a small step against its gradient, scaled by the learning rate, which is the direction that reduces loss (gradient descent).
- Repeat: Iterate over the entire training dataset multiple times (each full pass = one epoch) until the loss stops improving on the validation set.
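The steps above can be sketched for the simplest possible model, a line y = w * x + b trained with full-batch gradient descent on mean squared error. The data, learning rate, epoch count, and zero initialization are assumptions for the example, and the gradients are written out by hand rather than computed by backpropagation.

```python
def train(xs, ys, lr=0.05, epochs=2000):
    w, b = 0.0, 0.0                       # Initialize (default values)
    n = len(xs)
    history = []
    for _ in range(epochs):               # Repeat: one full pass = one epoch
        preds = [w * x + b for x in xs]   # Forward pass
        errors = [p - y for p, y in zip(preds, ys)]
        loss = sum(e * e for e in errors) / n          # Compute loss (MSE)
        # Backward pass: gradients of the MSE with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w                  # Update parameters
        b -= lr * grad_b
        history.append(loss)
    return w, b, history

# Toy data generated from y = 2x + 1, so the loop should recover w = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b, history = train(xs, ys)
```

This sketch runs for a fixed number of epochs; a production loop would instead watch a validation loss to decide when to stop.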
📉 Overfitting and Underfitting — The Bias-Variance Tradeoff
The two fundamental failure modes of machine learning models are overfitting and underfitting. Every modeling decision you make is implicitly navigating the tradeoff between them.
Underfitting (High Bias): The model is too simple to capture the underlying patterns in the data. Both training and validation error are high. The model hasn't learned enough from the data—it has "underfit" the training distribution. Fix: increase model capacity (more parameters, more complex architecture), train longer, add more features, reduce regularization.
Overfitting (High Variance): The model has memorized the training data, including its noise and random fluctuations, rather than learning the underlying pattern. Training error is very low but validation/test error is high. The model generalizes poorly to new data. Fix: more training data, regularization (L1, L2, dropout), reduce model complexity, early stopping.
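Both failure modes can be reproduced on a hand-built toy problem using k-nearest-neighbor regression, where k controls model flexibility. The data below is constructed for illustration (a line y = x with alternating +1/-1 noise on the training labels and noise-free validation targets), and k-NN is a stand-in for any model family with a capacity knob.

```python
def knn_predict(train_x, train_y, x, k):
    """Average the labels of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of k-NN predictions on the points (xs, ys)."""
    total = sum((knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys))
    return total / len(xs)

train_x = list(range(10))
train_y = [1, 0, 3, 2, 5, 4, 7, 6, 9, 8]   # y = x with alternating +1/-1 noise
val_x = [i + 0.4 for i in range(9)]         # unseen inputs between training points
val_y = list(val_x)                         # true underlying pattern y = x

# k -> (training error, validation error)
results = {
    k: (mse(train_x, train_y, train_x, train_y, k),
        mse(train_x, train_y, val_x, val_y, k))
    for k in (1, 2, 10)
}

for k, (train_err, val_err) in results.items():
    print(f"k={k:2d}  train MSE={train_err:.2f}  val MSE={val_err:.2f}")
```

With k=1 the model memorizes the noisy labels, so training error is zero while validation error is not (overfitting). With k=10 it predicts the global mean everywhere, so both errors are high (underfitting). On this particular dataset k=2 averages the noise away and gives the lowest validation error.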
The Bias-Variance Tradeoff: For squared-error loss, expected prediction error = Bias² + Variance + Irreducible Noise. Bias is error from wrong assumptions (underfitting). Variance is error from sensitivity to fluctuations in the training data (overfitting). Irreducible noise is the randomness in the data that no model can remove. Reducing bias tends to increase variance and vice versa. The art of machine learning is finding the sweet spot.
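Written out formally, the decomposition looks as follows. The notation is an assumption introduced here: the data follow y = f(x) + ε with zero-mean noise of variance σ², f̂ is the model fitted on a randomly drawn training set, the noise on the new point is independent of f̂, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible noise}}
```

The first term measures how far the average fitted model is from the truth, the second how much individual fitted models scatter around that average, and the third is a floor that no choice of model can lower.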
Detecting Overfitting with Training Curves:
[Figure: training and validation loss curves. Overfitting signature: training loss continues falling but validation loss starts rising. A red line marks where to stop training.]
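The stopping rule implied by these curves can be sketched as patience-based early stopping: keep the epoch with the lowest validation loss and give up once it has not improved for a set number of epochs. The loss values and the patience setting below are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the best epoch seen before patience ran out."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss   # new best: remember it
        elif epoch - best_epoch >= patience:
            break                                 # no improvement for `patience` epochs
    return best_epoch

# Hypothetical curves: training loss keeps falling, validation loss turns upward.
train_losses = [0.85, 0.60, 0.45, 0.35, 0.28, 0.22, 0.18]
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]

stop_at = early_stopping_epoch(val_losses)
```

In practice the model weights from the best epoch are saved and restored, so the extra epochs spent waiting out the patience window cost compute but not quality.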
📊 Model Evaluation — The Full Framework
Accuracy alone is often misleading, especially when classes are imbalanced. You need a complete evaluation framework appropriate to your problem type.
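A quick way to see the problem is to score a model that never predicts the rare class. The sketch below uses an invented dataset with 5 positives out of 100 and compares accuracy with recall, which is one of several complementary metrics and is used here only as an example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    return true_pos / actual_pos if actual_pos else 0.0

# Invented imbalanced dataset: 5 positive cases (e.g. fraud), 95 negatives.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100            # a "model" that always predicts the majority class

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
```

The always-negative model scores 95% accuracy while catching none of the positive cases, so the right metric depends on what a missed positive costs in your problem.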
The Train/Val/Test Split:
- Training Set (70–80%): The data the model learns from. Gradients are computed only on this data.
- Validation Set (10–15%): Used to tune hyperparameters and make architecture decisions. The model never trains on this data, but you use it repeatedly to make decisions—which means it's implicitly influencing the model. Small information leakage occurs here.
- Test Set (10–15%): The final held-out evaluation. Touched exactly once, at the very end. Produces your honest estimate of generalization performance. If you look at test set results and change your model, you've contaminated your test set—it's no longer a fair evaluation of generalization.
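A three-way split along these lines can be sketched in a few lines of plain Python. The 70/15/15 proportions, the fixed seed, and the function name are assumptions for the example, and a random shuffle like this is only appropriate when examples are independent of each other (time-ordered data, for instance, would need a chronological split instead).

```python
import random

def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then carve off the test and validation sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]       # everything left over
    return train, val, test

examples = list(range(100))                 # stand-in for 100 data points
train_set, val_set, test_set = train_val_test_split(examples)
```

Splitting once up front, with a fixed seed, makes it harder to reshuffle by accident and leak test examples into training.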
Cross-Validation: For small datasets where a fixed split wastes too much data, k-fold cross-validation partitions the data into k roughly equal folds and trains k separate models, each on k-1 folds and validated on the remaining one. Every example is used for validation exactly once, so the average performance across all folds is a more reliable estimate than a single validation split.
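The fold bookkeeping can be sketched as follows. The helper names, the `fit` and `score` callables, and the mean-predicting baseline model are all hypothetical. The folds are taken in index order, so real data should be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def cross_validate(xs, ys, k, fit, score):
    """Train k models and return the average validation score."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)

# Hypothetical baseline "model": always predict the mean training label.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def mse_score(model, xs, ys):
    return sum((model - y) ** 2 for y in ys) / len(ys)

xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
avg_mse = cross_validate(xs, ys, k=3, fit=fit_mean, score=mse_score)
```

Because each fold trains its own model from scratch, the cost is roughly k times that of a single train/validation run, which is why the technique is mostly used on small datasets.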