Artificial Intelligence | VoidX Academy

Module 07: Unsupervised Learning

Finding Structure Without Labels

Most data in the world is unlabeled. Human labeling is expensive, slow, and sometimes impossible: you cannot label every network packet for intrusion detection, every astronomical object for classification, or every customer behavior pattern for segmentation. Unsupervised learning discovers structure, patterns, and representations in data without requiring a single label. It is also increasingly the engine behind foundation models: GPT's pretraining is self-supervised learning, a close relative of unsupervised learning in which the training signal (the next token) comes from the raw text itself, applied to billions of text tokens.

🎯 K-Means Clustering

K-Means partitions data into K clusters by iteratively assigning each point to the nearest cluster center (centroid) and recomputing each centroid as the mean of its assigned points, repeating until the assignments stop changing. It is simple, fast, and widely used for customer segmentation, image compression, and data summarization.
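To make the assign-then-recompute loop concrete, here is a minimal NumPy sketch of it. The function name kmeans_sketch, the toy data, and the random-point initialization are assumptions made for illustration; scikit-learn's KMeans uses a smarter default initialization (k-means++) and should be preferred in practice.

```python
import numpy as np

def kmeans_sketch(X, k, n_iters=100, seed=0):
    """Plain K-Means loop: assign points to the nearest centroid, then recompute means."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random (illustrative choice).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: squared distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return labels, centroids

# Toy data: two well-separated blobs in 2D.
rng = np.random.default_rng(42)
X_demo = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),
])
demo_labels, demo_centroids = kmeans_sketch(X_demo, k=2)
print("Centroids:\n", demo_centroids)
```

On data this cleanly separated, the two blobs should end up in different clusters; on harder data the result depends on the starting centroids, which is why multiple initializations matter.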

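from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# X (numeric feature matrix) and df (DataFrame with one row per row of X)
# are assumed to be defined already.
# Scale first: K-Means uses Euclidean distance, so unscaled features with
# large ranges would dominate the clustering.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Fit K-Means for K = 1..10 and record the inertia of each fit
inertias = []
K_range = range(1, 11)
for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(X_scaled)
    inertias.append(kmeans.inertia_)

plt.plot(K_range, inertias, 'bo-')
plt.xlabel('Number of Clusters K')
plt.ylabel('Inertia (Within-cluster Sum of Squares)')
plt.title('Elbow Method: Choose K at the Elbow')
plt.show()

best_k = 4  # determined from elbow plot
kmeans = KMeans(n_clusters=best_k, random_state=42, n_init=10)
labels = kmeans.fit_predict(X_scaled)

df['cluster'] = labels
# numeric_only=True avoids errors when df also contains non-numeric columns
print(df.groupby('cluster').mean(numeric_only=True))  # profile each cluster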

Choosing K with the Elbow Method: Plot inertia (within-cluster sum of squares) for K from 1 to 10. Inertia always decreases as K grows, so look for the "elbow", the point where the decrease flattens out, as the suggested K. The silhouette score provides another evaluation metric: it ranges from -1 to 1, and values close to 1 indicate tight, well-separated clusters.
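The silhouette check mentioned above can be sketched as follows. The synthetic make_blobs data with four groups is an assumption standing in for your own scaled features, and the range of K values is illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data with 4 generated groups (assumption for illustration).
X_demo, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.6, random_state=42)

sil_scores = {}
for k in range(2, 9):  # the silhouette score needs at least 2 clusters
    labels_k = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X_demo)
    sil_scores[k] = silhouette_score(X_demo, labels_k)

for k, score in sil_scores.items():
    print(f"K={k}: silhouette={score:.3f}")

best_k_sil = max(sil_scores, key=sil_scores.get)
print(f"Highest silhouette at K={best_k_sil}")
```

The elbow plot and the silhouette score do not always agree; when they disagree, inspecting the cluster profiles for each candidate K is usually more informative than trusting either number alone.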

K-Means Limitations: Assumes roughly spherical clusters of similar size. Fails on elongated, non-convex, or very differently sized clusters. Sensitive to initialization and outliers. Always run with multiple random initializations (for example n_init=10), which keeps the run with the lowest inertia.
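The non-convex failure mode is easy to see on a small synthetic example. The sketch below assumes scikit-learn's make_moons toy dataset (two interleaved half-circles) and compares the K-Means labels with the generated groups using the adjusted Rand index, where 1.0 means perfect agreement and values near 0 mean chance-level agreement.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved crescents: clearly two groups, but neither is spherical.
X_moons, y_moons = make_moons(n_samples=500, noise=0.05, random_state=42)

km_labels = KMeans(n_clusters=2, random_state=42, n_init=10).fit_predict(X_moons)

# The generated labels are used only to measure how badly K-Means splits the shapes.
ari = adjusted_rand_score(y_moons, km_labels)
print(f"Adjusted Rand index vs. generated moons: {ari:.2f}")
```

K-Means draws a straight boundary between its two centroids, so it cuts across both crescents instead of following them.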

🌲 Hierarchical Clustering

Hierarchical clustering builds a tree (dendrogram) of nested clusters by progressively merging (agglomerative) or splitting (divisive) groups of data points. Unlike K-Means, you don't need to specify K in advance: you can cut the dendrogram at any level to produce any number of clusters.

from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Build the full merge tree (X_scaled is the standardized feature matrix)
linkage_matrix = linkage(X_scaled, method='ward')  # Ward minimizes within-cluster variance
plt.figure(figsize=(14, 6))
# Show only the top 5 levels; collapsed branches are labeled with their size in parentheses
dendrogram(linkage_matrix, truncate_mode='level', p=5)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Sample index or (cluster size)')
plt.ylabel('Distance (Ward)')
plt.show()

# Flat clustering with a fixed number of clusters
agg = AgglomerativeClustering(n_clusters=4, linkage='ward')
labels = agg.fit_predict(X_scaled)
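Cutting one dendrogram at several levels, as described at the start of this section, can be sketched with SciPy's fcluster. The synthetic data and the chosen cut levels are assumptions for illustration; the point is that the tree is built once and then cut repeatedly without refitting.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

# Synthetic stand-in data (assumption for illustration).
X_demo, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# Build the merge tree once.
Z = linkage(X_demo, method="ward")

# Cut the same tree at several levels; no refitting is needed.
cuts = {}
for k in (2, 3, 5):
    cuts[k] = fcluster(Z, t=k, criterion="maxclust")  # at most k flat clusters
    sizes = np.bincount(cuts[k])[1:]  # fcluster labels start at 1
    print(f"maxclust={k}: cluster sizes {sizes.tolist()}")

# Alternatively, cut at a merge distance instead of a cluster count.
cut_height = 0.5 * Z[:, 2].max()  # half the height of the final merge (arbitrary choice)
labels_by_height = fcluster(Z, t=cut_height, criterion="distance")
print(f"cut at height {cut_height:.2f}: {labels_by_height.max()} clusters")
```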

Linkage Methods: Ward (minimizes within-cluster variance; best for compact clusters), Complete (maximum distance between points of two clusters; good for well-separated groups), Average (mean distance between points of two clusters), Single (minimum distance between points of two clusters; tends to produce chains).
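A quick way to get a feel for these linkage choices is to run all four on the same data and compare the resulting cluster sizes. This is a rough sketch on overlapping synthetic blobs (an assumption for illustration), not a benchmark.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Overlapping synthetic blobs (assumption for illustration).
X_demo, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.5, random_state=42)

cluster_sizes = {}
for method in ("ward", "complete", "average", "single"):
    labels_m = AgglomerativeClustering(n_clusters=3, linkage=method).fit_predict(X_demo)
    # Sort sizes so the methods are easy to compare at a glance.
    cluster_sizes[method] = sorted(np.bincount(labels_m).tolist(), reverse=True)
    print(f"{method:>8}: cluster sizes {cluster_sizes[method]}")
```

On noisy or overlapping data, single linkage often returns one very large cluster plus a few tiny ones (the chaining effect), while Ward tends to give more balanced groups.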

📉 Principal Component Analysis (PCA)

PCA is one of the most widely used dimensionality reduction techniques in AI. It projects high-dimensional data onto lower-dimensional directions (principal components) that capture maximum variance. It reduces noise, removes redundant correlated features, enables visualization of high-dimensional data, and speeds up downstream model training.

from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import numpy as np

# X_scaled is the standardized feature matrix
pca = PCA()  # compute all components first
pca.fit(X_scaled)

explained_variance = pca.explained_variance_ratio_
cumulative_variance = np.cumsum(explained_variance)

# Per-component and cumulative explained variance
plt.figure(figsize=(10, 4))
plt.bar(range(1, len(explained_variance)+1), explained_variance, alpha=0.7, label='Individual')
plt.plot(range(1, len(explained_variance)+1), cumulative_variance, 'ro-', label='Cumulative')
plt.axhline(y=0.95, color='green', linestyle='--', label='95% threshold')
plt.xlabel('Principal Component')
plt.ylabel('Explained Variance Ratio')
plt.legend()
plt.title('PCA Explained Variance: Keep Enough Components to Reach 95%')
plt.show()

# Smallest number of components whose cumulative variance reaches 95%
n_components = int(np.argmax(cumulative_variance >= 0.95)) + 1
print(f"Components needed for 95% variance: {n_components}")

pca_final = PCA(n_components=n_components)
X_reduced = pca_final.fit_transform(X_scaled)
print(f"Shape reduced from {X_scaled.shape} to {X_reduced.shape}")

# 2D projection for visualization.
# y is assumed to hold known class labels; it is used only to color the points.
pca_2d = PCA(n_components=2)
X_2d = pca_2d.fit_transform(X_scaled)
plt.figure(figsize=(8, 6))
scatter = plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap='viridis', alpha=0.7)
plt.colorbar(scatter)
plt.title('PCA 2D Visualization: Can you see class separation?')
plt.show()

How PCA Works: Centers the data, computes the covariance matrix of the features, and finds its eigenvectors (the principal components, which are the directions of maximum variance) and eigenvalues (the variance along each direction), then projects the data onto the leading directions. The first principal component explains the most variance; each subsequent component is orthogonal to all previous ones and explains the most remaining variance.
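The steps above can be sketched directly in NumPy and compared with scikit-learn's PCA. The toy data and variable names are assumptions for illustration; scikit-learn computes the same result through a more numerically robust decomposition instead of an explicit covariance eigendecomposition.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Correlated toy data: 200 samples, 3 features (assumption for illustration).
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.2, 0.1]])
X_demo = rng.normal(size=(200, 3)) @ A

# 1. Center each feature.
X_centered = X_demo - X_demo.mean(axis=0)

# 2. Covariance matrix of the features (3 x 3).
cov = np.cov(X_centered, rowvar=False)

# 3. Eigendecomposition; eigh returns eigenvalues in ascending order,
#    so reorder them from largest to smallest variance.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Project onto the top 2 directions.
X_proj = X_centered @ eigvecs[:, :2]
explained_ratio = eigvals / eigvals.sum()

# Compare with scikit-learn (component signs may be flipped, which is harmless).
pca_check = PCA(n_components=2).fit(X_demo)
print("Manual explained variance ratio: ", explained_ratio[:2])
print("sklearn explained variance ratio:", pca_check.explained_variance_ratio_)
```

The eigenvalues divided by their sum correspond to explained_variance_ratio_, and the projected coordinates match scikit-learn's transform output up to the sign of each component.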

🕵️ Anomaly Detection

Anomaly detection identifies unusual patterns that don't conform to expected behavior. Critical for fraud detection, intrusion detection, equipment failure prediction, and quality control—precisely the use cases where labeled examples of anomalies are scarce (because anomalies, by definition, rarely occur).

from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# X_scaled is the standardized feature matrix
iso_forest = IsolationForest(contamination=0.05, random_state=42)  # 5% expected anomalies
anomaly_labels = iso_forest.fit_predict(X_scaled)    # -1 = anomaly, 1 = normal
anomaly_scores = iso_forest.score_samples(X_scaled)  # lower score = more anomalous

anomalies = X_scaled[anomaly_labels == -1]
normal = X_scaled[anomaly_labels == 1]
print(f"Detected {len(anomalies)} anomalies out of {len(X_scaled)} samples")

# LOF compares each point's local density with that of its 20 nearest neighbors
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
lof_labels = lof.fit_predict(X_scaled)  # -1 = anomaly, 1 = normal

Key Algorithms:

Isolation Forest: Randomly partitions data using decision trees. Anomalies are isolated with fewer splits (shorter path length) because they are different from the majority. Fast, effective, and works well in high dimensions.
Local Outlier Factor (LOF): Compares the local density of each point to its neighbors. Points in low-density regions surrounded by high-density neighborhoods are anomalies. Better for detecting anomalies in varying-density regions.
Autoencoder-based Detection (Deep Learning): Train an autoencoder to reconstruct normal data. Anomalies have high reconstruction error because the model never learned to represent them. Powerful for complex, high-dimensional data like network traffic and sensor readings.
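The reconstruction-error idea behind autoencoder-based detection can be illustrated without a deep learning framework. The sketch below swaps in PCA as a linear stand-in for an autoencoder (transform plays the role of the encoder, inverse_transform the decoder); this is a simplification for illustration, not a real autoencoder. The synthetic data, the planted anomalies, and the 99th-percentile threshold are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# "Normal" data: 10 features driven by 2 hidden factors plus small noise.
mixing = rng.normal(size=(2, 10))
X_normal = rng.normal(size=(500, 2)) @ mixing + 0.05 * rng.normal(size=(500, 10))

# Planted anomalies: points that ignore the 2-factor structure.
X_anom = rng.normal(scale=2.0, size=(10, 10))

# Fit the compress-and-reconstruct model on normal data only.
model = PCA(n_components=2).fit(X_normal)

def reconstruction_error(model, X):
    """Mean squared error between each row and its reconstruction."""
    X_hat = model.inverse_transform(model.transform(X))
    return ((X - X_hat) ** 2).mean(axis=1)

err_normal = reconstruction_error(model, X_normal)
err_anom = reconstruction_error(model, X_anom)

# Flag anything that reconstructs worse than 99% of the normal data.
threshold = np.percentile(err_normal, 99)
flagged = err_anom > threshold

print(f"Median error (normal):    {np.median(err_normal):.4f}")
print(f"Median error (anomalies): {np.median(err_anom):.4f}")
print(f"Flagged {int(flagged.sum())} of {len(X_anom)} planted anomalies")
```

A neural autoencoder follows the same recipe with a nonlinear encoder and decoder, which lets it model structure that a linear projection cannot capture.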
