80 MINS intermediate
6. Data Visualization Systems
Module 06: Visualization
Matplotlib, Seaborn, Plotly, and Dashboard Design
A visualization that confuses its audience is worse than no visualization at all. Data visualization is not about making charts — it is about making data argue clearly for a specific interpretation. This module covers the complete visualization toolkit from production-grade static plots to interactive dashboards, with emphasis on the design principles that separate analysis from storytelling.
📊 The Visualization Decision Framework
Every visualization choice should be driven by the question you're answering:
- Distribution: Histogram, KDE plot, box plot, violin plot — what does the spread look like?
- Comparison: Bar chart, grouped bar, dot plot — how do values differ across categories?
- Relationship: Scatter plot, bubble chart, heat map — how do two variables co-vary?
- Composition: Stacked bar, area chart, treemap — how do parts contribute to the whole?
- Trend over time: Line chart, area chart, candlestick — how does a value change through time?
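The framework above can be read as a lookup from question type to candidate charts. A minimal sketch, assuming you want that lookup in code; `CHART_OPTIONS` and `suggest_charts` are hypothetical names used only for illustration, not part of any library:

```python
# Hypothetical helper: the mapping simply mirrors the framework list above.
CHART_OPTIONS = {
    "distribution": ["histogram", "KDE plot", "box plot", "violin plot"],
    "comparison": ["bar chart", "grouped bar", "dot plot"],
    "relationship": ["scatter plot", "bubble chart", "heat map"],
    "composition": ["stacked bar", "area chart", "treemap"],
    "trend": ["line chart", "area chart", "candlestick"],
}


def suggest_charts(question_type: str) -> list[str]:
    """Return candidate chart types for a question category."""
    key = question_type.strip().lower()
    if key not in CHART_OPTIONS:
        valid = ", ".join(sorted(CHART_OPTIONS))
        raise ValueError(f"Unknown question type {question_type!r}; expected one of: {valid}")
    # Return a copy so callers cannot mutate the shared mapping
    return list(CHART_OPTIONS[key])


candidates = suggest_charts("Distribution")
```

The lookup only narrows the options; picking among the candidates still depends on the audience and the data.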
🎨 Production-Quality Static Plots
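The code in this module reads `sales_data.csv`, which is not included here. A minimal sketch for generating a synthetic stand-in, assuming the file needs only the columns the code references (`date`, `region`, `revenue`, `units_sold`); the region names, price, and every value produced are made up for illustration:

```python
import numpy as np
import pandas as pd


def make_synthetic_sales(n_days: int = 91, seed: int = 0) -> pd.DataFrame:
    """Build a synthetic stand-in for sales_data.csv (all values are invented)."""
    rng = np.random.default_rng(seed)
    regions = ['North', 'South', 'East', 'West']  # hypothetical region names
    dates = pd.date_range('2024-01-01', periods=n_days, freq='D')

    # One row per (date, region) pair
    frame = pd.DataFrame({
        'date': dates.repeat(len(regions)),
        'region': np.tile(regions, n_days),
    })
    frame['units_sold'] = rng.integers(5, 60, size=len(frame))

    # Revenue is roughly proportional to units sold, plus noise
    unit_price = 40.0  # hypothetical price
    noise = rng.normal(0, 150, size=len(frame))
    frame['revenue'] = (frame['units_sold'] * unit_price + noise).clip(lower=0).round(2)
    return frame


sales = make_synthetic_sales()
```

Calling `sales.to_csv('sales_data.csv', index=False)` writes a file that `pd.read_csv('sales_data.csv', parse_dates=['date'])` can load.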
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# Professional style configuration
plt.rcParams.update({
    'figure.dpi': 150,
    'figure.facecolor': 'white',
    'font.family': 'DejaVu Sans',
    'font.size': 11,
    'axes.spines.top': False,
    'axes.spines.right': False,
    'axes.grid': True,
    'grid.alpha': 0.3,
    'axes.titlesize': 14,
    'axes.titleweight': 'bold',
})
sns.set_palette('husl', 8)
df = pd.read_csv('sales_data.csv', parse_dates=['date'])
# Figure 1: Multi-panel EDA dashboard
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
fig.suptitle('Sales Performance Dashboard — Q1 2024', fontsize=18, fontweight='bold', y=1.02)
# Panel 1: Revenue distribution
sns.histplot(df['revenue'], kde=True, ax=axes[0,0], color='steelblue', bins=40)
axes[0,0].axvline(df['revenue'].median(), color='red', linestyle='--', label=f'Median: ${df["revenue"].median():,.0f}')
axes[0,0].set_title('Revenue Distribution')
axes[0,0].legend()
# Panel 2: Revenue by region (with sample sizes)
region_stats = df.groupby('region').agg(mean=('revenue','mean'), count=('revenue','size')).reset_index()
bars = axes[0,1].bar(region_stats['region'], region_stats['mean'],
                     color=sns.color_palette('husl', len(region_stats)))
# Add count labels on bars
for bar, (_, row) in zip(bars, region_stats.iterrows()):
    axes[0,1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 50,
                   f'n={int(row["count"])}', ha='center', va='bottom', fontsize=9)
axes[0,1].set_title('Average Revenue by Region')
axes[0,1].set_ylabel('Average Revenue ($)')
# Panel 3: Correlation heatmap
numeric_cols = df.select_dtypes(include='number')
corr = numeric_cols.corr()
mask = np.triu(np.ones_like(corr, dtype=bool)) # show only lower triangle
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f', cmap='RdBu_r',
            center=0, ax=axes[0,2], square=True, cbar_kws={'shrink': 0.8})
axes[0,2].set_title('Feature Correlation Matrix')
# Panel 4: Time series trend
# 'ME' (month end) requires pandas >= 2.2; older versions use 'M'
monthly = df.resample('ME', on='date')['revenue'].sum().reset_index()
axes[1,0].plot(monthly['date'], monthly['revenue'], marker='o', linewidth=2, color='steelblue')
axes[1,0].fill_between(monthly['date'], monthly['revenue'], alpha=0.2, color='steelblue')
axes[1,0].set_title('Monthly Revenue Trend')
axes[1,0].set_ylabel('Total Revenue ($)')
axes[1,0].tick_params(axis='x', rotation=45)
# Panel 5: Box plots for outlier visualization
# sns.boxplot draws on the given axes only; pandas' df.boxplot(by=...) would
# overwrite the figure-level dashboard title set above
sns.boxplot(data=df, x='region', y='revenue', ax=axes[1,1])
axes[1,1].set_title('Revenue by Region')
# Panel 6: Scatter with regression line
sns.regplot(data=df, x='units_sold', y='revenue', ax=axes[1,2],
            scatter_kws={'alpha': 0.4, 'color': 'steelblue'},
            line_kws={'color': 'red', 'linewidth': 2})
axes[1,2].set_title('Units Sold vs Revenue')
plt.tight_layout()
plt.savefig('eda_dashboard.png', dpi=150, bbox_inches='tight')
plt.show()
🚀 Interactive Visualizations with Plotly
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd
df = pd.read_csv('sales_data.csv', parse_dates=['date'])
# Interactive time series with range selector
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=df['date'], y=df['revenue'],
    mode='lines', name='Revenue',
    line=dict(color='royalblue', width=1.5),
    # <br> breaks the line; <extra></extra> hides the secondary trace-name box
    hovertemplate='%{x|%B %d, %Y}<br>Revenue: $%{y:,.2f}<extra></extra>'
))
# Add 30-period rolling average (30 rows; this equals 30 days only if the
# data is sorted by date with exactly one row per day)
df['revenue_30d'] = df['revenue'].rolling(30).mean()
fig.add_trace(go.Scatter(
    x=df['date'], y=df['revenue_30d'],
    mode='lines', name='30-Day MA',
    line=dict(color='orange', width=2, dash='dash')
))
fig.update_layout(
    title='Revenue Trend with Moving Average',
    xaxis_title='Date',
    yaxis_title='Revenue ($)',
    xaxis_rangeslider_visible=True,
    template='plotly_white',
    hovermode='x unified'
)
fig.write_html('interactive_chart.html') # save for sharing
fig.show()
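A note on the moving average: `rolling(30)` counts rows, not calendar days. If the data has several rows per day or skips days, a time-based window may be closer to what "30-day" implies. A minimal sketch on hypothetical toy data (the `toy` frame and the 3-day window are illustrative assumptions, chosen to keep the example small):

```python
import pandas as pd

# Hypothetical toy data: two sales on one day, and a gap with no sales
toy = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-05']),
    'revenue': [100.0, 300.0, 200.0, 400.0],
})

# Collapse to one total per calendar day; the result is indexed by date
daily = toy.groupby('date')['revenue'].sum().sort_index()

# An offset string makes the window span time rather than a fixed row count
rolling_3d = daily.rolling('3D').mean()
```

Days with no rows are skipped rather than counted as zero; calling `daily.resample('D').sum()` first would fill them with zeros if that is the intended meaning.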
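# Animated scatter plot: show how the relationship changes over time
# Derive a sortable month label from 'date' (assumes the CSV has no 'month' column)
monthly_region = (
    df.assign(month=df['date'].dt.strftime('%Y-%m'))
      .groupby(['month', 'region'], as_index=False)
      .agg({'revenue': 'sum', 'units_sold': 'sum'})
)
fig_animated = px.scatter(
    monthly_region,
    x='units_sold', y='revenue',
    color='region', size='revenue',
    animation_frame='month',
    title='Revenue vs Units by Region Over Time',
    template='plotly_white'
)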
fig_animated.show()
Data Science: EDA Terminal
ecommerce_churn.csv
1000 rows × 4 columns
| customer_id | tenure_months | total_spend | churn |
|---|---|---|---|
| usr_892 | 12 | 450.50 | 0 |
| usr_104 | 2 | 45.00 | 1 |
| usr_443 | 36 | 2100.00 | 0 |
| usr_991 | 1 | 12.99 | 1 |
| usr_202 | 24 | 1250.75 | 0 |
| usr_331 | 48 | 3400.20 | 0 |
| usr_705 | 3 | 110.00 | 1 |
Run df.describe() in Python to see statistical summaries.
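A minimal sketch of that step, assuming only the seven preview rows shown above are available (the full file has 1000 rows, so its summary will differ); `preview`, `summary`, and `churn_means` are illustrative names:

```python
import pandas as pd

# The seven preview rows from the table above
preview = pd.DataFrame({
    'customer_id': ['usr_892', 'usr_104', 'usr_443', 'usr_991',
                    'usr_202', 'usr_331', 'usr_705'],
    'tenure_months': [12, 2, 36, 1, 24, 48, 3],
    'total_spend': [450.50, 45.00, 2100.00, 12.99, 1250.75, 3400.20, 110.00],
    'churn': [0, 1, 0, 1, 0, 0, 1],
})

# describe() summarizes numeric columns by default (count, mean, std, quartiles)
summary = preview.describe()
print(summary)

# Comparing group means is a quick first look at what separates churned customers
churn_means = preview.groupby('churn')[['tenure_months', 'total_spend']].mean()
print(churn_means)
```

In this small sample the churned customers all have short tenures, which is the kind of pattern a box plot or histogram split by `churn` would then make visible.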