Build With Me: The EDA Detective
Uncovering the Story
Welcome to your first data build. We have been handed a dataset of e-commerce customers, and the business wants to know why people are churning. We aren't going to build a model yet; we are going to act as detectives. Let's write the code step-by-step and watch the EDA Terminal on the left reveal the answers.
Step 1: The Initial Inspection
Always start by looking at the raw shape of the data. Type this into your editor:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the dataset
df = pd.read_csv('ecommerce_churn.csv')
# Inspect the first 5 rows and statistical summary
print("--- RAW DATA ---")
print(df.head())
print("\n--- SUMMARY STATS ---")
print(df.describe())
Your Turn: Hit Run Cell. Look at the terminal output. Note the average tenure and the churn rate; because churn is coded 0/1, the mean of the churn column is the churn rate.
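If you want to pull those two numbers out directly instead of reading them off the summary table, here is a minimal sketch. It assumes only the seven preview rows from ecommerce_churn.csv, rebuilt by hand so the snippet runs without the file; the variable names (sample, churn_rate, avg_tenure, missing_per_column) are illustrative and not part of the lesson code.

```python
import pandas as pd

# Assumption: only the seven preview rows are used here; the real file has 1000 rows.
sample = pd.DataFrame({
    "customer_id": ["usr_892", "usr_104", "usr_443", "usr_991",
                    "usr_202", "usr_331", "usr_705"],
    "tenure_months": [12, 2, 36, 1, 24, 48, 3],
    "total_spend": [450.50, 45.00, 2100.00, 12.99, 1250.75, 3400.20, 110.00],
    "churn": [0, 1, 0, 1, 0, 0, 1],
})

# churn is coded 0/1, so its mean is the fraction of customers who churned
churn_rate = sample["churn"].mean()

# Average tenure in months
avg_tenure = sample["tenure_months"].mean()

# Count missing values per column before trusting any summary statistic
missing_per_column = sample.isna().sum()

print(f"Churn rate: {churn_rate:.1%}")
print(f"Average tenure: {avg_tenure:.1f} months")
print(missing_per_column)
```

On the full dataset you would run the same three lines against df. The numbers from seven rows are only a sanity check, not the real churn rate.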
Step 2: Visualizing Distributions
Numbers in a terminal are hard to read. Let's visualize the distribution of how long customers stay with us (tenure). Add this to your code:
# Visualize the distribution of tenure
sns.histplot(data=df, x='tenure_months', bins=20)
plt.title('Customer Tenure Distribution')
plt.show()
Your Turn: Hit Run again. The left pane will switch to the Visualizer. Do we have mostly new customers or long-term loyalists?
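A histogram is just a count of rows per bin, so you can answer the same question with numbers. The sketch below assumes the seven preview rows again, and the bin edges and labels are hypothetical choices made for illustration; they are not the 20 bins seaborn picks in the plot.

```python
import pandas as pd

# Assumption: the seven preview rows stand in for the full 1000-row file.
sample = pd.DataFrame({
    "tenure_months": [12, 2, 36, 1, 24, 48, 3],
    "churn": [0, 1, 0, 1, 0, 0, 1],
})

# Hypothetical tenure bands; each interval includes its right edge, e.g. (0, 6]
bins = [0, 6, 12, 24, 48]
labels = ["0-6", "7-12", "13-24", "25-48"]
sample["tenure_band"] = pd.cut(sample["tenure_months"], bins=bins, labels=labels)

# Count customers per band, ordered from shortest to longest tenure
band_counts = sample["tenure_band"].value_counts().sort_index()
print(band_counts)
```

A tall first band means mostly new customers; a tall last band means long-term loyalists.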
Step 3: Finding the Culprit (Correlation)
Now for the magic. We want to know what correlates with 'churn'. Let's generate a correlation heatmap.
# Generate a correlation matrix (numeric columns only; customer_id is text)
corr_matrix = df.corr(numeric_only=True)
# Plot it as a heatmap
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.show()
Your Turn: Run the cell. Look at the heatmap on the left. Find the row for 'churn'. Which feature has the strongest negative correlation? That is your prime suspect for what drives churn. Keep in mind that correlation shows an association, not proof of cause.
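Reading one row of a heatmap by eye gets harder as columns are added, so it can help to rank the correlations with churn in code. This is a sketch on the seven preview rows only. It uses select_dtypes to keep the numeric columns, which is one way to keep the text column customer_id out of the calculation, and the names churn_corr and strongest are illustrative.

```python
import pandas as pd

# Assumption: the seven preview rows stand in for the full 1000-row file.
sample = pd.DataFrame({
    "customer_id": ["usr_892", "usr_104", "usr_443", "usr_991",
                    "usr_202", "usr_331", "usr_705"],
    "tenure_months": [12, 2, 36, 1, 24, 48, 3],
    "total_spend": [450.50, 45.00, 2100.00, 12.99, 1250.75, 3400.20, 110.00],
    "churn": [0, 1, 0, 1, 0, 0, 1],
})

# Keep numeric columns only, then compute pairwise Pearson correlations
corr_matrix = sample.select_dtypes(include="number").corr()

# Take the churn column, drop churn's correlation with itself,
# and sort so the most negative value comes first
churn_corr = corr_matrix["churn"].drop("churn").sort_values()
print(churn_corr)

# The feature with the largest absolute correlation is the first one to investigate
strongest = churn_corr.abs().idxmax()
print("Strongest association with churn:", strongest)
```

With only seven rows the exact values mean little; run the same steps on df to get the ranking that matches the heatmap.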
ecommerce_churn.csv
1000 rows × 4 columns

| customer_id | tenure_months | total_spend | churn |
|---|---|---|---|
| usr_892 | 12 | 450.50 | 0 |
| usr_104 | 2 | 45.00 | 1 |
| usr_443 | 36 | 2100.00 | 0 |
| usr_991 | 1 | 12.99 | 1 |
| usr_202 | 24 | 1250.75 | 0 |
| usr_331 | 48 | 3400.20 | 0 |
| usr_705 | 3 | 110.00 | 1 |