Build With Me: The Feature Forge
Cleaning the Mess
Models hate text. Most models fail on missing data. Many get confused by wildly different scales. In this build, we are going to write a professional transformation pipeline. Watch the Feature Forge on the left to see exactly how your code turns the raw data into model-ready features.
Step 1: Handling the Voids
Look at the Raw Data tab on the left. Some ages are missing (NaN). Most models will raise an error if we feed them this. Let's fix it by imputing the mean: filling each gap with the column average.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
df = pd.read_csv('raw_customers.csv')
# Impute missing values with the column average
df['Age'] = df['Age'].fillna(df['Age'].mean().round())
Your Turn: Hit Run Pipeline. Watch the red NaN values turn green and populate with numbers!
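To see what that line computes, here is a minimal sketch that rebuilds the Age column from the Raw Data table in memory instead of reading `raw_customers.csv`. The inline DataFrame is an assumption for illustration; the real file may hold more rows.

```python
import numpy as np
import pandas as pd

# Age values copied from the five-row Raw Data table (illustrative only)
df = pd.DataFrame({
    "ID": [101, 102, 103, 104, 105],
    "Age": [28, np.nan, 45, np.nan, 22],
})

# mean() skips NaN by default: (28 + 45 + 22) / 3 = 31.67, which rounds to 32
fill_value = df["Age"].mean().round()

# Every missing age receives the same fill value
df["Age"] = df["Age"].fillna(fill_value)
print(df)
```

Both missing rows end up with the same value, 32. That keeps the column average roughly where it was but flattens variation, so on skewed columns the median is a common alternative.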
Step 2: One-Hot Encoding
Now look at the 'Tier' column. It says 'Pro', 'Free', and 'Max'. A math equation can't multiply the word 'Pro'. We need to split this into binary columns (0s and 1s).
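# Convert categorical text into binary columns
# (dtype=int keeps the output as 0s and 1s; recent pandas versions default to True/False)
df = pd.get_dummies(df, columns=['Tier'], dtype=int)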
Your Turn: Hit Run. Watch how the single 'Tier' column explodes into multiple columns where text becomes machine-readable numbers.
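Here is a small sketch of that split, using the Tier values from the Raw Data table. The inline DataFrame is an assumption for illustration, and `dtype=int` is passed so the new columns hold 0s and 1s rather than booleans.

```python
import pandas as pd

# Tier values copied from the Raw Data table (illustrative only)
df = pd.DataFrame({
    "ID": [101, 102, 103, 104, 105],
    "Tier": ["Pro", "Free", "Max", "Pro", "Free"],
})

# One new column per category; dtype=int gives 0/1 instead of True/False
encoded = pd.get_dummies(df, columns=["Tier"], dtype=int)
print(encoded)
```

Each row has exactly one 1 across `Tier_Free`, `Tier_Max`, and `Tier_Pro`, which is where the name "one-hot" comes from.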
Step 3: Feature Scaling
Finally, compare Income and Score. Income runs from the tens of thousands into the hundreds of thousands, while Score runs from 1 to 10. Models that compare raw magnitudes will treat Income as far more important just because its numbers are bigger. We must scale it.
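How much this matters depends on the model: distance-based methods such as k-nearest neighbors are sensitive to scale, while tree-based models are largely unaffected. The sketch below illustrates the distance case with customers 101 and 104 from the Raw Data table. The `min_max` helper is hypothetical, written only for this example.

```python
import math

# (Income, Score) for customers 101 and 104 from the Raw Data table
a = (85_000, 7.2)
b = (92_000, 8.0)

# Raw Euclidean distance: the 7,000 income gap swamps the 0.8 score gap
raw_distance = math.dist(a, b)

def min_max(value, low, high):
    """Rescale value into the 0-1 range given the column's min and max."""
    return (value - low) / (high - low)

# Column minimums and maximums taken from the same five-row table
income_gap = min_max(b[0], 35_000, 150_000) - min_max(a[0], 35_000, 150_000)
score_gap = min_max(b[1], 4.1, 9.8) - min_max(a[1], 4.1, 9.8)

print(f"raw distance: {raw_distance:.4f}")
print(f"scaled income gap: {income_gap:.3f}, scaled score gap: {score_gap:.3f}")
```

On the raw numbers the distance is almost entirely the income difference. Once both columns sit on a 0 to 1 scale, the score difference between these two customers is actually the larger of the two.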
# Scale Income to a 0.0 - 1.0 range
# (assumes Income is already numeric; strings like '$85,000' need converting first)
scaler = MinMaxScaler()
df[['Income']] = scaler.fit_transform(df[['Income']])
print("Pipeline complete. Data is ready for modeling.")
Your Turn: Hit Run. The massive income numbers are now compressed into neat decimals. You just built a complete preprocessing pipeline!
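The Raw Data table displays Income with a dollar sign and commas. Whether `raw_customers.csv` stores it that way is an assumption, but if it does, the column has to become numeric before it can be scaled. The sketch below strips the formatting and then applies the min-max formula by hand, which is the same arithmetic `MinMaxScaler` uses with its default 0 to 1 range.

```python
import pandas as pd

# Income as it appears in the Raw Data table; whether the CSV stores it
# this way is an assumption
df = pd.DataFrame(
    {"Income": ["$85,000", "$42,000", "$150,000", "$92,000", "$35,000"]}
)

# Strip the dollar sign and thousands separators, then convert to numbers
df["Income"] = df["Income"].str.replace(r"[$,]", "", regex=True).astype(float)

# Min-max scaling by hand: (x - min) / (max - min)
low, high = df["Income"].min(), df["Income"].max()
df["Income_scaled"] = (df["Income"] - low) / (high - low)
print(df)
```

The smallest income maps to 0.0 and the largest to 1.0, with every other value placed proportionally in between.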
Raw Data
| ID | Age | Tier | Income | Score |
|---|---|---|---|---|
| 101 | 28 | Pro | $85,000 | 7.2 |
| 102 | NaN | Free | $42,000 | 4.1 |
| 103 | 45 | Max | $150,000 | 9.8 |
| 104 | NaN | Pro | $92,000 | 8 |
| 105 | 22 | Free | $35,000 | 5.5 |
Knowledge Check
Ready to test your understanding of Build With Me: The Feature Forge?