15. AI Projects: Hands-On Portfolio
Hands-On: Build Real Systems
Theory without practice is incomplete. This module walks through six production-grade AI projects, each covering a complete pipeline from data to deployment. These are not toy examples: they use real techniques, tools, and architectures. Completing all six gives you a portfolio that demonstrates practical AI engineering capability.
📧 Project 1: Spam Email Classifier
Objective: Build an email classifier that distinguishes spam from legitimate email with high precision.
Dataset: Enron Email Dataset (Kaggle; ~33,000 labeled emails) or the smaller SpamAssassin Public Corpus.
Pipeline:
- Data Loading and EDA: Load raw emails, compute the class distribution (the spam share varies by corpus, so measure it rather than assuming a fixed ratio), and visualize word frequency distributions for spam vs. ham.
- Feature Engineering: TF-IDF vectorization of email body, subject line features (all caps, exclamation marks), sender domain analysis, URL count, and email header features (Reply-To mismatch).
- Baseline Model: Naive Bayes (historically very effective for spam). Evaluate with precision/recall—optimize for high precision to avoid false positives (legitimate email classified as spam).
- Advanced Models: Logistic Regression, Random Forest, LightGBM with hyperparameter tuning via cross-validation.
- BERT Fine-tuning: Fine-tune DistilBERT on the labeled emails for strong accuracy at a manageable compute cost.
- Deployment: Wrap in a FastAPI endpoint. Input: raw email text. Output: {classification, confidence, key_indicators}.
Key Learnings: TF-IDF for text features, class imbalance handling, precision vs. recall trade-off in practice, transformer fine-tuning, and API deployment.
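The hand-crafted features named in the Feature Engineering step (all-caps subject, exclamation marks, URL count, sender domain, Reply-To mismatch) can be sketched with the standard library alone. This is a minimal illustration, not a complete feature set: the function name `extract_header_features`, the feature keys, and the sample message are all hypothetical, and the URL pattern is deliberately simple.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Simplistic URL matcher (assumption): anything starting with http:// or https://
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)

def extract_header_features(raw_email: str) -> dict:
    """Compute a few hand-crafted spam signals from one raw email string."""
    msg = message_from_string(raw_email)
    subject = msg.get("Subject", "") or ""
    letters = [c for c in subject if c.isalpha()]
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    reply_addr = parseaddr(msg.get("Reply-To", ""))[1].lower()
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""  # skip multipart for brevity
    return {
        "subject_all_caps": bool(letters) and all(c.isupper() for c in letters),
        "subject_exclamations": subject.count("!"),
        "url_count": len(URL_PATTERN.findall(body)),
        "sender_domain": from_addr.rpartition("@")[2],
        "reply_to_mismatch": bool(reply_addr) and reply_addr != from_addr,
    }

# Made-up sample message for illustration only.
sample = (
    "From: Prize Team <win@promo.example>\n"
    "Reply-To: claims@other.example\n"
    "Subject: YOU WON!!!\n"
    "\n"
    "Claim now at http://promo.example/claim and https://promo.example/terms\n"
)
features = extract_header_features(sample)
print(features)
```

In a full pipeline these values would be concatenated with the TF-IDF vector of the body before training; multipart messages would need their text parts extracted first.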
🖼️ Project 2: Image Classification App
Objective: Build a custom image classifier for a domain of your choice (plant disease, food recognition, retail product classification) and deploy as a web app.
Pipeline:
- Data Collection: Collect images through the Bing or Google image search APIs, or use an existing dataset (for example, a plant disease dataset on Kaggle).
- Data Augmentation Pipeline: Implement an augmentation strategy (random crops, flips, rotations, color jitter) using torchvision.transforms and the albumentations library.
- Transfer Learning: Start with EfficientNet-B0 or ResNet50 pretrained on ImageNet. Freeze backbone, train classifier head for 10 epochs, then fine-tune top layers with a low learning rate for 20 more epochs.
- Evaluation: Confusion matrix, per-class precision and recall. Identify which classes are hardest to distinguish.
- Deployment as Web App: Gradio for rapid UI prototyping. Hugging Face Spaces for free hosting. Users drag-and-drop an image and receive a prediction with confidence breakdown.
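The Evaluation step above can be sketched without any ML library. The following is a minimal illustration of a confusion matrix and per-class precision and recall; the function name `per_class_report` and the class labels are made up for the example, and a real project would more likely use a library routine for this.

```python
from collections import Counter

def per_class_report(y_true, y_pred):
    """Return (confusion counts, {label: (precision, recall)})."""
    # Keys are (actual, predicted) pairs; values are how often each pair occurred.
    confusion = Counter(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    report = {}
    for label in labels:
        true_pos = confusion[(label, label)]
        predicted = sum(confusion[(other, label)] for other in labels)
        actual = sum(confusion[(label, other)] for other in labels)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        report[label] = (precision, recall)
    return confusion, report

# Hypothetical labels for a three-class plant classifier.
y_true = ["healthy", "healthy", "rust", "rust", "rust", "blight", "blight", "blight"]
y_pred = ["healthy", "healthy", "rust", "blight", "rust", "blight", "rust", "blight"]

confusion, report = per_class_report(y_true, y_pred)
for label, (precision, recall) in report.items():
    print(f"{label:8s} precision={precision:.2f} recall={recall:.2f}")
```

Off-diagonal entries such as `confusion[("rust", "blight")]` show which pairs of classes the model mixes up, which is usually more actionable than a single accuracy figure.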
🎯 Project 3: Recommendation System
Objective: Build a collaborative filtering recommendation system for movies or e-commerce products.
- Dataset: MovieLens 1M (1M ratings from 6,000 users across 4,000 movies). Well-labeled, clean, industry-standard benchmark.
- Collaborative Filtering: Matrix factorization using SVD (for example, TruncatedSVD in scikit-learn) or Neural Collaborative Filtering (PyTorch). Both learn latent user and item embeddings that capture preferences.
- Content-Based Filtering: Item embeddings from genre, cast, and plot summary (TF-IDF or BERT embeddings). Recommend similar items to what the user has liked.
- Hybrid System: Combine collaborative and content-based signals. Improves cold-start handling (new items or users with no rating history).
- Evaluation: RMSE for rating prediction. Precision@K and Recall@K for ranking quality. NDCG (Normalized Discounted Cumulative Gain) for position-aware ranking evaluation.
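The ranking metrics in the Evaluation step can be written in a few lines. The sketch below assumes binary relevance (an item is either relevant or not, for instance a rating at or above some cutoff you choose), which is a simplification: NDCG also supports graded relevance. The item IDs are invented.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(item in relevant for item in ranked[:k]) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(item in relevant for item in ranked[:k]) / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: hits near the top count more than hits lower down."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

# Hypothetical recommendation list and held-out relevant items for one user.
ranked = ["m1", "m7", "m3", "m9", "m4"]
relevant = {"m1", "m3", "m8"}

p3 = precision_at_k(ranked, relevant, 3)
r3 = recall_at_k(ranked, relevant, 3)
n3 = ndcg_at_k(ranked, relevant, 3)
print(f"P@3={p3:.3f} R@3={r3:.3f} NDCG@3={n3:.3f}")
```

In practice these values are computed per user and then averaged across the test set.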
🚨 Project 4: Fraud Detection System
Objective: Build a production-grade credit card fraud detection system handling extreme class imbalance.
- Dataset: Kaggle Credit Card Fraud dataset. 284,807 transactions, 492 fraudulent (0.17% fraud rate).
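- Imbalance Strategy: SMOTE (Synthetic Minority Over-sampling Technique) to oversample the fraud class, applied to the training folds only so that synthetic samples never leak into validation or test data. Class weight adjustment as an alternative. Evaluate with the precision-recall curve, not accuracy: at a 0.17% fraud rate, a model that predicts "legitimate" for every transaction is already 99.83% accurate.
- Feature Engineering: Transaction velocity features (transactions in last 1h, 24h), amount deviation from the user's historical mean, time-of-day features, merchant category risk scores. The Kaggle dataset exposes only Time, Amount, and 28 PCA-anonymized components (V1 to V28), so the user-level and merchant-level features require a dataset that includes those identifiers.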
- Model Ensemble: LightGBM + Random Forest + Logistic Regression ensemble with calibrated probability outputs. Threshold optimization: find the threshold minimizing total cost (false positive cost + false negative cost, weighted by business impact).
- Real-time Serving: FastAPI endpoint returning {fraud_probability, decision, explanation} in under 50ms. SHAP values provide per-transaction explanations, which help when regulators or auditors require a reason for each decision.
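The threshold optimization described in the Model Ensemble step can be illustrated with a simple grid search. This is a sketch under stated assumptions: the labels, scores, and the two cost values are made up, the function names are hypothetical, and the scores are assumed to be calibrated probabilities in [0, 1].

```python
def total_cost(y_true, scores, threshold, fp_cost, fn_cost):
    """Business cost of flagging every transaction whose score >= threshold."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 0:
            cost += fp_cost  # legitimate transaction blocked
        elif not flagged and label == 1:
            cost += fn_cost  # fraud missed
    return cost

def best_threshold(y_true, scores, fp_cost, fn_cost):
    """Search thresholds 0.00, 0.01, ..., 1.00; ties resolve to the lowest one."""
    candidates = [i / 100 for i in range(101)]
    return min(
        candidates,
        key=lambda t: total_cost(y_true, scores, t, fp_cost, fn_cost),
    )

# Made-up validation data: 1 = fraud, 0 = legitimate.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.02, 0.10, 0.30, 0.45, 0.40, 0.60, 0.80, 0.95]

# Assumed costs: a missed fraud is 20x as expensive as a false alarm.
threshold = best_threshold(y_true, scores, fp_cost=5, fn_cost=100)
cost = total_cost(y_true, scores, threshold, fp_cost=5, fn_cost=100)
print(threshold, cost)
```

Because missing a fraud is priced far above a false alarm here, the chosen threshold sits just low enough to catch the lowest-scoring fraud case, accepting two false positives in exchange. The threshold should be chosen on validation data and then confirmed on a held-out test set.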
🤖 Project 5: LLM-Powered Autonomous Research Agent
Objective: Build an agent that can autonomously research a topic, synthesize findings, and produce a structured report.
- Tool Implementation: Web search (Tavily or SerpAPI), URL content extraction (BeautifulSoup), Python code execution for data analysis, chart generation.
- ReAct Agent Loop: Implement the multi-step agent loop using the Anthropic or OpenAI API with tool use. Add iteration limits and error handling.
- Report Generation: Agent produces a structured Markdown report with executive summary, findings, data visualizations, and citations.
- Quality Evaluation: Build a separate LLM evaluator that scores the agent's output on accuracy, completeness, citation quality, and structure.
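The control flow of the ReAct Agent Loop step, including the iteration limit and error handling, can be sketched independently of any provider. In the example below the model is a scripted stand-in rather than a real Anthropic or OpenAI call, and the action format (`{"tool": ..., "input": ...}` or `{"final": ...}`), the `run_agent` function, and the `multiply` tool are all assumptions made for illustration.

```python
def run_agent(model, tools, question, max_steps=5):
    """Alternate model decisions and tool calls until a final answer or the step limit."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        action = model(transcript)
        if "final" in action:
            return action["final"]
        name = action.get("tool")
        tool_input = action.get("input", "")
        tool = tools.get(name)
        if tool is None:
            observation = f"Error: unknown tool {name!r}"
        else:
            try:
                observation = tool(tool_input)
            except Exception as exc:  # feed failures back instead of crashing
                observation = f"Error: {exc}"
        transcript.append(f"Action: {name}({tool_input!r})")
        transcript.append(f"Observation: {observation}")
    return "Stopped: iteration limit reached without a final answer."

def multiply(expression):
    """Toy tool: multiply two integers written as 'a * b'."""
    left, right = expression.split("*")
    return str(int(left) * int(right))

def scripted_model(transcript):
    """Stand-in for an LLM call; a real agent would send the transcript to an API."""
    last = transcript[-1]
    if last.startswith("Observation: Error"):
        return {"tool": "multiply", "input": "17 * 23"}  # recover with a valid tool
    if last.startswith("Observation: "):
        return {"final": last[len("Observation: "):]}
    return {"tool": "calculator", "input": "17 * 23"}  # deliberately wrong tool name

tools = {"multiply": multiply}
answer = run_agent(scripted_model, tools, "What is 17 * 23?")
print(answer)
```

Returning errors to the model as observations lets it correct itself on the next step, while `max_steps` bounds cost when it cannot.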
💬 Project 6: RAG Enterprise Chatbot
Objective: Build a "chat with your documents" system for a collection of technical documentation or PDF reports.
- Document Processing Pipeline: PDF parsing with pypdf (the maintained successor to the deprecated PyPDF2), metadata extraction, chunking strategy (recursive character splitting with 200-token overlap).
- Vector Database: ChromaDB (local) or Pinecone (cloud). Implement batch upsert for large document collections.
- Query Pipeline: Query rewriting (LLM expands ambiguous queries), hybrid search (dense vector + sparse BM25), cross-encoder reranking for final top-K selection.
- Conversational Memory: Maintain conversation history. When the history exceeds the context window, keep the most recent turns verbatim and replace older turns with an LLM-generated summary.
- Evaluation: RAGAS framework for RAG-specific metrics: faithfulness (answers grounded in retrieved context), answer relevancy, context precision and recall.
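The role of chunk overlap in the Document Processing Pipeline step is easiest to see in a simplified splitter. The sketch below uses fixed-size windows over whitespace-separated words, which is not recursive character splitting and only approximates model tokens; it is meant to show how overlapping windows preserve context across chunk boundaries. The function name `chunk_tokens` and the small sizes are illustrative assumptions.

```python
def chunk_tokens(tokens, chunk_size, overlap):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks

text = "one two three four five six seven eight nine ten"
tokens = text.split()  # whitespace words stand in for real tokenizer output
chunks = chunk_tokens(tokens, chunk_size=4, overlap=2)
for chunk in chunks:
    print(" ".join(chunk))
```

Each chunk repeats the last two words of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk. A production splitter would additionally prefer to break at paragraph and sentence boundaries.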