75 MINS intermediate
Build With Me: News Scraper & Digest
Build With Me 03
The Daily Briefing Pipeline
Today we are building an end-to-end data pipeline. We will scrape the top headlines from Hacker News, summarize them with an LLM API, and automatically send a formatted HTML email digest to our subscribers.
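The three stages (extract, transform, load) map naturally onto three functions. Below is a minimal sketch of how they might be wired together. The names `run_pipeline`, `fake_scrape`, and `fake_summarize` are hypothetical, and `summarize` is only a stand-in for whichever LLM API you choose, since the lesson does not name one. Passing the stages in as callables lets the flow run without network access.

```python
from typing import Callable, Dict, List

Article = Dict[str, str]


def run_pipeline(
    scrape: Callable[[], List[Article]],
    summarize: Callable[[Article], str],
    send: Callable[[List[Article]], None],
) -> List[Article]:
    """Run extract -> transform -> load and return the enriched articles."""
    articles = scrape()                  # Extract: list of {'title', 'link'} dicts
    for article in articles:             # Transform: attach a summary to each one
        article["summary"] = summarize(article)
    send(articles)                       # Load: deliver the digest
    return articles


# Stand-ins so the sketch runs offline; swap in the real functions later.
def fake_scrape() -> List[Article]:
    return [{"title": "Example story", "link": "https://example.com/story"}]


def fake_summarize(article: Article) -> str:
    # Placeholder for an LLM call (assumption); here we just echo the title.
    return f"Summary of: {article['title']}"


sent: List[List[Article]] = []
result = run_pipeline(fake_scrape, fake_summarize, sent.append)
```

Keeping the stages decoupled like this means each one can be tested on its own, and a failure in the summarizer does not require touching the scraper or the mailer.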
Step 1: Extract & Transform
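import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def scrape_headlines():
    print("Fetching headlines...")
    url = "https://news.ycombinator.com/"
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Fail loudly on HTTP errors
    soup = BeautifulSoup(response.text, 'html.parser')
    articles = []
    # Each story sits in a <tr class="athing"> row; take the top 5
    for item in soup.find_all('tr', class_='athing')[:5]:
        title_span = item.find('span', class_='titleline')
        title_element = title_span.find('a') if title_span else None
        if title_element is None:
            continue  # Skip rows without a title link
        title = title_element.text
        # Self posts use relative links such as "item?id=..."; make them absolute
        link = urljoin(url, title_element['href'])
        articles.append({'title': title, 'link': link})
    return articles
Step 2: Generate the Digest & Load (Email)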
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

def send_digest(articles):
    print("Formatting digest...")
    # Open the digest with a heading and an unordered list
    html_content = "<h2>Your Daily Tech Briefing</h2><ul>"
    for article in articles:
        html_content += f"<li><a href='{article['link']}'>{article['title']}</a></li>"
    html_content += "</ul>"
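Once the HTML body is assembled, it still has to be wrapped in a MIME message and handed to an SMTP server. Here is a minimal sketch of those remaining steps, using only the standard library. The helper names (`build_digest_html`, `build_message`, `send_message`), the example addresses, and the choice of `SMTP_SSL` are assumptions for illustration, not the lesson's own implementation. Titles and links are passed through `html.escape` so that characters such as `&` or `<` in a headline cannot break the markup.

```python
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from html import escape


def build_digest_html(articles):
    """Render the articles as an HTML heading plus a bulleted list of links."""
    items = "".join(
        f'<li><a href="{escape(a["link"], quote=True)}">{escape(a["title"])}</a></li>'
        for a in articles
    )
    return f"<h2>Your Daily Tech Briefing</h2><ul>{items}</ul>"


def build_message(articles, sender, recipient):
    """Build a multipart message with a plain-text fallback and an HTML part."""
    msg = MIMEMultipart("alternative")
    msg["Subject"] = "Your Daily Tech Briefing"
    msg["From"] = sender
    msg["To"] = recipient
    plain = "\n".join(f"- {a['title']}: {a['link']}" for a in articles)
    msg.attach(MIMEText(plain, "plain"))
    msg.attach(MIMEText(build_digest_html(articles), "html"))
    return msg


def send_message(msg, host, port, username, password):
    """Send the message over an implicit-TLS connection (commonly port 465)."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(username, password)
        server.send_message(msg)


# Build (but do not send) a demo message; addresses are placeholders.
demo = [{"title": "Rust & <C++>", "link": "https://example.com/?a=1&b=2"}]
html_body = build_digest_html(demo)
message = build_message(demo, "digest@example.com", "reader@example.com")
```

In practice the host, port, and credentials passed to `send_message` would come from environment variables or a secrets manager rather than being hard-coded in the script.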