Python Sets
A set is an unordered collection of unique items. Sets automatically eliminate duplicates — if you add the same value twice, it only appears once. This makes sets the perfect tool for deduplication, membership testing, and mathematical set operations.
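Sets are mutable (you can add and remove items), but they are unordered, so they support neither indexing nor slicing, and the order in which items print can differ from the order in which you wrote them. The items inside a set must be hashable, which in practice usually means immutable: strings, numbers, and tuples of hashable items are allowed; lists, dictionaries, and other sets are not.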
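As a rough illustration of which items a set accepts, the sketch below tries a few candidates. The variable names are made up for this example, and frozenset is mentioned only as one possible way to nest sets.

```python
# Strings, numbers, and tuples of hashable items are accepted.
valid = {"apple", 42, 3.14, ("x", "y")}
print(len(valid))  # 4

# A list is mutable and unhashable, so it cannot be a set element.
try:
    invalid = {"apple", ["banana", "cherry"]}
except TypeError as exc:
    list_error = str(exc)
    print(f"TypeError: {exc}")

# A tuple is only hashable if everything inside it is hashable too.
try:
    also_invalid = {("ok", ["not ok"])}
except TypeError as exc:
    tuple_error = str(exc)
    print(f"TypeError: {exc}")

# frozenset is the immutable variant of set, so it can be stored inside a set.
nested = {frozenset({1, 2}), frozenset({3, 4})}
print(len(nested))  # 2
```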
Creating a Set
Sets are created using curly braces with items separated by commas — identical syntax to dictionaries but without key-value pairs. To create an empty set, you must use set() — using {} alone creates an empty dictionary, not a set.
fruits = {"apple", "banana", "cherry", "apple", "banana"}
print(fruits) # Duplicates are automatically removed
print(type(fruits))
# Creating from a list (great for deduplication)
raw_ids = [101, 202, 101, 303, 202, 404]
unique_ids = set(raw_ids)
print(unique_ids)
# Empty set — must use set(), NOT {}
empty = set()
print(type(empty))
Output
{'banana', 'apple', 'cherry'}
<class 'set'>
{101, 202, 303, 404}
<class 'set'>
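Deduplicating with set() does not promise any particular order. If the original order matters, one common approach (a sketch, not the only option) is to go through dictionary keys, which are unique and keep insertion order in Python 3.7 and later. The example reuses raw_ids from above; the other names are illustrative.

```python
raw_ids = [101, 202, 101, 303, 202, 404]

# set() removes duplicates but makes no promise about order.
unique_unordered = set(raw_ids)

# dict keys are unique and keep insertion order (Python 3.7+),
# so the first occurrence of each ID stays in its original position.
unique_ordered = list(dict.fromkeys(raw_ids))
print(unique_ordered)  # [101, 202, 303, 404]

# sorted() is an alternative when any predictable order will do.
print(sorted(unique_unordered))  # [101, 202, 303, 404]

# Reminder: bare braces create a dict, not a set.
print(type({}).__name__)     # dict
print(type(set()).__name__)  # set
```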
Adding and Removing Items
Use .add() to add a single item and .update() to add multiple items at once. For removal, .remove() raises an error if the item is missing; .discard() does not.
skills = {"Python", "SQL", "Git"}
skills.add("Docker")
print(skills)
skills.update(["AWS", "Linux"])
print(skills)
skills.remove("SQL") # Raises KeyError if not found
print(skills)
skills.discard("Kubernetes") # Safe — no error if not found
print(skills)
Output
{'Python', 'SQL', 'Git', 'Docker'}
{'Python', 'SQL', 'Git', 'Docker', 'AWS', 'Linux'}
{'Python', 'Git', 'Docker', 'AWS', 'Linux'}
{'Python', 'Git', 'Docker', 'AWS', 'Linux'}
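Two details of these methods are easy to trip over, so here is a small sketch of them. It assumes nothing beyond the built-in set methods; the sample values are invented for illustration.

```python
skills = {"Python", "Git"}

# update() accepts any iterable. A bare string is an iterable of characters,
# so this adds "G" and "o" rather than the single item "Go".
letters = set()
letters.update("Go")
print(sorted(letters))  # ['G', 'o']

# Wrap the string in a list to add it as one item.
skills.update(["Go"])

# remove() raises KeyError for a missing item; catch it if absence is possible.
try:
    skills.remove("Kubernetes")
except KeyError as exc:
    print(f"Not in set: {exc}")

# discard() is the quiet alternative.
skills.discard("Kubernetes")

# pop() removes and returns an arbitrary item (there is no "last" item in a set).
removed = skills.pop()
print(removed in skills)  # False
```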
Set Union
The union of two sets contains all items from both sets combined, with no duplicates. Use the | operator or the .union() method.
backend = {"Python", "Django", "PostgreSQL"}
frontend = {"JavaScript", "React", "Python"}
combined = backend | frontend
print(combined)
# Same result using method
combined2 = backend.union(frontend)
print(combined2)
Output
{'Python', 'Django', 'PostgreSQL', 'JavaScript', 'React'}
{'Python', 'Django', 'PostgreSQL', 'JavaScript', 'React'}
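The operator and the method are not fully interchangeable. The sketch below shows the difference, using a hypothetical devops list added only for this example.

```python
backend = {"Python", "Django", "PostgreSQL"}
frontend = {"JavaScript", "React", "Python"}
devops = ["Docker", "Python"]  # hypothetical extra data, deliberately a list

# .union() accepts several arguments, and each can be any iterable.
all_skills = backend.union(frontend, devops)
print(len(all_skills))  # 6

# The | operator requires a set on both sides.
try:
    backend | devops
except TypeError as exc:
    print(f"TypeError: {exc}")

# |= updates the left-hand set in place, like .update().
backend_copy = set(backend)
backend_copy |= frontend
print(backend_copy == backend | frontend)  # True
```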
Set Intersection
The intersection of two sets contains only items that appear in both sets. Use the & operator or the .intersection() method. This is useful for finding common elements.
team_a = {"Alice", "Bob", "Charlie", "Diana"}
team_b = {"Bob", "Diana", "Eve", "Frank"}
both_teams = team_a & team_b
print(both_teams)
# Same using method
print(team_a.intersection(team_b))
Output
{'Bob', 'Diana'}
{'Bob', 'Diana'}
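Intersection extends naturally to more than two sets. The following sketch adds a hypothetical third team, team_c, and also shows isdisjoint(), which checks for any overlap without building a new set.

```python
team_a = {"Alice", "Bob", "Charlie", "Diana"}
team_b = {"Bob", "Diana", "Eve", "Frank"}
team_c = {"Diana", "Grace"}  # hypothetical third team for illustration

# .intersection() accepts more than one argument.
on_all_three = team_a.intersection(team_b, team_c)
print(on_all_three)  # {'Diana'}

# isdisjoint() returns True only when the two sets share no items.
print(team_a.isdisjoint(team_b))    # False
print(team_a.isdisjoint({"Zoe"}))   # True
```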
Set Difference
The difference of two sets contains items that are in the first set but not in the second. Use the - operator or the .difference() method.
all_students = {"Alice", "Bob", "Charlie", "Diana", "Eve"}
present_today = {"Alice", "Charlie", "Eve"}
absent = all_students - present_today
print(f"Absent students: {absent}")
Output
Absent students: {'Bob', 'Diana'}
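Unlike union and intersection, difference depends on the order of its operands. A small sketch, with a hypothetical visitor "Zed" added to the attendance set to make the point visible:

```python
all_students = {"Alice", "Bob", "Charlie", "Diana", "Eve"}
present_today = {"Alice", "Charlie", "Eve", "Zed"}  # "Zed" is a hypothetical visitor

# Enrolled but not present.
absent = all_students - present_today
print(sorted(absent))  # ['Bob', 'Diana']

# Present but not enrolled: swapping the operands answers a different question.
unexpected = present_today - all_students
print(unexpected)  # {'Zed'}

# <= tests whether the left set is a subset of the right one.
print(present_today <= all_students)  # False
```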
Set Symmetric Difference
The symmetric difference contains items that are in either set, but not in both. Use the ^ operator or the .symmetric_difference() method.
subscribed_email = {"alice", "bob", "charlie"}
subscribed_sms = {"bob", "diana", "eve"}
# Users who subscribed to EXACTLY ONE channel
one_channel_only = subscribed_email ^ subscribed_sms
print(one_channel_only)
Output
{'alice', 'charlie', 'diana', 'eve'}
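One way to check your understanding of symmetric difference is to rebuild it from the other operations. The sketch below does this two ways; the via_union and via_differences names are just labels for this example.

```python
subscribed_email = {"alice", "bob", "charlie"}
subscribed_sms = {"bob", "diana", "eve"}

one_channel_only = subscribed_email ^ subscribed_sms

# Everything in either set, minus the overlap.
via_union = (subscribed_email | subscribed_sms) - (subscribed_email & subscribed_sms)

# The two one-sided differences combined.
via_differences = (subscribed_email - subscribed_sms) | (subscribed_sms - subscribed_email)

print(one_channel_only == via_union == via_differences)  # True
print(sorted(one_channel_only))  # ['alice', 'charlie', 'diana', 'eve']
```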
Membership Testing
Sets are very efficient for membership testing with the in keyword. Checking membership in a set is O(1) on average, because sets are implemented as hash tables: lookup time stays roughly constant however large the set grows. Checking membership in a list is O(n), since Python may have to compare the value against every element, so it gets slower as the list grows.
banned_users = {"spammer99", "bot_account", "hacker007"}
username = "bot_account"
if username in banned_users:
    print(f"Access denied: {username} is banned.")
else:
    print(f"Welcome, {username}!")
print("hacker007" in banned_users)
print("alice" in banned_users)
Output
Access denied: bot_account is banned.
True
False
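💡 Performance Tip: When you need to check membership many times, for example thousands of lookups in a loop, convert the list to a set once before the loop. Building the set costs O(n), so it pays off when lookups are repeated rather than one-off. After that, a lookup in a set of a million items takes roughly the same time as a lookup in a set of 10 items, whereas a list lookup slows down in proportion to the list's length.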
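To see the difference on your own machine, a rough benchmark along these lines can help. It is only a sketch: the size n, the repeat count, and the variable names are arbitrary choices, and the absolute timings will vary by hardware and Python version.

```python
import timeit

n = 100_000
items_list = list(range(n))
items_set = set(items_list)  # built once, outside the timed code
missing = -1  # not present: the list has to be scanned end to end

# Time 100 lookups against each container.
list_time = timeit.timeit(lambda: missing in items_list, number=100)
set_time = timeit.timeit(lambda: missing in items_set, number=100)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

Searching for a value that is absent is the slowest case for the list, so it shows the gap most clearly; a value near the front of the list would be found much sooner.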