18. Python RegEx
Python RegEx (Regular Expressions)
A Regular Expression (RegEx) is a sequence of characters that forms a search pattern. Regular expressions let you search, match, and manipulate text based on patterns rather than exact words.
Python has a built-in module called re for working with regular expressions. Once it is imported, you can search strings for complex patterns such as email addresses, phone numbers, or whole words marked by word boundaries.
Importing the re Module
Before you can use regular expressions, you must import the re module. As a quick test, you can check whether a string starts with "The" and ends with "Spain" using the pattern "^The.*Spain$".
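A minimal sketch of that check. The sample sentence "The rain in Spain" is an illustrative assumption; ^ anchors the pattern to the start of the string, $ to the end, and .* allows anything in between.

```python
import re

txt = "The rain in Spain"

# ^The  -> string must start with "The"
# .*    -> any characters in between
# Spain$ -> string must end with "Spain"
x = re.search("^The.*Spain$", txt)

if x:
    print("YES! We have a match!")
else:
    print("No match")
```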
The findall() Function
The findall() function scans the entire string and returns a list of all non-overlapping matches of the pattern, in the order they were found. If there are no matches, it returns an empty list.
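A short sketch showing both outcomes; the sample strings are illustrative assumptions, not part of the original text.

```python
import re

txt = "The rain in Spain"

# "ai" occurs once in "rain" and once in "Spain"
matches = re.findall("ai", txt)
print(matches)  # ['ai', 'ai']

# No occurrence of the pattern -> empty list, not None
no_matches = re.findall("Portugal", txt)
print(no_matches)  # []
```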
The search() Function
The search() function scans the string for a match and returns a Match object for the first match it finds. If there is more than one match, only the first one is returned; if there is no match, search() returns None.
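A minimal sketch, assuming the same illustrative sentence as above. The first call finds the first whitespace character; the second shows the None result, which is why code usually checks the return value before using it.

```python
import re

txt = "The rain in Spain"

# \s matches any whitespace character; only the first one is reported
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())  # 3

# No match -> search() returns None instead of a Match object
y = re.search("Portugal", txt)
print(y)  # None
```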
The split() Function
The split() function returns a list where the string has been split at each match. It is more flexible than the string .split() method because the separator can be a pattern rather than a single fixed string.
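A sketch of re.split() with illustrative input strings (assumptions, not from the original). The optional maxsplit argument caps the number of splits, and the last call splits on either a comma or a semicolon with any surrounding whitespace, which one call to str.split() cannot do.

```python
import re

txt = "The rain in Spain"

# Split at every whitespace character
words = re.split(r"\s", txt)
print(words)  # ['The', 'rain', 'in', 'Spain']

# maxsplit limits how many splits are made
first_split = re.split(r"\s", txt, maxsplit=1)
print(first_split)  # ['The', 'rain in Spain']

# Commas or semicolons, with optional whitespace on either side
data = "apples, pears;bananas ;  cherries"
items = re.split(r"\s*[,;]\s*", data)
print(items)  # ['apples', 'pears', 'bananas', 'cherries']
```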
The sub() Function
The sub() function replaces every match with the text of your choice (the name is short for "substitute"). An optional count argument limits how many replacements are made.
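A minimal sketch using an illustrative sentence; the replacement text "9" is an arbitrary choice to make the substitutions easy to spot.

```python
import re

txt = "The rain in Spain"

# Replace every whitespace character with "9"
all_replaced = re.sub(r"\s", "9", txt)
print(all_replaced)  # The9rain9in9Spain

# count=2 stops after the first two replacements
first_two = re.sub(r"\s", "9", txt, count=2)
print(first_two)  # The9rain9in Spain
```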
Match Object Methods
A Match Object is an object containing information about the search and the result. It has properties and methods to extract details about the match.
- .span() returns a tuple (start, end) with the positions of the match; end is the index just past the last matched character.
- .string is a property (not a method) holding the original string passed into the function.
- .group() returns the exact part of the string where there was a match.
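A sketch of the three members listed above, assuming an illustrative search for the first word that starts with an upper-case "S". Note that .string is accessed without parentheses.

```python
import re

txt = "The rain in Spain"

# \b = word boundary, S = literal "S", \w+ = one or more word characters
m = re.search(r"\bS\w+", txt)

print(m.span())   # (12, 17)
print(m.string)   # The rain in Spain
print(m.group())  # Spain
```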
Common Metacharacters
Metacharacters are characters with a special meaning in RegEx.
- [] - A set of characters (e.g. "[a-m]")
- \ - Signals a special sequence or escapes special characters (e.g. "\d" for digits; writing patterns as raw strings such as r"\d" keeps Python from interpreting the backslash itself)
- . - Any character except newline (e.g. "he..o")
- ^ - Starts with (e.g. "^hello")
- $ - Ends with (e.g. "world$")
- * - Zero or more occurrences of the preceding item (e.g. "ab*")
- + - One or more occurrences of the preceding item (e.g. "ab+")
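The sketch below tries each metacharacter from the list on a small sample string. The inputs are illustrative assumptions; findall() is used to show every match and search() for the anchored checks.

```python
import re

# [] : a set of characters (here, lowercase a through m)
print(re.findall("[a-m]", "hello"))              # ['h', 'e', 'l', 'l']

# \d : a digit; the r"" prefix keeps the backslash intact
print(re.findall(r"\d", "Room 101"))             # ['1', '0', '1']

# . : any character except a newline
print(re.findall("he..o", "hello helo"))         # ['hello']

# ^ and $ : anchor the pattern to the start or end of the string
print(bool(re.search("^hello", "hello world")))  # True
print(bool(re.search("world$", "hello world")))  # True

# * : zero or more of the preceding item; + : one or more
print(re.findall("ab*", "a ab abbb"))            # ['a', 'ab', 'abbb']
print(re.findall("ab+", "a ab abbb"))            # ['ab', 'abbb']
```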