This chapter walks through examples of regular expressions using Python's re module. If you have not yet reviewed regular expressions and regex special characters, please review them first, or consult the official documentation at https://docs.python.org/3/library/re.html.
The re module contains several functions that help with compiling, searching, and matching regular expressions. Some of them are as follows:
Syntax: re.compile(pattern, flags=0)
The re.compile() function compiles a regular expression given as a string into a regular expression object. It takes 2 arguments:
- pattern: a regular expression as a string
- flags: a value that modifies the default matching behavior. It defaults to 0. Some examples of flags are as follows:
  - re.A or re.ASCII: makes \w, \W, \b, \B, \d, \D, \s and \S perform ASCII-only matching instead of full Unicode matching.
  - re.I or re.IGNORECASE: performs case-insensitive matching, so expressions like [A-Z] also match lowercase letters.
  - re.M or re.MULTILINE: the pattern character ^ also matches at the beginning of each line, and $ at the end of each line.
  - re.X or re.VERBOSE: allows whitespace and comments inside the pattern, so the regex can be written in a more readable way.
  - re.DEBUG: displays debug information about the compiled expression.
The function returns a re.Pattern object, which can then be used to match, search, or find all occurrences of that pattern.
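To make the flags above more concrete, here is a minimal sketch of three of them in use. The sample strings and variable names are illustrative assumptions, not part of the re module.

```python
import re

# re.IGNORECASE: [a-z] also matches uppercase letters
ignore_case = re.compile(r'^[a-z]+$', re.IGNORECASE)
matched_word = ignore_case.match('Nepal').group()

# re.MULTILINE: ^ matches at the start of every line, not only the string
text = 'Angola\nChina\nAlgeria'
line_starts = re.findall(r'^A\w+', text, re.MULTILINE)

# re.VERBOSE: whitespace and comments inside the pattern are ignored
hex_pattern = re.compile(r'''
    ^(0x)?          # optional 0x prefix
    [0-9A-Fa-f]+$   # one or more hexadecimal digits
''', re.VERBOSE)
is_hex = hex_pattern.match('0x56') is not None

print(matched_word)   # Nepal
print(line_starts)    # ['Angola', 'Algeria']
print(is_hex)         # True
```

Several flags can be combined with the | operator, for example re.I | re.M.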
Example 1: A regular expression that matches country names that start with A and end with a
import re
countries = ['USA', 'Japan', 'Angola', 'China', 'Algeria', 'Nepal', 'Argentina', 'Albania']
pattern = re.compile(r'^A\w*a$')
When matched against the list of countries, the above pattern should match Angola, Algeria, Argentina and Albania:
countries = ['USA', 'Japan', 'Angola', 'China', 'Algeria', 'Nepal', 'Argentina', 'Albania']
# a word that starts with A and ends with a
pattern = re.compile(r'^A\w*a$')
for country in countries:
    print(pattern.match(country))
"""
# Output
None
None
<re.Match object; span=(0, 6), match='Angola'>
None
<re.Match object; span=(0, 7), match='Algeria'>
None
<re.Match object; span=(0, 9), match='Argentina'>
<re.Match object; span=(0, 7), match='Albania'>
"""
Example 2: Compile a regular expression that matches hexadecimal strings:
import re
strings = ["0xaa", 'FA04', 'Ak45','As40', '0x5h', '0x56']
pat = re.compile(r'^(0x)?[0-9A-Fa-f]+$')
for st in strings:
    print(pat.match(st))
"""
# output
<re.Match object; span=(0, 4), match='0xaa'>
<re.Match object; span=(0, 4), match='FA04'>
None
None
None
<re.Match object; span=(0, 4), match='0x56'>
"""
Explanation: The pattern looks for an optional 0x prefix, which is commonly used to mark a hexadecimal number. The prefix must be followed by at least one hexadecimal digit: 0-9 or a-f in either case.
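A compiled pattern is most useful when it is reused. The sketch below wraps the hexadecimal pattern in a small helper that validates a string before converting it. The names HEX_PATTERN and parse_hex are hypothetical and chosen only for this illustration.

```python
import re

# Compile once, reuse for every call (same pattern as in Example 2)
HEX_PATTERN = re.compile(r'^(0x)?[0-9A-Fa-f]+$')


def parse_hex(value):
    """Return the integer value of a hexadecimal string, or None if invalid.

    parse_hex is a hypothetical helper, not part of the re module.
    """
    if HEX_PATTERN.match(value) is None:
        return None
    # int() with base 16 accepts an optional 0x prefix
    return int(value, 16)


print(parse_hex('0xaa'))   # 170
print(parse_hex('FA04'))   # 64004
print(parse_hex('Ak45'))   # None
```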
Syntax: re.search(pattern, string, flags=0)
The re.search() function scans through the string for the first location where the regex matches and returns a corresponding match object.
It returns None if there is no match anywhere in the string.
Note: It returns only the first match, not multiple match objects, even if the string contains multiple matches.
Example 3: Search for text that contains mail or email.
import re
text = "Dear sir, I've emailed you last week regarding the assignment. I've also sent you another mail specifying the next assignment."
results = re.search(r'e?mail(ed)?', text)
print(results)
"""
# Output
<re.Match object; span=(15, 22), match='emailed'>
"""
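Because re.search() can return None, it is usually safer to test the result before reading from it. The following sketch, using an assumed sample sentence, shows one way to do that and how to read the matched text and its position from the match object.

```python
import re

text = "I've also sent you another mail about the next assignment."

# search() returns None when nothing matches, so test the result first
match = re.search(r'e?mail(ed)?', text)
if match:
    found = match.group()     # the matched text
    position = match.span()   # (start, end) indexes of the match
else:
    found, position = None, None

# No match anywhere in the string, so the result is None
missing = re.search(r'fax', text)

print(found, position, missing)   # mail (27, 31) None
```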
Syntax: re.match(pattern, string, flags=0)
The re.match() function returns a corresponding match object if zero or more characters at the beginning of the string match the regular expression pattern.
It returns None if the beginning of the string does not match the regex. The re.match() function takes 3 parameters, which are as follows:
- pattern: the regex pattern
- string: the string in which we try to find the match
- flags: flags that change the default behavior of the regex (the same as for the re.compile() function)
Example 4: Match if the first word contains the letter e in the second position.
import re
text = 'Dear sir'
print(re.match(r'[a-z]e\w+', text, re.I))
"""
# output
<re.Match object; span=(0, 4), match='Dear'>
"""
Note: re.match() does not return a match if the pattern cannot match at the beginning of the string. To locate a match anywhere in the string, use the re.search() function instead.
Note: To match the whole string against the regex, use the re.fullmatch() function instead.
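The difference between the three functions is easiest to see side by side. This is a small sketch on the same 'Dear sir' string; the variable names are only illustrative.

```python
import re

text = 'Dear sir'
letters = r'[a-z]+'

# match(): anchored at the beginning of the string
at_start = re.match(letters, text, re.I)
not_at_start = re.match(r'sir', text)        # 'sir' is not at the start

# search(): finds the pattern anywhere in the string
anywhere = re.search(r'sir', text)

# fullmatch(): the whole string must match the pattern
whole = re.fullmatch(letters, text, re.I)    # fails because of the space
whole_word = re.fullmatch(letters, 'Dear', re.I)

print(at_start.group())     # Dear
print(not_at_start)         # None
print(anywhere.group())     # sir
print(whole)                # None
print(whole_word.group())   # Dear
```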
Syntax: re.findall(pattern, string, flags=0)
The re.findall()
function returns all non-overlapping matches of pattern in the string as a list of strings (or a list of tuples if the pattern contains more than one capturing group). The string is scanned from left to right and matches are returned in the order found. If there are no matches, it returns an empty list.
Example 5: List all countries that start with A and end with a from a comma-separated string.
import re
countries = 'USA, Japan, Angola, China, Algeria, Nepal, Argentina, Albania'
print(re.findall(r'A[a-z]+a', countries))
"""
# Output
['Angola', 'Algeria', 'Argentina', 'Albania']
"""
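The shape of the list returned by re.findall() depends on the capturing groups in the pattern. The sketch below uses made-up name=number pairs as sample data to show the three cases: several groups, one group, and no matches.

```python
import re

# Hypothetical sample data: name=number pairs
text = 'USA=1, Angola=244, Algeria=213'

# Two capturing groups: each match becomes a tuple of the groups
pairs = re.findall(r'(\w+)=(\d+)', text)

# One capturing group: the list holds only the text of that group
names = re.findall(r'(\w+)=\d+', text)

# No matches: an empty list, never None
no_hits = re.findall(r'Z\w+', text)

print(pairs)     # [('USA', '1'), ('Angola', '244'), ('Algeria', '213')]
print(names)     # ['USA', 'Angola', 'Algeria']
print(no_hits)   # []
```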
Syntax: re.split(pattern, string, maxsplit=0, flags=0)
The re.split() function splits the string at each occurrence of pattern. It behaves like the str.split() method, but because the separator is a regular expression, it can split on several different separators at once with less code. We can also pass maxsplit to limit the number of splits performed; the remainder of the string is then returned as the final element of the list.
Example 6: Split a string of country names separated by assorted special characters.
import re
countries = 'USA, Japan; Angola! China, Algeria% Nepal; Argentina/ Albania'
print(re.split(r'\W+', countries))
"""
# Output:
['USA', 'Japan', 'Angola', 'China', 'Algeria', 'Nepal', 'Argentina', 'Albania']
"""
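The maxsplit argument can be sketched as follows on a shortened, assumed sample string. The second call also shows that wrapping the pattern in a capturing group keeps the separators in the result.

```python
import re

countries = 'USA, Japan; Angola! China'

# At most 2 splits; the rest of the string stays in the last element
first_two = re.split(r'\W+', countries, maxsplit=2)

# A capturing group in the pattern keeps the separators in the result
with_separators = re.split(r'(\W+)', 'USA, Japan')

print(first_two)         # ['USA', 'Japan', 'Angola! China']
print(with_separators)   # ['USA', ', ', 'Japan']
```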
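Syntax: re.sub(pattern, repl, string, count=0, flags=0)
The re.sub() function returns the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string with repl. If count is nonzero, at most count replacements are made. If the pattern is not found, the string is returned unchanged.
Example 7: Replace all the assorted separators in the countries string from Example 6 with a comma and a space: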
import re
countries = 'USA, Japan; Angola! China, Algeria% Nepal; Argentina/ Albania'
print(re.sub(r'\W+', ', ', countries))
"""
# Output
USA, Japan, Angola, China, Algeria, Nepal, Argentina, Albania
"""
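Two further options of re.sub() are worth a quick sketch: limiting the number of replacements with count, and passing a function as repl so each replacement is computed from its match object. The sample strings below are assumptions for illustration.

```python
import re

countries = 'USA; Japan; Angola'

# count=1 replaces only the first occurrence of the pattern
first_only = re.sub(r'\W+', ', ', countries, count=1)

# repl can be a function that receives the match object
# and returns the replacement text
capitalized = re.sub(r'\b[a-z]', lambda m: m.group().upper(), 'nepal, china')

print(first_only)    # USA, Japan; Angola
print(capitalized)   # Nepal, China
```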