Regular expressions are a powerful tool for string operations such as matching, modifying and splitting. Python's re module provides many useful functions for working with them. Below is a top-level summary; more details can be found here: https://realpython.com/regex-python/

Overview

Regular expressions are used to search and manipulate strings, and the standard Python module for them is ‘re’. We first go over some important concepts. Here is a quick tour of the regex pattern string, the regex object and the match object:

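import re

# Target string
s = 'this is the content string'
# Regex pattern (it must occur in s, otherwise search() returns None)
regex = r'content'

# Pass the pattern string directly to a module-level function
match_object = re.search(regex, s)

# Compile a regex object explicitly
regex_object = re.compile(regex)
# Use the regex object's method for matching
match_object = regex_object.search(s)

# Get the regex object back from the match result
# (check first: a failed search returns None, which has no .re attribute)
if match_object:
    reg_object = match_object.re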

Pattern

A pattern is just a string passed as an argument to re functions; it is interpreted as the pattern to look for in the target string. It can contain a lot of special syntax, which we will talk about below.
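As a small sketch (the sample text is made up for illustration), the pattern r'\d+' below is an ordinary str that uses the special sequence \d and the quantifier +. re.findall returns every non-overlapping match. Writing the pattern as a raw string keeps the backslash from being treated as a Python string escape.

```python
import re

# A pattern is just a str; the r prefix makes it a raw string, so the
# backslash reaches the regex engine instead of being a Python escape.
pattern = r'\d+'
print(pattern == '\\d+')  # True: the same string written without the r prefix

text = 'order 66 shipped in 3 boxes'  # hypothetical sample text

# \d matches one digit and + means "one or more", so this finds runs of digits
numbers = re.findall(pattern, text)
print(numbers)  # ['66', '3']
```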

Regex Object

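A regex object is the object the re module builds from the regex string, either explicitly with re.compile() or implicitly when a pattern string is passed to a module-level function such as re.search(). It supports string search and manipulation.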

Getting a regex object:

  1. re.compile(regex)
  2. match_object.re
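The two ways of getting a regex object listed above can be sketched side by side. The pattern r'co\w+' is a hypothetical example; both routes yield a re.Pattern instance built from the same pattern string.

```python
import re

pattern = r'co\w+'  # hypothetical pattern: "co" followed by word characters

# 1. Compile the pattern explicitly
compiled = re.compile(pattern)

# 2. Take it from a match object produced by a module-level function
match = re.search(pattern, 'this is the content string')
from_match = match.re

# Both are regex (re.Pattern) objects built from the same pattern string
print(isinstance(compiled, re.Pattern), isinstance(from_match, re.Pattern))  # True True
print(compiled.pattern == from_match.pattern)  # True
print(match.group())  # content
```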

The regex object has a set of attributes describing the regex configuration and a set of methods for matching and manipulation.

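Attributes: everything about the regex, such as the pattern string (pattern), the number of capturing groups (groups), the named-group mapping (groupindex) and the flags (flags)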
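A minimal sketch of reading these attributes, assuming a hypothetical pattern with one named and one unnamed group compiled with re.IGNORECASE. groupindex is a read-only mapping, so it is converted to a dict for printing.

```python
import re

# Hypothetical pattern with two capturing groups, the first one named
regex_object = re.compile(r'(?P<word>\w+)\s(\d+)', re.IGNORECASE)

print(regex_object.pattern)            # (?P<word>\w+)\s(\d+)
print(regex_object.groups)             # 2  (number of capturing groups)
print(dict(regex_object.groupindex))   # {'word': 1}  (group name -> group number)
# flags may include other bits (e.g. re.UNICODE), so test the one we set
print(bool(regex_object.flags & re.IGNORECASE))  # True
```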

Methods: all the operations, including matching (search, match, fullmatch, findall, finditer), splitting (split) and substitution (sub, subn)
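The method groups above can be illustrated on one hypothetical pattern (runs of digits) and one sample string. Note the difference between search(), which looks anywhere in the string, and match(), which only tries the beginning.

```python
import re

regex_object = re.compile(r'\d+')  # hypothetical pattern: runs of digits
s = 'room 101, floor 3'

# Matching
print(regex_object.search(s).group())  # 101  (first match anywhere)
print(regex_object.match(s))           # None (match() only tries the start)
print(regex_object.findall(s))         # ['101', '3']

# Splitting and substitution
print(regex_object.split(s))           # ['room ', ', floor ', '']
print(regex_object.sub('#', s))        # room #, floor #
print(regex_object.subn('#', s))       # ('room #, floor #', 2)
```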

Match Object

A match object is the result of a successful call to one of the ‘re’ matching methods; an unsuccessful call returns None instead. Each match object has a set of attributes describing the search and a set of methods for accessing the result.
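A quick sketch of both outcomes, reusing the target string from the overview with a hypothetical second string that does not contain the pattern. Checking for None before using the result avoids an AttributeError.

```python
import re

regex_object = re.compile(r'content')

match_object = regex_object.search('this is the content string')
print(match_object)  # <re.Match object; span=(12, 19), match='content'>

# When nothing matches, the method returns None rather than a match object
no_match = regex_object.search('nothing to see here')
print(no_match)  # None
```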

Attributes:

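  1. Search parameters: the start and end positions of the searched slice (pos, endpos), the regex object (re, with the pattern string available as re.pattern) and the target string (string)
  2. Search result: the index and name of the last matched group (lastindex, lastgroup)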
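The attributes listed above can be inspected on a single match. The two-group pattern and the sample name are hypothetical; pos and endpos default to the whole string when search() is called without them.

```python
import re

regex_object = re.compile(r'(\w+) (?P<last>\w+)')  # hypothetical two-group pattern
s = 'Ada Lovelace'
m = regex_object.search(s)

# Search parameters
print(m.pos, m.endpos)       # 0 12  (the slice of s that was searched)
print(m.re is regex_object)  # True
print(m.re.pattern)          # (\w+) (?P<last>\w+)
print(m.string)              # Ada Lovelace

# Search result
print(m.lastindex)  # 2
print(m.lastgroup)  # last
```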

Methods: everything about the matching result: groups (group, groups, groupdict), positions (start, end, span) and backreference expansion into a template (expand)
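A minimal sketch of these methods on one match; the pattern and the sample text are hypothetical. expand() substitutes backreferences such as \1 or \g<name> into a template string.

```python
import re

m = re.search(r'(?P<first>\w+) (?P<last>\w+)', 'name: Ada Lovelace')

# Groups
print(m.group())        # Ada Lovelace  (group 0 is the whole match)
print(m.group('last'))  # Lovelace
print(m.groups())       # ('Ada', 'Lovelace')
print(m.groupdict())    # {'first': 'Ada', 'last': 'Lovelace'}

# Positions in the target string
print(m.start(), m.end(), m.span('last'))  # 6 18 (10, 18)

# Backreference expansion into a template string
print(m.expand(r'\g<last>, \1'))  # Lovelace, Ada
```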

Regular Expression Pattern

Metacharacters