Mastering Python’s Regex – part 1 : Basics

Python regular expressions are often an overlooked topic on the Python learner’s path because of their seemingly tangled nature. When programming itself is still new, you don’t need those bizarre symbols complicating your life further. However, once you understand them, you can use them. Regex is easy … if you get the right tutorial!

What are Regular Expressions in the first place?

Regular expressions are a language used to define a search pattern. Beyond searching, regular expressions have deep links with compiler theory (the study of building programming languages). They are used to define formal languages (computer languages can be described by formal languages, as opposed to natural speech). There are notations to describe formal languages such as BNF (Backus-Naur form), but the simpler ones can also be described using … regular expressions. If you did not follow all of this, move on; it does not matter, and one day it will click.

Other names for Regular Expressions

regex, regexp

The need for functional demos

Completely theoretical explanations serve their purpose, but what delights the enlightened member is confusion, darkness and apprehension for the novice. A basic working example illustrates one basic building block. The learner can then assemble these bricks to build walls, houses, towers and forts.

The starting point

The first thing is to import the regex module, which in Python is called re.
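The import is a single line, since re ships with the standard library:

```python
import re  # Python's built-in regular expression module

# Nothing to install; the functions live on the module itself.
print(re.search)
```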

Then we search for a word in a phrase, something so simple that we would not have needed the regex module at all, something that could have been achieved using Python’s in operator.
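A minimal sketch of that search follows. The sentence is a made-up one (an assumption, not the original phrase), chosen so that road sits at positions 30 to 34; lookup1 is the name used later in this post.

```python
import re

# Hypothetical sentence: "road" starts at index 30 and ends before index 34.
sentence = "the chicken did not cross the road"

# re.search scans the whole string and returns a match object, or None.
lookup1 = re.search("road", sentence)
print(lookup1)

# The same question answered without regex, using the in operator.
found_with_in = "road" in sentence
print(found_with_in)
```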

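If we print it, we’ll get <_sre.SRE_Match object; span=(30, 34), match='road'> (on Python 3.7 and later the same object prints as <re.Match object; span=(30, 34), match='road'>).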

but if we check its boolean value
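Checking the boolean value could look like this, again with a hypothetical sentence standing in for the original one:

```python
import re

sentence = "the chicken did not cross the road"  # hypothetical sentence
lookup1 = re.search("road", sentence)

# A match object is always truthy.
found = bool(lookup1)
print(found)
```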

We get True, because road is in the sentence. By contrast, a search for a word that is not there returns None, whose boolean value is False.
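The failing case can be sketched the same way. The word river and the name lookup2 are assumptions picked for illustration:

```python
import re

sentence = "the chicken did not cross the road"  # hypothetical sentence

# "river" does not occur in the sentence, so re.search returns None.
lookup2 = re.search("river", sentence)
print(lookup2)

not_found = bool(lookup2)
print(not_found)
```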

What can we do with this much?
We can check if a word is in a sentence by testing the result of the search in an if statement.
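A small sketch of such a check, assuming the same hypothetical sentence as before:

```python
import re

sentence = "the chicken did not cross the road"  # hypothetical sentence
lookup1 = re.search("road", sentence)

# The match object is truthy when found, and None (falsy) when not found.
if lookup1:
    message = "road is in the sentence"
else:
    message = "road is not in the sentence"

print(message)
```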

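As a side note, if lookup1 is the same as if bool(lookup1) == True. It is not quite the same as if lookup1 == True, because a match object is truthy without being equal to True.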

The sequence

You’ll do three things:

  1. declare your string as a raw string literal
  2. compile
  3. match
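The three steps can be sketched as follows. The pattern and the text are placeholders chosen for illustration, and match is used here in its literal sense, which only looks at the start of the string:

```python
import re

# 1. declare the pattern as a raw string literal
raw_pattern = r"road"

# 2. compile it into a reusable pattern object
compiled = re.compile(raw_pattern)

# 3. match: tries the pattern at the beginning of the text only
result = compiled.match("roadworks ahead")
print(result.group())
```

To look for the pattern anywhere in the text rather than only at the start, compiled.search can be used in step 3 instead.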

What are raw string literals?

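A normal string looks like this, for instance: 'hello'

But a raw string literal looks like this: r'hello'

Prefixing the r allows characters to remain as they are, so a backslash is not treated as the start of an escape sequence.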

Example: in a normal string, \n is treated as a character telling Python to put what comes next on a new line, but in a raw string the \n is taken literally, as it is.
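The difference is easy to see by printing both kinds of string. The text one\ntwo is only a placeholder:

```python
normal = "one\ntwo"   # \n becomes a single newline character
raw = r"one\ntwo"     # \n stays as a backslash followed by the letter n

print(normal)  # printed on two lines
print(raw)     # printed on one line, backslash included

# The raw version is one character longer.
print(len(normal), len(raw))
```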

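Let us look at backslashes more carefully. In a normal string, a backslash has to be escaped with a second backslash to appear once, but in a raw string every backslash you type is kept.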
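Here is a short sketch of how backslashes behave in both kinds of string. The original snippets are not available, so the cases below are an assumption about what is worth checking, not a reconstruction:

```python
escaped = "\\"    # normal string: two typed backslashes give one character
raw_two = r"\\"   # raw string: two typed backslashes give two characters

print(escaped, len(escaped))
print(raw_two, len(raw_two))

# Even a raw string cannot end in a single backslash: r"\" is a syntax error.
# compile() is used so the error can be caught instead of stopping the script.
try:
    compile('r"\\"', "<example>", "eval")
    single_backslash_ok = True
except SyntaxError:
    single_backslash_ok = False

print(single_backslash_ok)
```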

The use of raw string literals in regular expressions

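It simply saves you lots of escaping. To match one literal backslash, the pattern has to be written '\\\\' as a normal string, but only r'\\' as a raw string.

Would you rather type 2 or 4 backslashes? Regular expressions juggle enough symbols already without us piling on extra ones.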
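The 2 versus 4 comparison can be verified directly. The path below is a made-up example of text containing one backslash:

```python
import re

path = "C:\\temp"  # hypothetical text; it contains a single backslash

normal_pattern = "\\\\"  # 4 typed backslashes
raw_pattern = r"\\"      # 2 typed backslashes

# Both spellings produce the same two-character pattern.
same_pattern = normal_pattern == raw_pattern
print(same_pattern)

# Either way, the regex engine matches one literal backslash.
m1 = re.search(normal_pattern, path)
m2 = re.search(raw_pattern, path)
print(m1.span(), m2.span())
```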

Getting your feet wet: the three steps and the * operator
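A minimal sketch of the three steps with the * operator follows. The pattern aa* is the one discussed in the rules of this section, while the text aaab and the variable names are assumptions:

```python
import re

pattern_text = r"aa*"                # 1. raw string literal
pattern = re.compile(pattern_text)   # 2. compile
match = pattern.match("aaab")        # 3. match

# The match takes the first a plus every a that follows, and stops at b.
print(match.group())
```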

Before we continue, let us list the rules.

Rule 1: ordinary characters are interpreted as they are when no special symbol applies to them

That explains why we could match road in our string in the previous example.

Rule 2: * tells the engine to match the preceding character 0 or more times

So aa* means match an a, and then match zero or more further a’s.
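To see the "zero or more" part in action, the pattern can be tried on a few inputs. The helper first_match and the test strings are hypothetical, written only for this illustration:

```python
import re

pattern = re.compile(r"aa*")


def first_match(text):
    """Return the text matched at the start of the string, or None."""
    m = pattern.match(text)
    return m.group() if m else None


one_a = first_match("a")       # the first a, then zero more
many_a = first_match("aaaa")   # the first a, then three more
then_b = first_match("ab")     # stops before the b
no_a = first_match("b")        # the required first a is missing

print(one_a, many_a, then_b, no_a)
```

Note that the first a is not optional: only the second one is covered by the *, which is why a string starting with b does not match.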

In the next post we’ll dive in deeper.

Lives in Mauritius, cruising python waters for now.