regex

Regular Expressions

…early nightmares of climate crisis…
…disagree on policy, but climate change is real…
…our planet from climate change, and ice…
…understand that climate change is an existential…
…know it well-paid climate deniers are invited…
…creeps in, allowing climate deniers to be…
…goal is to treat Climate Change like the…
…isn’t only about climate change - it’s…
…housing, jobs, and climate all without…
…delay real action on climate change, the more…
…they helped create: climate change, housing…

wdloh@umd.edu



\A[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z
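To see the pattern above at work in Python, here is a minimal sketch. The `\z` anchor is not accepted by every version of Python's `re` module, so this version drops `\A` and `\z` and relies on `fullmatch` to require that the whole string matches. The names `LOCAL`, `DOMAIN`, `EMAIL` and `looks_like_email` are made up for illustration, and the `re.IGNORECASE` flag is an assumption, since the pattern itself only lists lowercase letters.

```python
import re

# The two halves of the pattern above, split at the "@" for readability.
LOCAL = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
DOMAIN = r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

# Assumption: match case-insensitively so uppercase letters are accepted too.
EMAIL = re.compile(LOCAL + "@" + DOMAIN, re.IGNORECASE)


def looks_like_email(text):
    """Return True if the entire string fits the pattern."""
    # fullmatch stands in for the \A ... \z anchors
    return EMAIL.fullmatch(text) is not None


print(looks_like_email("wdloh@umd.edu"))
print(looks_like_email("not an email"))
```

Passing this check only means the string has the shape of an address, not that the mailbox exists.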

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

Jamie Zawinski

import re

s = "To be or not to be."

match = re.search("Be", s)

if match:
    print("match!")
else:
    print("no match :(")

no match :(

import re

s = "To be or not to be."

match = re.search("Be", s, re.IGNORECASE)

if match:
    print("match!")
else:
    print("no match :(")

match!

\w word character

import re

s = "To be or not to be."

# raw string (r"...") so Python does not treat \w as a string escape
match = re.search(r"\w\w\w", s)

print(match.group())

not
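As a quick check of what counts as a word character, the sketch below tests a handful of single characters with `re.search`. The sample characters and the `results` dictionary are chosen here for illustration only.

```python
import re

# A few single characters to test against \w
samples = ["a", "Z", "7", "_", " ", ".", "-"]

results = {}
for ch in samples:
    # re.search returns a match object on success, None otherwise
    results[ch] = re.search(r"\w", ch) is not None

for ch, is_word in results.items():
    print(repr(ch), is_word)
```

Letters, digits and the underscore match; the space and the punctuation marks do not. That is why the three-character search lands on "not" rather than "To " or "be ".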

\d digit/number

+ one or more

import re

s = "32 Penn-Lyle Road, Princeton Jct, 08550"

# raw string (r"...") so Python does not treat \d as a string escape
match = re.search(r"\d+", s)

print(match.group())

32

$ end of string

import re

s = "32 Penn-Lyle Road, Princeton Jct, 08550"

# raw string (r"...") so Python does not treat \d as a string escape
match = re.search(r"\d+$", s)

print(match.group())

08550
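The `$` anchor also means the search can fail outright. When the string does not end in digits, `re.search` returns `None`, and calling `.group()` on `None` raises an error. A minimal sketch of guarding against that follows; the helper name `trailing_number` and the second address are hypothetical.

```python
import re


def trailing_number(text):
    """Return the run of digits at the end of text, or None if there is none."""
    match = re.search(r"\d+$", text)
    # Only call .group() when the search actually found something
    return match.group() if match else None


print(trailing_number("32 Penn-Lyle Road, Princeton Jct, 08550"))
print(trailing_number("08550 Princeton Jct"))
```

The second call has digits at the start only, so nothing is anchored to the end of the string and the result is `None`.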

findall()

import re

s = "32 Penn-Lyle Road, Princeton Jct, 08550"

# use a separate loop variable so the original string s is not overwritten
for number in re.findall(r"\d+", s):
    print(number)

32
08550
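It may help to see that `findall()` hands back an ordinary list of strings, which can be stored, counted or converted. A small sketch with the same address; the variable names `numbers` and `as_ints` are made up here.

```python
import re

s = "32 Penn-Lyle Road, Princeton Jct, 08550"

# findall returns every non-overlapping match as a list of strings
numbers = re.findall(r"\d+", s)
print(numbers)       # ['32', '08550']
print(len(numbers))  # 2

# Converting to int drops the leading zero of the ZIP code
as_ints = [int(n) for n in numbers]
print(as_ints)       # [32, 8550]
```

The matches are text, not numbers, which is usually what you want for something like a ZIP code.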

Remember this JSON dataset? Let’s imagine we wanted to find all the words that follow “climate” in AOC’s tweets. How could we do that?

Grouping

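import re
import json

# Load the tweets; the with block closes the file when done
with open('aoc.json') as fh:
    tweets = json.load(fh)

for tweet in tweets:
    # The parentheses capture the word right after "climate "
    m = re.search(r'climate (\w+)', tweet['text'])
    if m:
        print(m.group(1))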
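One possible next step is to tally the captured words instead of printing them. The sketch below is self-contained: instead of reading `aoc.json` it uses a few snippets copied from the excerpts shown at the start, so the `snippets` list is a stand-in for the real data. Using `re.IGNORECASE` (so that "Climate Change" is counted too) and lowercasing the captured words are choices made for this example.

```python
import re
from collections import Counter

# Stand-in data: a few of the excerpts shown earlier, not the real dataset
snippets = [
    "disagree on policy, but climate change is real",
    "know it well-paid climate deniers are invited",
    "goal is to treat Climate Change like the",
    "housing, jobs, and climate all without",
    "early nightmares of climate crisis",
]

# With one group in the pattern, findall returns just the captured words
pattern = re.compile(r"climate (\w+)", re.IGNORECASE)

counts = Counter()
for text in snippets:
    for word in pattern.findall(text):
        counts[word.lower()] += 1

for word, n in counts.most_common():
    print(word, n)
```

Unlike `re.search`, which stops at the first hit in each tweet, `findall` would also pick up a second "climate ..." phrase in the same text.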

For the next class take a look at the exercise that we’ll be working on together.