A workshop and workbook
avian – 2 matches
Avian – 12 matches
avian – with the case-insensitive flag – 14 matches
[A-Z]\w*
[A-Z]+ – match only “all caps” words. BUT this is not quite right. It doesn’t work. Do you know why?
\b[A-Z]+\b – match on a word boundary using an anchor class: \b
\b[A-Z]{2,}\b – abbreviations are usually 2 or more upper case characters.

Note
Character classes [ ] are denoted by square brackets
*, +, ? and { } define repetition
\b is an anchor denoting a word boundary

Match the last words of sentences
\w+. – This doesn’t work because an unescaped “.” matches any character
\w+\. – We escape the period “.” with the escape character \
\w+\.\s – More precise this time. Matching on 56 words. Using \s allows us to stop matching email addresses by requiring whitespace after the period
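The three patterns above can also be compared outside RegExr. What follows is a minimal Python sketch, assuming the standard `re` module; the sample text is invented for illustration and is not the workshop text, so the match counts will differ from the 56 quoted above.

```python
import re

# Hypothetical sample text, invented for illustration.
text = "The birds sang. Write to avian.expert@example.org today. They flew away."

# Unescaped ".": it matches any character, so nearly every word matches.
unescaped = re.findall(r"\w+.", text)

# Escaped period: words followed by a literal ".",
# which still picks up pieces of the email address.
escaped = re.findall(r"\w+\.", text)

# Escaped period followed by whitespace: the email address is skipped,
# but a word at the very end of the text (no trailing space) is missed too.
with_space = re.findall(r"\w+\.\s", text)

print(unescaped)
print(escaped)
print(with_space)
```

On this sample, the last pattern no longer matches inside the email address, at the cost of missing the final word of the text.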
Note
\w is a “word” character
\s is a “space” character
. is a meta character (introduced above)

Find all years
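The four patterns in this exercise can be compared side by side in Python. This is only a sketch on an invented sample sentence, assuming the standard `re` module. One Python-specific detail: `re.findall` returns the group contents when a pattern contains a capturing group, so the last pattern is written with a non-capturing group `(?:19|20)` here.

```python
import re

# Hypothetical sample text, invented for illustration.
text = "Born in 1987, she dialled 5551234567 in 2021 and paid 4500 dollars."

patterns = [
    r"\d\d\d\d",             # any four digits, even inside longer numbers
    r"\d{4}",                # same meaning, written with a multiplier
    r"\b\d{4}\b",            # exactly four digits between word boundaries
    r"\b(?:19|20)\d\d\b",    # four digits starting with 19 or 20
]

# Map each pattern to the list of strings it matches.
results = {p: re.findall(p, text) for p in patterns}

for p, found in results.items():
    print(f"{p:22} {found}")
```

In this sample, 4500 survives the word-boundary version as a false positive and is only excluded by the alternation.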
\d\d\d\d – a lot of matches here
\d{4} – more succinct but has the same meaning as above
\b\d{4}\b – word boundaries \b help but there are still some false positives
\b(19|20)\d\d\b – better, and works for the twentieth and twenty-first centuries

Note
| for alternation (alternatives)
( ) for grouping
{ } multiplier

Find a phone number
\(\d{3}\) \d{3}-\d{4} – Very specific. This works as long as phone numbers are formatted consistently
\(?\d+\)? ?[\d-]{5,}\d – more permissive
\(?\d+(\)|.)? ?[\d-.]{5,}\d – more permissive still. Allows for . instead of - as a separator

Note
\(? – the ? after the escaped parenthesis \( indicates optionality, matching zero or one occurrence

\w+@[\w\.]+
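The strict and the first permissive phone pattern, together with the email pattern, can be tried in Python as follows. This is a hedged sketch on invented contact details, assuming the standard `re` module; it is not part of the original exercise data.

```python
import re

# Hypothetical contact details, invented for illustration.
text = "Call (555) 123-4567 or 555 987-6543, or write to jane.doe@example.org."

strict = r"\(\d{3}\) \d{3}-\d{4}"        # requires the "(ddd) ddd-dddd" layout
permissive = r"\(?\d+\)? ?[\d-]{5,}\d"   # parentheses and space are optional
email = r"\w+@[\w\.]+"                   # word characters, "@", then word characters or dots

strict_hits = re.findall(strict, text)
permissive_hits = re.findall(permissive, text)
email_hits = re.findall(email, text)

print(strict_hits)
print(permissive_hits)
print(email_hits)
```

On this sample the email pattern turns out to be loose: it captures only the part of the name after the last dot and also swallows the full stop that ends the sentence.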
Note
^(\w+ ?)+$ – match repeating words + optional space

Note
+ can be applied to a group
^ prior to a match pattern means “begins with”
$ following a match pattern means “ends with”

honour – 14 matches
honou?r – optional “u” and still 14 matches
hon(our|ourable|esty?) – honour, honourable, honest, honesty; for 66 matches
^(ACT|SCENE) [IVXLCDM]+ – literal word, space, roman numerals; for 20 matches
^[A-Z]+$
^.*\? – from start of line to question mark

$0 code.
(\w+) (\w+) – in the Expression pane will highlight all names
"$0" – in the Substitution pane will reproduce the text matched by the pattern within the forward slashes (Expression pane / /)
- $2, $1 – swap the order of the first and last name and precede the whole name with a dash ‘-’
<b>$2</b>, $1 – bold the last name and add a comma

Please note: this is actual Twitter stream data about a politician; the tweets may be offensive.
You can do this exercise in Regexr.com or copy the textbox data and paste to RegEx101.com
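The name substitutions shown earlier ($0, $1 and $2) can also be reproduced in a few lines of Python. This is a minimal sketch, assuming the standard `re` module and an invented list of names. In Python replacement strings the whole match is written `\g<0>` rather than `$0`, and the groups are written `\1` and `\2` rather than `$1` and `$2`.

```python
import re

# Hypothetical "First Last" names, invented for illustration.
names = "Ada Lovelace\nGrace Hopper"

# Two capturing groups: first name, then last name.
pattern = r"(\w+) (\w+)"

# "$0" in RegExr: reproduce the whole match, wrapped in double quotes.
quoted = re.sub(pattern, r'"\g<0>"', names)

# "- $2, $1": swap the names and precede them with a dash.
swapped = re.sub(pattern, r"- \2, \1", names)

# "<b>$2</b>, $1": bold the last name and add a comma.
bolded = re.sub(pattern, r"<b>\2</b>, \1", names)

print(quoted)
print(swapped)
print(bolded)
```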
Please complete the paper Feedback Form
Presenter