class: center, middle, inverse, title-slide # Regex ## A DVS Workshop
http://library.duke.edu/data/news
### John Little ### 2017-01-31 --- exclude: true class: center, middle background-image: url(https://foo.edu/img/apis.png) --- ## Brief "History" - 1950s: American mathematician Stephen Cole Kleene - Came into common use with Unix text-processing - Consists of different syntaxes (POSIX, Perl) --- ## Implementations - Search engines, word processors, text editors - AWK, grep (UNIX command line) - Textpad, Notepad++ - Google Sheets - MS Word - OpenRefine - Programming languages: often built, sometimes via libraries --- ## Patterns - Syntax for representing a pattern - Each characters in a regex is either a **metacharacter** (special meaning) -- - Or, a **regular character** (literal meaning) -- > Duke. - The wildcard '.' matches every character except a newline -- - Matches: "Duke ", "Dukes", "Duke1", "Duke0", ... - Note the space following "Duke" - Not "duke" or "dukes" -- - You can "escape" a metacharacter to enforce a literal match > \\. --- ## Matches - Range from precise to general - General: `[a-z]` matches all letters from 'a' to 'z' -- - More general: `.` -- - More precise: `j` -- - Common regex used to locate same word spelled two different ways - match "color" and "colour" -- - `/colou?r/` -- - match "color" and "Color" - `/[Cc]olor/` --- ## Cheatsheets - [For this workshop](http://libjohn.github.io/regex/cheat-sheet.html) - [Proper](https://www.cheatography.com/davechild/cheat-sheets/regular-expressions/) --- ## Find and Replace - Most people have used regex and didn't know it - Exist in Word Processors as "find & replace" - Sometimes it's used to find: match a pattern - Sometimes it's used to replace: substitute --- ## Uses - Can get very sophisticated, matching for complex substitutions - For Example capture (find) - \#hastags (words that begin with "#") - all email addresses - variant spellings - variant capitalizations - variant puntuations - But ... -- ## Regex *is not* Machine Learning - *You* specify the pattern - This is sometimes challenging --- class: center, middle background-image: url(http://imgs.xkcd.com/comics/regular_expressions.png) .left[[XKCD](https://www.xkcd.com/208/)] .right[[Next](../)] --- ## Attribution This slide deck and the handouts and exercises were strongly influenced by - [Wikipedia entry](https://en.wikipedia.org/wiki/Regular_expression) - [Intersect](http://www.intersect.org.au/course-resources) in Australia ## Shareable under CC BY-NC-SA license Data, presentation, and handouts are shareable under [CC BY-NC-SA license](https://creativecommons.org/licenses/by-nc-sa/4.0/) ![This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.](https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png "This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License")