2 Hands-on: GREL

The goal of this project is to create custom facets, perform basic transformations, introduce you to GREL (the General Refine Expression Language), and develop practical skills in transforming and normalizing data. You’ve used GREL several times getting to this point in this workbook, but let’s look a little deeper.

In this example you will open a new data set to practice importing a different type of data, this time comma-separated values, or CSV. Last time you imported tab-separated values, or TSV. OpenRefine will import many types of data. Being aware of different data formats, and being able to open those data, are valuable skills. In fact, one of the most frequent stumbling blocks when using new data tools is the simple problem of importing the data. Practice and learn…

2.2 GREL to Transform and Normalize

The General Refine Expression Language (GREL) is a powerful and extensible language to manipulate data. In these next steps we will learn GREL by using practical steps to improve the structure of the data.

Split the LOCATION Column into two columns (Latitude and Longitude)
- LOCATION > Edit column > Split into several columns… > OK (i.e. Accept the defaults)
- Rename Columns
  - Location 1 > Edit column > Rename this column > Latitude
  - Location 2 > Edit column > Rename this column > Longitude
- Remove parentheses and trim whitespace
  - Latitude > Edit cells > Transform …
    1. Expression = value.replace("(","") > OK
    2. Latitude > Edit cells > Common transformations > Trim leading and trailing whitespace
  - Longitude > Edit cells > Transform …
    1. Expression = value.replace(")","") > OK
    2. Longitude > Edit cells > Common transformations > Trim leading and trailing whitespace
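
For readers who want to check the logic outside OpenRefine, the split, replace, and trim steps above can be sketched in Python. This is only an illustration, not what OpenRefine runs internally. The function name `split_location` and the sample coordinate are hypothetical, and the sketch assumes a LOCATION cell looks like "(latitude, longitude)".

```python
def split_location(location):
    """Mimic the split, replace, and trim steps on one LOCATION cell."""
    # "Split into several columns" with the default separator, a comma.
    first, second = location.split(",", 1)
    # Same idea as value.replace("(", "") followed by the trim transformation.
    latitude = first.replace("(", "").strip()
    # Same idea as value.replace(")", "") followed by the trim transformation.
    longitude = second.replace(")", "").strip()
    return latitude, longitude


# Hypothetical sample value, not taken from the data set.
print(split_location("(42.3601, -71.0589)"))
```

Note that the results are still text; converting them to numbers would be a separate step.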

Regular Expression

Regular expressions are powerful, flexible patterns used for matching text. Often there is more than one way to compose a regex pattern-match. Importantly for OpenRefine, much of Refine’s extensibility and advanced power comes from regular expressions. Essentially, the key to advanced-level OpenRefine is regular expressions and looping. To learn more about regular expressions see my handout on regex. For now you can refer to these few commands and symbols summed up on this quick-sheet.

Create New Column from Existing Column using a regular expression to match a pattern
- INC DATETIME > Edit column > Add column based on this column …
  1. New column name = YEAR
  2. Expression = value.match(/.*\/(\d\d\d\d).*/)[0] > OK⁸
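
In GREL, match() tests the pattern against the whole cell and returns the captured groups as a list, so [0] is the text captured by the first pair of parentheses. The following Python sketch shows roughly the same behavior using fullmatch. The function name `extract_year` and the sample date string are hypothetical; the real INC DATETIME values may be formatted differently.

```python
import re

# Same pattern as the GREL expression: anything, a slash, four digits, anything.
YEAR_PATTERN = re.compile(r".*/(\d\d\d\d).*")


def extract_year(value):
    """Return the four digits after the last matching slash, or None if no match."""
    found = YEAR_PATTERN.fullmatch(value)
    # group(1) plays the role of [0] on the list returned by GREL's match().
    return found.group(1) if found else None


# Hypothetical sample value, assuming a month/day/year style date.
print(extract_year("03/15/2012 10:30:00 PM"))
```

Because the pattern must cover the entire cell, the leading and trailing .* are what allow other text around the year.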

Filter Text

Create a Text Filter: Explore your data to find how many incident descriptions involve bicycles
1. LCR DESC > Text filter
2. In the filter box, enter the term bicycle
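
Conceptually, a text filter keeps only the rows whose cell contains the search term. Here is a minimal Python sketch of that idea, assuming a case-insensitive match. The function name `text_filter` is hypothetical, and apart from the first entry the sample descriptions are made up.

```python
def text_filter(cells, term):
    """Keep the cells that contain the term, ignoring upper and lower case."""
    needle = term.lower()
    return [cell for cell in cells if needle in cell.lower()]


# The first value appears in this workbook; the others are invented samples.
descriptions = [
    "LARCENY/BICYCLES ($50-$199)",
    "VANDALISM",
    "Stolen bicycle report",
]
print(text_filter(descriptions, "bicycle"))
```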

Facet

Make a Text Facet on the LCR DESC column

LCR DESC > Facet > Text facet

How many Bicycle categories exist in the facet window?

6

Which value occurs most often in the “LCR DESC” facet?

LARCENY/BICYCLES ($50-$199) - 119
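
A text facet is essentially a count of how often each distinct value occurs in a column. The sketch below shows that idea in Python with a small invented sample. The function name `text_facet` is hypothetical, and the counts are not the real counts from the data set.

```python
from collections import Counter


def text_facet(cells):
    """Return (value, count) pairs, most frequent value first."""
    return Counter(cells).most_common()


# Invented sample column; only the first label comes from this workbook.
sample = [
    "LARCENY/BICYCLES ($50-$199)",
    "VANDALISM",
    "LARCENY/BICYCLES ($50-$199)",
]
for value, count in text_facet(sample):
    print(value, "-", count)
```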

Cluster

Look for Cluster variations in the FELONY Column
- Clustering refers to the operation of finding groups of different values that might be alternative representations of the same thing, for example “Gödel” and “Godel”. This is a handy way to find spelling variations.
- FELONY > Facet > Text facet
- FELONY > Edit cells > Cluster and edit …
  - Method = “key collision” ; Key Function = “metaphone3”
- For each row, check “Merge?” and change the “New Cell Value”
  1. to Felony in the first row
  2. to Not Felony in the second row
  3. click the “Merge Selected & Re-Cluster” button
  4. Try other “Methods” and “Keying Functions”. “Merge Selected & Re-Cluster” after each operation
  5. Close
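
To give a feel for how key collision clustering works, here is a rough Python sketch. It reduces each value to a key and groups values that share the same key. Note that it uses a simplified fingerprint-style key, not the metaphone3 function chosen above, and it only approximates what OpenRefine does. The function names and the sample FELONY values are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint-style key (an approximation, not OpenRefine's code)."""
    text = value.strip().lower()
    # Decompose accented letters and drop the accents, so "ö" becomes "o".
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation, then sort the unique words.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def key_collision_clusters(values):
    """Group values whose keys collide; keep only groups with several variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Invented sample values for illustration.
print(key_collision_clusters(["Felony", "felony ", "FELONY", "Not Felony"]))
```

Different keying functions produce different keys, which is why trying several methods can surface different clusters.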

Mass Editing via Facets

Mass Edit cells in LCR DESC Facet: LCR DESC > Facet > Text facet
- Mouseover & Edit “LARCENY/BICYCLE/FELONY/$200- …”
- delete “/FELONY”
- Add an “S” to ‘BICYCLE’ > Apply

Export as an Excel File
- Export > Excel

Cleaning Data with OpenRefine