4 Hands-on: dates

Dates are weird!11 Or, more accurately, Time is philosophically fascinating and sometimes difficult to represent. This is true for your brain and it’s especially true for your computer. Keep in mind that converting dates (i.e. using the toDate() function) format sometimes results in errors because time and date formats are weird.

Often, in OpenRefine, it’s easiest to operate on strings and convert to dates (or to numbers) as needed. For example, sometimes it’s easier to use a slice() of a string and then convert to date, or to number if you really need to do so.

However, in OpenRefine, the date format is necessary for date faceting and for finding the difference between two dates (e.g. using the diff() function, or for getting a datePart() for retrieving an hour, or second).

This project will give you practice manipulating dates

4.1 Dates

  1. Open the existing Refine Project Raleigh Building Permits, (from Project 1)
  2. Make a copy of the issue_date column
    • issue_date > Edit column > Add column based on this column…
      • new column name = date2 > OK
  3. Create a customized facet on AM/PM
    • issue_date > Facet > Custom text facet …
      • Expression = value.slice(20,22)
  4. Create a customized facet on the hour
    • issue_date > Facet > Custom text facet…
      • Expression = value.slice(10,13)
  5. Convert to Date
    • date2 > Edit cells > Common transforms > To Date

    The status message indicates 19598 rows were transformed, but there are 21982 rows of data, what went wrong?

    • Select the “PM” custom text facet and the “12” custom text facet
      • Do you see that none of the dates converted when the time included 12PM?
      • For whatever reason, the date converter doesn’t like timestamps that contain 12PM as a designation
      • So let’s convert those dates from 12PM to 00PM (Why? because it fixes the problem)
  6. Convert 12PM to 00PM
    • date2 Edit cells > transform
      • Expression = value.replace(" 12:"," 00:")
  7. Now convert to date
    • date2 > Edit cells > Common transforms > To Date

    That converts 2384 cells

  8. Now, Remove All facets

  9. Now make a date facet
    • date2 > Facet > Timeline facet

4.2 Logic

This time we’ll put together everything we’ve learned (how to use OpenRefine, regular expressions, GREL) and add in some logic control statements (i.e. an IF statment.) Our goal is to create the same timeline facet but with fewer, more complex OpenRefine steps. You will see in this example that while OpenRefine is a powerful data transformation tool which can easily support casual attention to syntax, file handling, and looping; OpenRefine is also a powerful and extensible tool which supports scripting and begins to look like real programming. In other words, it’s an extensible data transformation tool.

Goal: Make the same data transformations as in the previous slide’s steps 2-7, but this time you’ll do it in one step. Think about how GREL allows you to take shortcuts.

  1. issue_date > Edit Column > Add column based on this column…
    • New Column Name = date3
    • Expression = if(value.slice(20,22) == "PM", value.replace(/(.*\s)12(:\d\d:.*)/,"$100$2"), value).toDate()
  2. Now make a date facet
    • date3 > Facet > Timeline facet

  1. Dates are Difficult by Mike James (www.i-programmer.info)