5 Hands-on: Reconciliation

OpenRefine’s reconciliation service semi-automates the matching of data in OpenRefine fields against more authoritative data in external sources. The process is called semi-automated because the user can interactively approve or select which values are modified by choosing from a pick-list of results. Reconciliation can improve and standardize individual fields or whole columns of data inside OpenRefine, or it can extend the data in OpenRefine with new information.

For example: match each name in a list of authors against an external authoritative list of authors, augment each author’s name with its authoritative entry, and add an example book title for each author.
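As a rough illustration of why reconciliation is called semi-automated, the sketch below scores each local name against a small authority list, accepts a match automatically only when the score is high, and otherwise leaves the ranked candidates for a person to choose from. It is plain Python, not something that runs inside OpenRefine. The `AUTHORITY` list, the function names `normalize`, `candidates`, and `reconcile`, the 0-100 score, and the threshold of 90 are all assumptions made for illustration; they are not the algorithm used by OpenRefine or by any particular reconciliation service.

```python
from difflib import SequenceMatcher

# Illustrative authority records (assumption); a real service draws
# candidates from its own index.
AUTHORITY = [
    "Wright, Richard, 1908-1960",
    "Baldwin, James, 1924-1987",
]


def normalize(name):
    """Lowercase, keep alphabetic tokens only, and sort them so word order is ignored."""
    tokens = "".join(ch if ch.isalpha() else " " for ch in name.lower()).split()
    return " ".join(sorted(tokens))


def candidates(name, authority=AUTHORITY):
    """Return (score, record) pairs, best first; the score runs from 0 to 100."""
    scored = [
        (round(100 * SequenceMatcher(None, normalize(name), normalize(rec)).ratio()), rec)
        for rec in authority
    ]
    return sorted(scored, reverse=True)


def reconcile(name, threshold=90):
    """Auto-match at or above the threshold; otherwise leave the choice to the user."""
    ranked = candidates(name)
    best_score, best = ranked[0]
    return {
        "matched": best if best_score >= threshold else None,
        "candidates": ranked,  # the pick-list a person would review
    }


result = reconcile("Richard Wright")
```

When `matched` is `None`, the `candidates` list plays the role of the pick-list: nothing is changed until a person selects one of the entries.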

5.1 Reconciliation

Goal: Import new author data into a new project, then use the Wikidata and VIAF reconciliation services to gather authoritative versions of African American author names and an example of each author’s work, plus their places of birth and death.

  1. Make a New Project, Import Author Data

    • Create Project > Web Addresses (URLs) > https://raw.githubusercontent.com/libjohn/openrefine/master/data/AA-authors-you-should-read.csv
    • Next >>
    • You may want to give your project a pretty title
    • Create Project >>
    • Change Show: to 25 to see all 11 records.
  2. author > Reconcile > Start reconciling…

    1. First time / One time …

      1. Add Standard Service…
      2. In the Enter the service’s URL: textbox, enter: http://refine.codefork.com/reconcile/viaf
      3. Name = “Reconciled Authors”
      4. click VIAF
      5. Under Reconcile each cell to an entity of one of these types:, choose Work
      6. Click Start Reconciling
      7. authors > Reconcile > Actions > Match each cell to its best candidate
      8. authors > Edit column > Add column based on this column…

        1. New column name = major works
        2. Expression = cell.recon.candidates[0].name
        3. OK
      9. In the left-hand sidebar, Remove All facets

    2. The rest of the time…

      1. author > Reconcile > Start reconciling…
      2. Under Services, click Wikidata Reconciliation for OpenRefine (en)
      3. Under Reconcile each cell to an entity of one of these types:, choose human
      4. Click Start Reconciling
      5. By clicking the appropriate single checkbox in each cell of the authors column, manually select the most appropriate author for our topic (African American authors everyone should read). Not every cell requires a selection; cells 2 and 10 need your intervention.

        1. In cell 2, James Baldwin, select the first option, which has a match score of “(100)”
        2. In cell 10, click on the first name, then the second name. Do you see an African American writer? Choose him by clicking the corresponding single check-mark
  3. Add more metadata from Wikidata (only works for Wikidata as of this writing)

    1. authors > Edit column > Add columns from reconciled values…
    2. Under Suggested Properties, click place of birth and place of death
    3. OK
  4. Copy the author and dates information from the “Richard Wright” cell under the “major works” column; edit the Richard Wright cell under the “authors” column; paste in the more complete information: Wright, Richard, 1908-1960

  5. Add authoritative author data from VIAF

    1. author > Reconcile > Start reconciling…
    2. Under Services, click VIAF
    3. Under Reconcile each cell to an entity of one of these types:, choose Person
    4. Click Start Reconciling
    5. authors > Reconcile > Actions > Match each cell to its best candidate
    6. Using the Choose new match option, fix row 2.
  6. Transform the “major works” column by splitting the column at the pipe: “|”

    1. major works > Edit cells > Transform…
    2. Expression = value.split("|")[1]
    3. OK

Now, when you Export as Comma-separated value (Excel, etc.), all of the reconciled information will be retained.
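For readers who want to see what the two GREL expressions in the steps above are doing, here is a minimal Python sketch that mimics them outside OpenRefine. The `cell` dictionary is a hypothetical stand-in for a reconciled cell, and the value "Wright, Richard, 1908-1960. | Example title" is invented to show the assumed "author | title" shape of a VIAF Work candidate; neither is real service output. The helper names `best_candidate_name` and `after_pipe` are likewise assumptions for illustration.

```python
# Hypothetical stand-in for a reconciled cell. In GREL the same path is
# written cell.recon.candidates[0].name.
cell = {
    "value": "Richard Wright",
    "recon": {
        "candidates": [
            {"name": "Wright, Richard, 1908-1960. | Example title"},  # invented value
        ]
    },
}


def best_candidate_name(cell):
    """Mimic cell.recon.candidates[0].name; return None when there are no candidates."""
    found = (cell.get("recon") or {}).get("candidates") or []
    return found[0]["name"] if found else None


def after_pipe(value, sep="|"):
    """Mimic value.split("|")[1]; return None when the separator is absent.

    Indexing is zero-based, so [1] is the piece after the first pipe.
    The strip() is an addition here; the GREL expression as written keeps
    surrounding whitespace.
    """
    parts = value.split(sep)
    return parts[1].strip() if len(parts) > 1 else None


major_works = best_candidate_name(cell)
title = after_pipe(major_works)
```

Because `[1]` selects the second piece, a cell with no pipe has nothing at that position; the sketch returns `None` in that case rather than raising an error. In GREL, appending `.trim()` to the expression is one way to drop the leading space left by the split.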