5 Hands-on: Reconciliation
OpenRefine’s reconciliation service semi-automates the process of matching data in OpenRefine fields against more authoritative data in external sources. The function is called semi-automated because the end user can interactively approve or select which data are modified by choosing from a pick-list of results. Reconciliation can be used to improve and standardize individual fields or columns of data inside OpenRefine, or to extend the data in OpenRefine.
For example: match each name in a list of authors against an external authoritative list of authors, augment each author’s name with its authoritative entry, and add an example book title for each author.
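To make the “semi-automated” idea concrete, here is a minimal Python sketch (not OpenRefine code) of the triage a reconciliation step performs: cells whose best candidate is flagged as a confident match are accepted, and the rest are left for a person to review. It assumes each cell comes back with a list of candidates carrying `name`, `score`, and `match` fields; the function name `triage` and all sample values are invented for illustration.

```python
def triage(results):
    """Split reconciliation results into auto-matched and needs-review cells.

    `results` maps a cell value to its candidate list, best candidate first.
    """
    matched, review = {}, {}
    for value, candidates in results.items():
        if candidates and candidates[0]["match"]:
            # The service is confident: accept the top candidate.
            matched[value] = candidates[0]["name"]
        else:
            # Ambiguous or empty: keep the pick-list for a human to choose from.
            review[value] = [c["name"] for c in candidates]
    return matched, review


# Invented sample data, not real service output.
sample = {
    "Example Author": [
        {"name": "Author, Example", "score": 100, "match": True},
    ],
    "Richard Wright": [
        {"name": "Richard Wright (candidate A)", "score": 71, "match": False},
        {"name": "Richard Wright (candidate B)", "score": 70, "match": False},
    ],
}

matched, review = triage(sample)
print("auto-matched:", matched)
print("needs review:", review)
```

The second group is the part that stays manual in the exercise below, where two cells need a choice from the pick-list.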
5.1 Reconciliation
Goal: Import new author data into a new project, then use the Wikidata and VIAF reconciliation services to gather authoritative versions of African American author names and an example of each author’s work, plus their places of birth and death.
Make a New Project, Import Author Data
https://raw.githubusercontent.com/libjohn/openrefine/master/data/AA-authors-you-should-read.csv
- You may want to give your project a pretty title
- Change Show: to 25 to see all 11 records.
First time / One time …
- In the Enter the service’s URL: textbox, enter:
http://refine.codefork.com/reconcile/viaf
- Name = “Reconciled Authors”
- click VIAF
- Under Reconcile each cell to an entity of one of these types:, choose Work
- Click,
- New column name = major works
- Expression =
cell.recon.candidates[0].name
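As a rough illustration of what the expression above reads, the following Python sketch models a reconciled cell as a nested dictionary and pulls out the name of the first candidate. The dictionary layout, the identifier, and the sample name string are assumptions made for this sketch, and the guard for an empty candidate list is an addition that the one-line expression does not have.

```python
def top_candidate_name(cell):
    """Return the name of the first reconciliation candidate, or None."""
    recon = cell.get("recon") or {}
    candidates = recon.get("candidates") or []
    # Index 0 is the first candidate in the list, as in candidates[0] above.
    return candidates[0]["name"] if candidates else None


# Hypothetical cell; the id and name format are invented for illustration.
reconciled_cell = {
    "value": "Richard Wright",
    "recon": {
        "candidates": [
            {"id": "hypothetical-id-1",
             "name": "Wright, Richard, 1908-1960 | Native son",
             "score": 0.9},
        ]
    },
}
unreconciled_cell = {"value": "Richard Wright"}

print(top_candidate_name(reconciled_cell))
print(top_candidate_name(unreconciled_cell))
```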
In the left-hand sidebar,
facets
The rest of the time…
- Under Services, click Wikidata Reconciliation for OpenRefine (en)
- Under Reconcile each cell to an entity of one of these types:, choose human
- Click,
By clicking the appropriate single checkbox in each cell of the authors column, manually select the most appropriate author for our topic. (Our topic is African American authors everyone should read.) Not every cell needs a selection; cells 2 and 10 need your intervention.
- In cell 2, James Baldwin, select the first option, which has a match score of “(100)”
- In cell 10, click on the first name, then the second name. Do you see an African American writer? Choose him by clicking the corresponding single check mark
Add more metadata from Wikidata (only works for Wikidata as of this writing)
- Under Suggested Properties, click place of birth and place of death
Copy the author and dates information from the ‘Richard Wright’ cell under the “major works” column; edit the Richard Wright cell under the “authors” column; paste in the more complete information:
Wright, Richard, 1908-1960
Add authoritative author data from VIAF
- Under Services, click VIAF
- Under Reconcile each cell to an entity of one of these types:, choose Person
- Click,
- Using the Choose new match option, fix row 2.
Transform the “major works” column by splitting the column at the pipe character: “|”
- Expression =
value.split("|")[1]
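The same split can be sketched in Python to show which piece index 1 selects. The sample string is a hypothetical “major works” value (the exact format of the column is assumed here), and both the whitespace trimming and the fallback for values without a pipe are additions of this sketch rather than something the expression above does.

```python
def second_part(value, sep="|"):
    """Return the text after the first separator, trimmed, or None."""
    parts = value.split(sep)
    # parts[0] is the text before the pipe, parts[1] the text after it.
    return parts[1].strip() if len(parts) > 1 else None


# Hypothetical cell value, invented for illustration.
sample_value = "Wright, Richard, 1908-1960 | Native son"

print(second_part(sample_value))
print(second_part("no pipe in this value"))
```

Trimming matters only if the values carry spaces around the pipe; without it, the piece after the separator would keep its leading space.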
Now, when you export as Comma-separated Value (Excel, etc.), all of the reconciled information will be retained.
5.2 Reconciliation Resources
- YouTube: Reconciliation in OpenRefine: Part 1 by Owen Stephens
- YouTube: Reconciliation in OpenRefine part 2
- Reconciliation Data Sources
- For low use needs: VIAF / ORCID / Open Library
- For more advanced needs: conciliator
- OpenRefine documentation on Reconciliation
- OpenRefine technical documentation on reconciliation API
- Transform Recon Candidates to cell data: e.g. cell.recon.candidates[0].id
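For readers curious about what the reconciliation API documentation listed above describes, here is a hedged Python sketch of how a batch of queries might be assembled. It assumes the batch format in which a JSON object keyed `q0`, `q1`, and so on is sent in a `queries` form parameter; the helper name `build_queries` and the `limit` value are choices made for this sketch. `Q5` is the Wikidata item for “human”. Nothing is sent over the network.

```python
import json
from urllib.parse import urlencode


def build_queries(names, type_id=None, limit=3):
    """Build a batch of reconciliation queries keyed q0, q1, ..."""
    queries = {}
    for i, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if type_id is not None:
            # Restrict candidates to one entity type, e.g. Q5 (human) on Wikidata.
            query["type"] = type_id
        queries[f"q{i}"] = query
    return queries


queries = build_queries(["James Baldwin", "Richard Wright"], type_id="Q5")

# The batch travels as one JSON string inside a form-encoded "queries" field.
form_body = urlencode({"queries": json.dumps(queries)})

print(json.dumps(queries, indent=2))
```

OpenRefine builds and sends requests of this kind itself when you start reconciling, so the sketch is only meant to show what is being asked of the service.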