GoogleRefine (now OpenRefine) is a standalone open source desktop application for data cleanup and transformation. It displays itself as a flat table but behaves like a relational database. It’s a hugely powerful tool, but it takes some legwork and practice to fully exploit its potential.
An immediate and very practical use of the software is cleaning up messy metadata. Say you have exported a text file with some semi-structured data; you can use transformations, facets and clustering to restructure it.
Screenshot of “categories” for sample data-set
http://data.freeyourmetadata.org/powerhouse-museum/phm-collection.zip
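To give a feel for what clustering does with messy values, here is a minimal Python sketch of a key-collision ("fingerprint") approach: each value is reduced to a normalised key, and values that share a key are grouped as likely variants of the same term. This is an illustration written for this post, not OpenRefine's own code; the function names `fingerprint` and `cluster` and the sample category values are assumptions.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Reduce a string to a normalised key (hypothetical helper)."""
    # Trim surrounding whitespace and lowercase everything.
    text = value.strip().lower()
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, drop duplicate tokens, sort them, and rejoin.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values whose fingerprints collide (hypothetical helper)."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    # Only groups with more than one distinct spelling are interesting.
    return [group for group in groups.values() if len(set(group)) > 1]


if __name__ == "__main__":
    # Made-up category values, for illustration only.
    categories = ["Glass plate", "plate, glass", "Teapot"]
    print(cluster(categories))
```

In the tool itself you review each proposed cluster and choose the value to merge into; the sketch only covers the grouping step.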
GoogleRefine can also be used to convert data values to other formats and to extend the data with web services, for example by geocoding addresses to geographic coordinates.
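As a rough illustration of the geocoding idea, the Python sketch below builds a request URL for one address and pulls latitude and longitude out of a JSON reply. It assumes the OpenStreetMap Nominatim search service, which is my choice for the example and not something prescribed by the tool; the helper names `build_geocode_url` and `parse_coordinates` and the sample reply are made up for illustration. No network call is made here.

```python
import json
from urllib.parse import urlencode

# Assumption: Nominatim is used as the geocoding service for this example.
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_geocode_url(address):
    """Build the lookup URL for one address (hypothetical helper)."""
    query = urlencode({"q": address, "format": "json", "limit": 1})
    return NOMINATIM_SEARCH + "?" + query


def parse_coordinates(response_text):
    """Return (lat, lon) from a JSON reply, or None if nothing matched."""
    results = json.loads(response_text)
    if not results:
        return None
    first = results[0]
    # Nominatim returns coordinates as strings, so convert them to floats.
    return float(first["lat"]), float(first["lon"])


if __name__ == "__main__":
    print(build_geocode_url("500 Harris Street, Ultimo"))
    # A made-up reply in the expected shape, not real service output.
    sample_reply = '[{"lat": "-33.87", "lon": "151.2"}]'
    print(parse_coordinates(sample_reply))
```

Inside the tool the same two steps would typically be a column created by fetching URLs, followed by a transformation that parses the returned JSON. If you use a public service, check its usage policy and throttle your requests.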
Check out http://collection.cooperhewitt.org/people/18060335/ for a good example of linked metadata (person search).
Resources:
OpenRefine (Project homepage)
Getting started with OpenRefine
Using OpenRefine: a manual
Hi Alexander, I came across a post on the Programming Historian site today which covers a lot of the material from the workshop: http://programminghistorian.org/lessons/cleaning-data-with-openrefine
Excellent stuff, Padraic. Cheers :-)
Nice write up Alexander. I came across a useful listing of regular expression "recipes" on github when playing around with google refine last week:
https://github.com/OpenRefine/OpenRefine/wiki/Recipes
John