Unlocking Open Data using an Open Source Database

I really enjoyed preparing and presenting this talk at pgconf.de on Fri 13 May 2022:

Could we use our favourite open source relational database to unlock the potential of open data?

There is a vast array of open data made available by public sector bodies, charities and commercial organisations. Open data sets span domains such as the environment, the economy and health, and are of immense potential value. There are, however, significant challenges when it comes to making use of them.

The data sets are published by diverse bodies, each with their own practices, and are often presented in a semi-structured or human-readable rather than machine-readable format. This means that painstaking manual intervention is often required to make sense of the data, and to load it into a system such as a relational database for analysis.

This talk will introduce you to the PhD research project that I recently started at the University of Manchester, called "Unlocking Open Data through Wrapper Generation". The aim of the research project is to support the generation of wrappers for open data sources. It builds on existing work by my supervisor, Professor Norman Paton, and others.

I would love the project to lead, eventually, to a PostgreSQL extension that automates the creation and population of a set of tables from a given open data set.
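To give a flavour of what "automating the creation of a set of tables" could mean at its very simplest, here is a minimal Python sketch that guesses column types from a small CSV file and emits a CREATE TABLE statement. It is only an illustration and not the research project's approach: the function names (infer_type, quote_ident, create_table_sql) and the sample data are made up for this post, and real open data is rarely this tidy.

```python
import csv
import io


def infer_type(values):
    """Guess the narrowest PostgreSQL type that fits every non-empty value.

    Deliberately naive: only integer, numeric and text are considered.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    for pg_type, parse in (("integer", int), ("numeric", float)):
        try:
            for v in non_empty:
                parse(v)
            return pg_type
        except ValueError:
            continue
    return "text"


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table, csv_text):
    """Build a CREATE TABLE statement from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        col_type = infer_type([row[i] for row in data])
        columns.append(f"{quote_ident(name)} {col_type}")
    return f"CREATE TABLE {quote_ident(table)} ({', '.join(columns)});"


# Made-up sample data, purely for illustration.
sample = "station,year,rainfall_mm\nManchester,2021,867.3\nLeeds,2021,\n"
print(create_table_sql("rainfall", sample))
```

The hard part, and the reason for the research, is everything this sketch assumes away: title rows, footnotes, merged cells, several tables in one file, and so on.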

I will also describe some of the techniques that I have been learning, such as using genetic algorithms to solve this type of problem.
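For anyone who has not met genetic algorithms before, the following is a toy Python sketch of the general idea: keep a population of candidate solutions, score them with a fitness function, and breed the better ones using crossover and mutation. It is a generic textbook-style example, not the algorithm used in the research project. The evolve function, its parameters and the bit-string target are all assumptions made for illustration.

```python
import random


def evolve(fitness, length, pop_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Evolve a bit string of the given length that maximises fitness.

    Returns the best candidate found and the best fitness per generation.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Elitism: carry the current best candidate over unchanged.
        next_gen = [population[0][:]]
        # Selection: only the fitter half of the population may breed.
        parents = population[: pop_size // 2]
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            # Single-point crossover.
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Toy problem: rediscover a hidden bit pattern.
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]


def matches(candidate):
    """Fitness: number of positions that agree with the target."""
    return sum(x == y for x, y in zip(candidate, target))


best, history = evolve(matches, len(target))
print(best, history[-1])
```

In a wrapper generation setting, the candidate would presumably encode something far richer than a bit string, and the fitness function would measure how well the extracted data matches what is expected.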

Presentation Slides: