The University of Amsterdam’s PoliticalMashup

Maarten Marx of PoliticalMashup has written a post explaining the background to the project:

The PoliticalMashup project at the University of Amsterdam started collecting parliamentary proceedings in 2008. We started with Dutch proceedings and have since moved to collecting proceedings from other European states as well.

Our aim is to transform all proceedings into a rich common XML format and store them in a single XML database system. With this database system we facilitate comparative diachronic research for historians, political scientists, linguists and communication scientists.

Currently we collect data from the Netherlands, the UK, Flanders, Germany, Denmark, Sweden and Norway.

The main problem in keeping the collection up to date and extending it as far back as possible is changing data formats. The content and layout of the proceedings are in general very stable over time, but the technical formats vary greatly, especially in the “digital era” (starting roughly around 1995). For older material, OCR errors and badly placed scans are major challenges. Besides these challenges with the texts, much work is needed to recognize, disambiguate and link political entities (speakers, parties, constituencies, ministerial functions, etc.) to existing databases.

Aims

After consultation with a panel of users consisting of scientists, journalists and archivists we decided to focus on the following aims:

  • create a complete copy of the proceedings of the meetings in parliament;
  • add metadata which record for each word spoken in parliament when it was said, who said it, in what role, on behalf of which party, and in which context. If possible, also indicate the type of speech act (e.g., speech from central lectern, interruption of a speech, shout from the benches, etc);
  • give each entity a unique identifier which is resolvable by a Handle system comparable to the DOI system; do this for real entities (persons, parties) and textual objects (proceedings, topics, speeches, paragraphs, votes, etc);
  • use these identifiers to link data to existing databases and link the parliamentary data to the Linked Open Data Cloud.

Available tools

Proceedings of the UK and the Netherlands are actively collected “until yesterday”. The collections start in 1935 (UK) and 1814 (NL) respectively. They can be downloaded and accessed through a Search Interface.

Cooperation

DiLiPaD’s Dutch principal investigators Jaap Kamps and Maarten Marx collaborate with the Information Office of the Dutch House of Commons, the Dutch Royal Library, the Dutch National Archive, the Dutch Documentation Centre for Political Parties, and scientists from the Humanities, Social and Computer Sciences.

The IHR and parliamentary data

The Institute of Historical Research has a long association with UK parliamentary data in digital form. We see our involvement in the Dilipad project as continuing with work we’ve had an interest in for some time:

British History Online, our digital library, is a joint project with the History of Parliament Trust, which is also a partner on Dilipad. BHO is a long-established resource and some of the first volumes published were the Journal of the House of Lords and Journal of the House of Commons. Eleven years later and we have added many more volumes of the series, along with other parliamentary material such as the Parliament Rolls of Medieval England, seventeenth-century parliamentarians’ diaries, and the Statutes of the Realm. Have a look here to see what is available: everything apart from the Parliament Rolls of Medieval England is free to all.


Our productive relationship with the History of Parliament has gone beyond the above. My colleagues in IHR Digital maintain the excellent HoP website. For some years discussion of the subtleties of things like mapping changing historical constituencies into neat database fields has been commonplace in our office, and I frequently see the developers poring over one of the handsome HoP print volumes.

The linked data part of the name Dilipad is something that we have also worked on before in a parliamentary context. The Linking Parliamentary Records through Metadata (Liparm) project was a proof of concept for linking different legislatures through metadata, and the project manager of Liparm, Richard Gartner of King’s College, London, is working with us again on Dilipad. The Parliamentary Markup Language, PML, that Richard devised for Liparm is being used again on the Dilipad project, as are the authority files we created. We are already applying lessons learned on Liparm to the benefit of Dilipad.

I hope this sets the IHR’s place in the project in a bit of context. In future posts other project members, from Canada, the Netherlands and the UK, will be talking about what led to their involvement in Dilipad.

UK Hansards in PoliticalMashup format

[Cross-posted from PoliticalMashup]

Debates of the House of Lords and House of Commons from 1935 until “yesterday” are available in the XML format developed within the PoliticalMashup project. The debates are available as one dump of XML files and through a rudimentary search interface.
All debates are available in XML, RDF and HTML formats, via a simple parameter:

Speakers in debates are linked to special biography files containing links to other sources (e.g. Wikipedia).
Speakers are provided with a unique ID that coincides with their ID at theyworkforyou.com.

All information about this dataset can be found at http://data.politicalmashup.nl/parldumps/uk.

Below is an example speech by Matthew Offord, whose page at theyworkforyou.com is http://www.theyworkforyou.com/mp/24955.
<speech
  pm:speaker="Matthew Offord"
  pm:party="Con"
  pm:role="mp"
  pm:party-ref="uk.p.Con"
  pm:member-ref="uk.m.24955"
  pm:id="uk.proc.d.2013-12-11.1.2.5">

  <p pm:id="uk.proc.d.2013-12-11.1.2.5.1">
    I am very reassured by the Minister’s response, but will he outline to the House how much money has been saved as a result of those reforms?
  </p>
</speech>
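
To show how these identifiers might be used in practice, here is a minimal Python sketch that reads the speech above and derives the speaker's theyworkforyou.com page from the pm:member-ref attribute. Several things in it are assumptions rather than part of the published format: the namespace URI bound to the pm prefix (the example above does not declare one), the helper names attr, theyworkforyou_url and debate_date, and the idea that the fourth dot-separated field of a pm:id is the debate date, which is inferred from this single example only.

```python
import xml.etree.ElementTree as ET

# Assumption: the post does not give the namespace URI for the "pm" prefix.
# This value is a placeholder; use the one declared in the real dump files.
PM_NS = "http://www.politicalmashup.nl"

SPEECH_XML = f"""<speech xmlns:pm="{PM_NS}"
  pm:speaker="Matthew Offord"
  pm:party="Con"
  pm:role="mp"
  pm:party-ref="uk.p.Con"
  pm:member-ref="uk.m.24955"
  pm:id="uk.proc.d.2013-12-11.1.2.5">
  <p pm:id="uk.proc.d.2013-12-11.1.2.5.1">
    I am very reassured by the Minister’s response, but will he outline to the House how much money has been saved as a result of those reforms?
  </p>
</speech>"""


def attr(elem, name):
    """Return the value of a pm-namespaced attribute (hypothetical helper)."""
    return elem.get(f"{{{PM_NS}}}{name}")


def theyworkforyou_url(member_ref):
    """Map a member reference such as 'uk.m.24955' to a theyworkforyou.com URL.

    Relies on the statement in the post that UK speaker IDs coincide
    with the IDs used at theyworkforyou.com.
    """
    country, kind, number = member_ref.split(".")
    if (country, kind) != ("uk", "m"):
        raise ValueError(f"not a UK member reference: {member_ref}")
    return f"http://www.theyworkforyou.com/mp/{number}"


def debate_date(speech_id):
    """Extract the date field from an ID like 'uk.proc.d.2013-12-11.1.2.5'.

    Assumption: the fourth dot-separated field is the debate date.
    """
    return speech_id.split(".")[3]


if __name__ == "__main__":
    speech = ET.fromstring(SPEECH_XML)
    print(attr(speech, "speaker"), "(" + attr(speech, "party") + ")")
    print(theyworkforyou_url(attr(speech, "member-ref")))
    print(debate_date(attr(speech, "id")))
```

Because every speech and paragraph carries its own identifier, the same approach could be extended to collect all speeches by one member across a dump, provided the real files follow the pattern seen in this example.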