Drafter: managing quality in data publishing

Sarah Roberts
Swirrl’s Blog
Feb 24, 2020

To make linked data publishing work, it must be practical. Ongoing management of non-trivial collections of graph data by a team presents significant, real-world challenges around collaboration and change management. In this post I present a solution that has been implemented in Swirrl’s PublishMyData platform for the last five years. It provides collaborative real-time previewing and editing of any number of concurrent revisions, a publishing workflow, and authoring-by-API.

The problem

Even given the best tooling and technology, publishing accurate data is a demanding job. It is neither reasonable nor practical to demand that authors get complicated data right without review and iteration. In the world of linked data, the challenge is further complicated by the need to ensure not only that the data are correct in their own right, but also that they link correctly against the wider corpus of data maintained by the publisher, as well as with the wider web of data. Linked data only works if the links… link. Quality linked data authoring therefore demands tools to examine, test and iterate new data — in private — prior to publication, and in the full context of the existing data and of the wider linked data cloud.

In a professional publishing environment, it is essential to support workflows in which different team members take different roles. For example, data might be modelled by one team member, uploaded by others, curated by individual domain experts, then require descriptive copywriting and, finally, be quality-assured and approved for publication by a senior member. Moreover, the team might not be exclusively human: it might also include robots where processes such as authoring, data transformation, validation or testing require an automated component. Finally, different teams and their members need to be able to work on multiple updates concurrently, with control over merging and conflict resolution.

The solution

Swirrl’s solution leverages the fourth property of a linked data quad: its graph. When a user wants to make a change to the database, we create a new graph for their changes and expose the union of this new graph and the existing live graphs as the endpoint for queries. All this is seamless to the user — what they see onscreen is a full and complete preview of what site users will see when the data are published.
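
To make this concrete, here is a minimal sketch of the idea (not Swirrl’s implementation, and with all URIs invented for illustration): pending edits are held in their own named graph, and draft queries are answered from the union of the live and draft graphs. The sketch uses Python and rdflib.

```python
# Minimal, illustrative sketch (not Swirrl's implementation): pending edits
# live in their own named graph, and the draft view is the union of the
# live graph(s) and the draft graph. All URIs are invented for illustration.
from rdflib import Dataset, Literal, Namespace, URIRef

EX = Namespace("http://example.org/def/")
ds = Dataset()

# Published data sits in a "live" named graph.
live = ds.graph(URIRef("http://example.org/graph/live"))
live.add((EX.dataset1, EX.title, Literal("Population estimates, 2019 edition")))

# A user's pending change goes into a separate draft graph.
draft = ds.graph(URIRef("http://example.org/graph/draft-42"))
draft.add((EX.dataset1, EX.title, Literal("Population estimates, 2020 edition")))

query = "SELECT ?title WHERE { ?s <http://example.org/def/title> ?title }"

# The public endpoint answers queries from the live graph only...
for row in live.query(query):
    print("live:", row.title)

# ...while the draft endpoint answers the same query from the union of the
# live and draft graphs, so editors preview exactly what will be published.
for row in (live + draft).query(query):
    print("draft preview:", row.title)
```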

This draft endpoint can be kept private or shared with other members of the team to obtain a preview that works as an alternative universe of data, fully browsable via the regular PublishMyData interface. You get a complete live preview of the site as it will look in its edited form — fully SPARQL-able and API-accessible (with authentication) by any apps you’ve written that need to consume the data.
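
As an illustration, an app that normally queries the public SPARQL endpoint could simply be pointed at its draft counterpart. The sketch below is hypothetical: the endpoint URL, draft identifier and authentication scheme are placeholders rather than PublishMyData’s actual API.

```python
# Hypothetical sketch: an app querying a shared draft SPARQL endpoint.
# The URL, draft id and token are placeholders, not PublishMyData's real API.
import requests

DRAFT_SPARQL_ENDPOINT = "https://data.example.com/drafts/42/sparql"  # placeholder
API_TOKEN = "YOUR-API-TOKEN"  # drafts are private, so requests are authenticated

query = """
SELECT ?dataset ?title WHERE {
  ?dataset <http://purl.org/dc/terms/title> ?title .
} LIMIT 10
"""

response = requests.post(
    DRAFT_SPARQL_ENDPOINT,
    data={"query": query},
    headers={
        "Accept": "application/sparql-results+json",
        "Authorization": f"Bearer {API_TOKEN}",
    },
)
response.raise_for_status()

# Standard SPARQL JSON results, just as the app would receive from the live endpoint.
for binding in response.json()["results"]["bindings"]:
    print(binding["dataset"]["value"], binding["title"]["value"])
```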

PublishMyData provides tools to allow sharing of these draft graphs, API access for authoring and testing, automated validations, programmable data transformation pipelines, a multi-step workflow for editorial approval prior to publication, and a suite of tools for managing merges. To see a demonstration, contact bill@swirrl.com.
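
As a rough sketch of what authoring-by-API and the approval workflow can look like from a client’s point of view, the example below walks through creating a draft, adding data and submitting it for review. Every endpoint path and parameter here is hypothetical; the real API will differ.

```python
# Hypothetical authoring-by-API workflow: create a draft, add data, then hand
# it to a reviewer for approval. Endpoint paths and payloads are invented to
# illustrate the shape of the flow, not PublishMyData's actual API.
import requests

BASE = "https://data.example.com/api"              # placeholder
AUTH = {"Authorization": "Bearer YOUR-API-TOKEN"}  # placeholder token

# 1. Create a private draft to hold this round of changes.
draft = requests.post(
    f"{BASE}/drafts",
    json={"name": "Population estimates, 2020 edition"},
    headers=AUTH,
).json()

# 2. Append new data to the draft, e.g. Turtle produced by an ETL pipeline.
with open("population-2020.ttl", "rb") as data:
    requests.put(
        f"{BASE}/drafts/{draft['id']}/data",
        data=data,
        headers={**AUTH, "Content-Type": "text/turtle"},
    )

# 3. Submit the draft for editorial review; publication only happens once a
#    senior team member approves it.
requests.post(f"{BASE}/drafts/{draft['id']}/submit", headers=AUTH)
```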

This article was written by Guy Hilton of Swirrl, with contributions from Rick Moynihan of Swirrl.
