Linked Data uptake

Linked Data is a universal approach for naming, shaping, and giving meaning to data using open standards. It was meant to be the second big information revolution after the World Wide Web: it was supposed to complement the web of documents with a web of data, so that humans and machines could use the Internet as if it were a single database while still enjoying the benefits of decentralisation.¹

Today, we have 1495 linked open datasets on the web, according to the LOD cloud collection. Some of them, like UniProt and Wikidata, are huge in volume, usage, and impact. But that number also means that today, 15 years after the advent of Linked Data, LOD datasets make up less than 0.005% of all publicly known datasets. And even if we add the growing amount of structured data encoded as JSON-LD and RDFa in HTML pages, most published data is still not available in a self-descriptive format and is not linked.
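For reference, self-descriptive structured data embedded in a page as JSON-LD looks like this (a minimal sketch using the schema.org vocabulary; the person described is invented for illustration):

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Data Engineer"
}
```

Because the `@context` points to a shared vocabulary, a consumer can interpret this snippet without any out-of-band documentation, which is what makes it self-descriptive.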

That’s in the open web. Inside enterprises, we keep wasting billions on attempts to integrate data and on paying off accumulated technical debt, only to find ourselves with new creditors. We bridge silos with bridges that turn into new, ever more expensive silos. New technologies make the new solutions look different, which helps us forget that similar approaches failed to bring lasting improvement in the past. We keep developing information systems that are closed to change. Now we build digital twins, still using hyper-local identifiers, so they are more like lifeless dolls.

Linked Enterprise Data can reduce that waste and dissolve many of the problems of the mainstream (and new-stream!) approaches. It does so simply by creating self-descriptive enterprise knowledge graphs that are decoupled from the applications: they do not rely on applications to interpret the data, and instead of a rigid structure based on historical requirements, they stay open to accommodate whatever comes next.

Yet, Linked Enterprise Data, just like Linked Open Data, is still marginal.

Why is that so? And what can be done about it?

I believe there are five reasons for that. I explained them in my talk at the ENDORSE conference, the recording of which you’ll find near the end of this article. I was curious how Linked Data professionals would rate them, and what I had missed. So I ran a small survey. My aim wasn’t to gather a huge sample but to hear the opinion of a qualified minority. And indeed, most respondents had over seven years of experience with Linked Data and semantic technologies. Here is how my findings were ranked, from one to five:


  • ¹ This is the balance between autonomy and cohesion – essential for any socio-technical system.

Buckets and Balls

Linked Data is still largely unknown, or misunderstood and undervalued. Often, people find it simply too difficult. So I keep looking for new ways to make Linked Data more accessible, with some success: in my training courses so far, over 60% of the participants have had no IT background. I hope to increase this percentage even further.

What seems most challenging is writing SPARQL queries. The specification is written for IT people. There are some great courses and books, but they also target people with at least some IT experience. If anything, that scares away the rest and keeps SPARQL from the masses.

I keep learning what people find challenging. A recurring problem, and an unexpected one, is the concept of a variable.

What is a variable in SPARQL? Just a placeholder. But how can you imagine a placeholder? It’s abstract. We have no way of grasping abstract things unless we associate them with something physical and concrete. It’s difficult to imagine time, but once we draw it in space it gets easier. We can’t picture furniture, but we have no problem with a chair.

The other issue is how a SPARQL query looks. While working with SPARQL helps one understand how a knowledge graph works, a SPARQL query doesn’t look like one. It is like symbols in mathematics: “5 doesn’t look like five, while ||||| is five”. The problem with SPARQL is similar:

You want to query a knowledge graph.
You want to learn new things.
But your query doesn’t look like a knowledge graph.
It looks like lines of strings.
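To see the point, here is what even a simple query looks like on the page (a minimal sketch using the FOAF vocabulary; the data it would match is assumed):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?friend
WHERE {
  ?person foaf:name "Alice" .
  ?person foaf:knows ?friend .
}
```

The graph being queried is a web of nodes and edges, yet the query itself is just indented lines of text.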

So, how do we tackle both problems together: grasping variables and the look of SPARQL?

My suggestion is to imagine every SPARQL query as a graph of linked buckets and balls.

Variables are placeholders, but abstract ones. We need a physical container¹ to fill with things. We need buckets. And nodes are like balls. So, think of running² a query as filling buckets with balls.

A graph pattern will then look like this:

A bucket ?A should be filled with those balls which have a relation R to ball B.

But it looks nicer when we abbreviate it like this:

This is a graph pattern in Buckets’n’Balls notation. The direction of the relation R is not shown but it’s always from left to right.
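In standard SPARQL syntax, the same pattern would be written roughly like this (a sketch, where `:R` and `:B` stand in for the actual relation and ball IRIs in some assumed default namespace):

```sparql
SELECT ?A
WHERE {
  ?A :R :B .
}
```

The bucket becomes the variable `?A`, and running the query fills it with every ball that has the relation `:R` to the ball `:B`.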

The process of writing and running a SPARQL query would then go through the following steps:

  • ¹ The idea of using containers is very powerful. All of arithmetic and algebra can be done using only the concept of a container, as demonstrated by William Bricken.
  • ² “Running” is also a metaphor, and what it stands for could be communicated more gracefully. That matters: as you know, language shapes the way we think.