Publishing JSON-LD for Developers
JSON-LD has been with us for several years now, and its adoption is increasing, though I suspect not as fast as it should. JSON-LD arose out of the RDF community and was subsequently adopted as a W3C standard: a pattern for representing RDF and linked data (hence 'LD') as JSON, with a key aim of making RDF much easier for developers to consume.
While Google has raised awareness of JSON-LD considerably as its preferred pattern for marking up web pages with schema.org structured data, JSON-LD has made only relatively small inroads into mainstream web development. There are a number of reasons for this, not least that the W3C and the JSON-LD working group did not do a great job of communicating its worth and explaining how to use it effectively. This has improved a lot recently; the excellent (and long overdue) JSON-LD API best practices guide published in February 2018 is a very good example.
I am going to attempt to help the cause too, and make a convincing case for why JSON-LD should be used in mainstream software development, regardless of whether you are working with RDF.
A Very Brief Overview
I want to avoid making this post a full-on lesson about JSON-LD and RDF, so if you are not entirely sure what JSON-LD is, here is a super-brief 101. Alternatively, get stuck into the full JSON-LD 1.0 specification and the draft 1.1 specification.
Roots in RDF :
JSON-LD is a W3C standard for representing RDF in JSON format.
It is a standard pattern (schema) for representing the key elements of RDF resources and the statements about them, such as the resource identifier (the IRI) of the object being represented and the RDF type (class) of the object being described. For example, you will often see snippets of JSON-LD like this :
{
"@id": "http://data.example.com/organizations/1",
"@type": "Organization",
"name": "Apple"
}
If you are familiar with RDF, you will be aware that most RDF formats involve a proliferation of IRIs/URIs everywhere. JSON-LD alleviates this by abstracting base URIs into a context. Written as valid JSON-LD without a context, the example above would look more like this :
{
"@id": "http://data.example.com/organizations/1",
"@type": "http://data.example.com/some-model/Organization",
"http://data.example.com/some-model/name": "Apple"
}
While verbose and ugly, this can be represented in RDF as, or transformed into, long-form statements (triples) like this :
<http://data.example.com/organizations/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data.example.com/some-model/Organization> .
<http://data.example.com/organizations/1> <http://data.example.com/some-model/name> "Apple" .
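To make that mapping concrete, here is a minimal JavaScript sketch that turns a single, already-expanded node like the one above into N-Triples. It is not the W3C JSON-LD to RDF algorithm. It assumes one node, full IRIs for every key and @type value, and plain string values only. The name toNTriples is hypothetical.

```javascript
// IRI that JSON-LD's "@type" becomes in RDF.
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

// Simplified sketch: converts ONE expanded node (full IRIs, string values)
// into N-Triples lines. Not a full JSON-LD processor.
function toNTriples(node) {
  const subject = `<${node["@id"]}>`;
  const lines = [];
  for (const [key, value] of Object.entries(node)) {
    if (key === "@id") continue; // the subject, not a statement
    if (key === "@type") {
      lines.push(`${subject} <${RDF_TYPE}> <${value}> .`);
    } else {
      // JSON string escaping is close enough to N-Triples for simple literals.
      lines.push(`${subject} <${key}> ${JSON.stringify(value)} .`);
    }
  }
  return lines.join("\n");
}

const expanded = {
  "@id": "http://data.example.com/organizations/1",
  "@type": "http://data.example.com/some-model/Organization",
  "http://data.example.com/some-model/name": "Apple"
};

console.log(toNTriples(expanded));
```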
You can test the validity of JSON-LD, and see how it expands into RDF, using the JSON-LD playground.
The Context :
Introducing a context into the JSON-LD makes this much more palatable. The context defines the fields in the data payload by mapping each short name to a full IRI :
{
"@context": {
"Organization" : "http://data.example.com/some-model/Organization",
"name" : "http://data.example.com/some-model/name"
},
"@id": "http://data.example.com/organizations/1",
"@type": "Organization",
"name": "Apple"
}
In this example, which carries the same data as my first snippet, the type and name properties of the resource are explicitly identified in a schema or ontology. As a consumer of this JSON, you therefore know which model the data was described with. Without the context, the JSON cannot be transformed into RDF statements. As a consumer, of course, you can choose to use the context or ignore it, depending on your own needs.
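As a rough illustration of what a consumer does with the context, the JavaScript sketch below swaps the short terms in the snippet above for the IRIs the inline context maps them to. The result is the expanded form shown earlier. It assumes the context holds only plain string mappings, and it is far simpler than the real JSON-LD Expansion Algorithm. The name expandTerms is hypothetical.

```javascript
// Simplified sketch of term expansion using an inline @context. Only plain
// string mappings are supported; the real JSON-LD Expansion Algorithm also
// handles nested nodes, arrays, compact IRIs, @vocab, type coercion and more.
function expandTerms(doc) {
  const context = doc["@context"] || {};
  const lookup = (term) =>
    Object.prototype.hasOwnProperty.call(context, term) ? context[term] : undefined;
  const expanded = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key === "@context") continue; // consumed, not copied
    if (key === "@id") {
      expanded["@id"] = value; // identifiers are already full IRIs here
    } else if (key === "@type") {
      expanded["@type"] = lookup(value) || value; // class name -> class IRI
    } else {
      const iri = lookup(key);
      if (iri) expanded[iri] = value; // unmapped terms are dropped
    }
  }
  return expanded;
}

const compact = {
  "@context": {
    "Organization": "http://data.example.com/some-model/Organization",
    "name": "http://data.example.com/some-model/name"
  },
  "@id": "http://data.example.com/organizations/1",
  "@type": "Organization",
  "name": "Apple"
};

console.log(expandTerms(compact));
```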
This same JSON-LD is often written so that the context and payload are separated using the @graph construct, as follows. This is in fact identical data to that above; it just identifies the data payload and the context that describes it as separate JSON objects.
{
"@context": {
"Organization" : "http://data.example.com/some-model/Organization",
"name" : "http://data.example.com/some-model/name"
},
"@graph" : {
"@id": "http://data.example.com/organizations/1",
"@type": "Organization",
"name": "Apple"
}
}
Back in the RDF world, adding an identifier for the graph also lets us create a named graph, which in turn lets us generate RDF quad statements instead of triples :
{
"@context": {
"Organization" : "http://data.example.com/some-model/Organization",
"name" : "http://data.example.com/some-model/name"
},
"@id" : "urn:my:organizations:1",
"@graph" : {
"@id": "http://data.example.com/organizations/1",
"@type": "Organization",
"name": "Apple"
}
}
If you try this in the JSON-LD playground, it will give you these RDF quad statements :
<http://data.example.com/organizations/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data.example.com/some-model/Organization> <urn:my:organizations:1> .
<http://data.example.com/organizations/1> <http://data.example.com/some-model/name> "Apple" <urn:my:organizations:1> .
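The same idea extends to named graphs. This JavaScript sketch makes three assumptions: an inline context of plain string mappings, a top-level @id naming the graph, and a single node inside @graph. Under those assumptions it reproduces the two quads above. toNQuads is a hypothetical helper, not a library function.

```javascript
// IRI that JSON-LD's "@type" becomes in RDF.
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

// Simplified sketch: handles an inline @context of plain string mappings, an
// optional top-level "@id" (the graph name) and ONE node object in "@graph".
function toNQuads(doc) {
  const context = doc["@context"] || {};
  const lookup = (term) =>
    Object.prototype.hasOwnProperty.call(context, term) ? context[term] : undefined;
  // The top-level @id becomes the fourth element of every statement.
  const graph = doc["@id"] ? ` <${doc["@id"]}>` : "";
  const node = doc["@graph"];
  const subject = `<${node["@id"]}>`;
  const quads = [];
  for (const [key, value] of Object.entries(node)) {
    if (key === "@id") continue; // the subject, not a statement
    if (key === "@type") {
      quads.push(`${subject} <${RDF_TYPE}> <${lookup(value) || value}>${graph} .`);
      continue;
    }
    const predicate = lookup(key);
    if (!predicate) continue; // unmapped terms are dropped, as in JSON-LD
    // JSON string escaping is close enough to N-Quads for simple literals.
    quads.push(`${subject} <${predicate}> ${JSON.stringify(value)}${graph} .`);
  }
  return quads.join("\n");
}

const doc = {
  "@context": {
    "Organization": "http://data.example.com/some-model/Organization",
    "name": "http://data.example.com/some-model/name"
  },
  "@id": "urn:my:organizations:1",
  "@graph": {
    "@id": "http://data.example.com/organizations/1",
    "@type": "Organization",
    "name": "Apple"
  }
};

console.log(toNQuads(doc));
```

Without a top-level @id, the same function emits plain triples, which mirrors the difference between the two earlier snippets.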
To make this more concise, the context does not need to be embedded in each payload. It can be remote, so a consumer can choose whether or not to dereference it :
{
"@context": "https://data.example.com/contexts/organization",
"@id" : "urn:my:organizations:1",
"@graph" : {
"@id": "http://data.example.com/organizations/1",
"@type": "Organization",
"name": "Apple"
}
}
This also means contexts can be cached for higher performance.
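On the consumer side, that caching can be as simple as memoising the context lookup by URL. The JavaScript sketch below assumes you supply your own loader function, for example one wrapping fetch(). The names createContextCache and loadContext are hypothetical, and the loader in the usage lines is a stand-in that only counts calls.

```javascript
// Sketch of a consumer-side cache for remote contexts. loadContext is a
// hypothetical function you supply (for example one wrapping fetch()) that
// returns a Promise for the parsed context document.
function createContextCache(loadContext) {
  const cache = new Map(); // context URL -> Promise of the parsed context

  return function getContext(url) {
    if (!cache.has(url)) {
      // Cache the Promise itself, so concurrent callers share one request.
      const pending = Promise.resolve(loadContext(url));
      // Forget failed loads so a later call can retry.
      pending.catch(() => cache.delete(url));
      cache.set(url, pending);
    }
    return cache.get(url);
  };
}

// Usage with a stand-in loader that counts how often it is called.
let loads = 0;
const getContext = createContextCache(async (url) => {
  loads += 1; // a real loader would fetch url over HTTP here
  return { "@context": { name: "http://data.example.com/some-model/name" } };
});

const first = getContext("https://data.example.com/contexts/organization");
const second = getContext("https://data.example.com/contexts/organization");
console.log(loads, first === second);
```

Caching the Promise rather than the resolved value is a deliberate choice here: two requests for the same context that arrive before the first download finishes still trigger only one fetch.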
So there you have it: RDF in JSON form. JSON-LD has brought RDF out of the labs and made it presentable, with a highly consumable JSON representation for developers.
Or has it...
Mythbusters
Myth 1 - frequently spoken by developers : “it’s not like normal JSON”
While the introduction of JSON-LD is undoubtedly a massive deal for the RDF development community, it has not been widely embraced by software developers at large. Most developers will not have heard of RDF. Even if they are consuming JSON-LD from an RDF or graph database backend, they are not (and probably don’t need to be) concerned with it.
A key design goal in the JSON-LD specification is :
Zero Edits, most of the time - “JSON-LD ensures a smooth and simple transition from existing JSON-based systems. In many cases, zero edits to the JSON document and the addition of one line to the HTTP response should suffice”
However, when developers first come across JSON-LD it is rarely like this. Most published JSON-LD is littered with JSON-LD-specific constructs that developers do not like consuming. So when you hear the complaint “it’s not like normal JSON”, it is typically for good reason. Two of the most common complaints are :
The "@" prefixes :
The “@” character is not valid in ES5 or ES6 JavaScript identifiers. So when you parse JSON into a JS object, you cannot use dot notation to write things like :
const id = myObject.@id; // SyntaxError
Instead you need to write :
const id = myObject['@id'];
Arrays vs Singletons :
JSON-LD output from a SPARQL query or RDF transformation typically gives no indication of whether a property is an array or a singleton. One request to a back-end API serving JSON-LD might return :
{
"@context": "https://data.example.com/contexts/organization",
"@id": "http://data.example.com/organizations/1",
"@type": "Organization",
"name": "Apple",
"employs": {
"@id" : "http://data.example.com/people/1",
"@type": "Person",
"name" : "Tim Cook"
}
}
A later request may return an array for the employs property :
{
"@context": "https://data.example.com/contexts/organization",
"@id": "http://data.example.com/organizations/1",
"@type": "Organization",
"name": "Apple",
"employs": [
{
"@id" : "http://data.example.com/people/1",
"@type": "Person",
"name" : "Tim Cook"
},
{
"@id" : "http://data.example.com/people/2",
"@type": "Person",
"name" : "Jony Ive"
}
]
}
The software developer consuming this output then has to code around this when evaluating the value of the employs property, and check whether the property contains an array or a singleton.
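Until publishers fix this at the source, consumers end up writing defensive helpers like the JavaScript sketch below. asArray is a hypothetical name, not part of any library. It works, but it has to be repeated wherever a property might be multi-valued, and that is exactly the boilerplate the @set approach further down removes.

```javascript
// Hypothetical consumer-side helper: always returns an array, whether the
// publisher sent a single object, an array, or omitted the property.
function asArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

const singleton = {
  name: "Apple",
  employs: { "@id": "http://data.example.com/people/1", name: "Tim Cook" }
};
const collection = {
  name: "Apple",
  employs: [
    { "@id": "http://data.example.com/people/1", name: "Tim Cook" },
    { "@id": "http://data.example.com/people/2", name: "Jony Ive" }
  ]
};

// Both response shapes can now be handled by the same code path.
for (const org of [singleton, collection]) {
  console.log(asArray(org.employs).map((person) => person.name));
}
```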
These are probably the two most common JSON-LD complaints from developers. Even Google publishes its schema.org JSON-LD with @id and @type field names in the Google Knowledge Graph API. This doesn't help, yet it doesn’t have to be like this. Instead of telling developers to “get over it”, we as publishers of JSON-LD can use the context to make JSON-LD look and behave like normal JSON, meeting the design goal of the specification. It is not always obvious how to make this happen, so here are some simple examples of how to address the two common issues above.
Addressing the “@” issue
Use context aliasing to abstract the “@” symbol away from the data payload. By creating aliases in the context for the properties prefixed with @, those properties can appear in the data payload under fully JS-compatible names. The context does not just let us define our domain-specific JSON properties; it also lets us alias the JSON-LD keywords. Try this in the JSON-LD playground :
{
"@context": {
"type" : {
"@id" : "@type"
},
"id" : {
"@id" : "@id"
},
"Organization" : "http://data.example.com/some-model/Organization",
"Person" : "http://data.example.com/some-model/Person",
"name" : "http://data.example.com/some-model/name",
"employs" : "http://data.example.com/some-model/employs"
},
"id": "http://data.example.com/organizations/1",
"type": "Organization",
"name": "Apple",
"employs": [
{
"id" : "http://data.example.com/people/1",
"type": "Person",
"name" : "Tim Cook"
},
{
"id" : "http://data.example.com/people/2",
"type": "Person",
"name" : "Jony Ive"
}
]
}
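When the payload is produced by a JSON-LD processor, compaction against this context performs the renaming for you. Some publishers assemble payloads by hand instead. For them, the renaming the aliases imply can be sketched in a few lines of JavaScript. This is a minimal illustration under that assumption, not a replacement for proper JSON-LD compaction, and aliasKeywords is a hypothetical name.

```javascript
// Keyword aliases, mirroring "id" and "type" in the context above.
const ALIASES = new Map([["@id", "id"], ["@type", "type"]]);

// Simplified sketch: renames "@id"/"@type" keys to their aliases at any depth.
// A JSON-LD processor does this (and much more) when it compacts a document
// against the context; this only covers the renaming step.
function aliasKeywords(value) {
  if (Array.isArray(value)) return value.map(aliasKeywords);
  if (value === null || typeof value !== "object") return value;
  const out = {};
  for (const [key, inner] of Object.entries(value)) {
    const name = ALIASES.has(key) ? ALIASES.get(key) : key;
    // The @context itself must keep its JSON-LD keywords, so leave it alone.
    out[name] = key === "@context" ? inner : aliasKeywords(inner);
  }
  return out;
}

const org = aliasKeywords({
  "@id": "http://data.example.com/organizations/1",
  "@type": "Organization",
  name: "Apple",
  employs: [
    { "@id": "http://data.example.com/people/1", "@type": "Person", name: "Tim Cook" }
  ]
});

// Dot notation now works throughout the payload.
console.log(org.id, org.type, org.employs[0].id, org.employs[0].type);
```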
Handling collections :
Use the context to define which fields are arrays, and output them as such :
{
"@context": {
"type" : {
"@id" : "@type"
},
"id" : {
"@id" : "@id"
},
"Organization" : "http://data.example.com/some-model/Organization",
"Person" : "http://data.example.com/some-model/Person",
"name" : "http://data.example.com/some-model/name",
"employs" : {
"@id" : "http://data.example.com/some-model/employs",
"@container" : "@set"
}
},
"id": "http://data.example.com/organizations/1",
"type": "Organization",
"name": "Apple",
"employs": [
{
"id" : "http://data.example.com/people/1",
"type": "Person",
"name" : "Tim Cook"
},
{
"id" : "http://data.example.com/people/2",
"type": "Person",
"name" : "Jony Ive"
}
]
}
This forces the set (or list) property to be marshalled as an array in JSON even when it holds a single value. See this in action in the JSON-LD playground :
{
"@context": {
"type" : {
"@id" : "@type"
},
"id" : {
"@id" : "@id"
},
"Organization" : "http://data.example.com/some-model/Organization",
"Person" : "http://data.example.com/some-model/Person",
"name" : "http://data.example.com/some-model/name",
"employs" : {
"@id": "http://data.example.com/some-model/employs",
"@container": "@set"
}
},
"id": "http://data.example.com/organizations/1",
"type": "Organization",
"name": "Apple",
"employs": [
{
"id" : "http://data.example.com/people/1",
"type": "Person",
"name" : "Tim Cook"
}
]
}
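When the JSON-LD comes out of a processor, compaction against this context does the wrapping for you. If you build payloads by hand instead, the same rule can be applied with a small helper. The JavaScript sketch below reads the container declarations straight from the context. applySetContainers is a hypothetical name, and the sketch only looks at top-level properties.

```javascript
// Sketch: wrap each top-level property whose term definition declares
// "@container": "@set" or "@list" in an array, so a single value is still
// published as an array. Nested objects are not walked here.
function applySetContainers(context, payload) {
  const out = { ...payload };
  for (const [term, definition] of Object.entries(context)) {
    if (!definition || typeof definition !== "object") continue; // plain IRI mapping
    const container = definition["@container"];
    const isCollection = container === "@set" || container === "@list";
    const present = Object.prototype.hasOwnProperty.call(out, term);
    if (isCollection && present && !Array.isArray(out[term])) {
      out[term] = [out[term]];
    }
  }
  return out;
}

const context = {
  type: { "@id": "@type" },
  id: { "@id": "@id" },
  name: "http://data.example.com/some-model/name",
  employs: { "@id": "http://data.example.com/some-model/employs", "@container": "@set" }
};

const published = applySetContainers(context, {
  id: "http://data.example.com/organizations/1",
  type: "Organization",
  name: "Apple",
  employs: { id: "http://data.example.com/people/1", type: "Person", name: "Tim Cook" }
});

// "employs" is now a one-element array; "name" is left as a plain string.
console.log(JSON.stringify(published, null, 2));
```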
Using a remote context (which most web developers can safely ignore), we now have JSON-LD that looks just the way your developers would prefer, meeting the design goal of the JSON-LD specification :
{
"@context": "https://data.example.com/contexts/organization",
"id": "http://data.example.com/organizations/1",
"type": "Organization",
"name": "Apple",
"employs": [
{
"id" : "http://data.example.com/people/1",
"type": "Person",
"name" : "Tim Cook"
}
]
}
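From the consumer's side, this is now ordinary JSON. Here is a minimal JavaScript sketch in which the payload is inlined as a string in place of an HTTP response, an assumption made to keep the example self-contained.

```javascript
// The payload above, as a consumer receives it.
const body = `{
  "@context": "https://data.example.com/contexts/organization",
  "id": "http://data.example.com/organizations/1",
  "type": "Organization",
  "name": "Apple",
  "employs": [
    { "id": "http://data.example.com/people/1", "type": "Person", "name": "Tim Cook" }
  ]
}`;

// Plain JSON handling: the remote @context is simply ignored, there are no
// "@" keys to bracket-quote, and "employs" can be mapped without type checks.
const org = JSON.parse(body);
const staff = org.employs.map((person) => person.name);
console.log(`${org.name} (${org.type}) employs ${staff.join(", ")}`);
```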
Myth 2 - “JSON-LD is only needed if you are working with RDF or Graph Databases”
If you are building a web application or microservice that uses RDF, then JSON-LD is clearly a no-brainer. But if you don’t care about RDF, why bother?
I argue that you should bother. If you are publishing an API, you may not know how your consumers are using your data. They may well be using a graph database, or working with RDF or linked data, and be very keen to consume your API in an RDF-compatible format. Furthermore, publishing JSON-LD requires you to publish a context, which constrains your JSON to a schema. This is a good thing. It communicates a model of your domain and provides a payload contract for your consumers (or a contract between your own microservice APIs and applications). It also gives you a validation mechanism.
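One way to put the context to work as a contract is a check that flags properties the context does not define. A JSON-LD processor silently drops such properties during expansion, so RDF consumers would never see them. The JavaScript sketch below is a hypothetical helper (undefinedTerms) that only inspects top-level keys, under the assumptions noted in its comments.

```javascript
// Sketch of a contract check: list payload keys the context does not define.
// Assumes the context declares no @vocab and that payload keys are plain
// terms rather than full or compact IRIs.
function undefinedTerms(context, payload) {
  const isDefined = (key) =>
    key.startsWith("@") || Object.prototype.hasOwnProperty.call(context, key);
  return Object.keys(payload).filter((key) => !isDefined(key));
}

const context = {
  id: "@id",
  type: "@type",
  name: "http://data.example.com/some-model/name"
};

// "ceoNickname" stands in for an ad-hoc property added without a schema change.
const payload = {
  id: "http://data.example.com/organizations/1",
  type: "Organization",
  name: "Apple",
  ceoNickname: "Tim"
};

console.log(undefinedTerms(context, payload));
```

Run as part of a test suite, a check like this turns "someone added a property" into a visible schema discussion rather than silent drift.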
Why we should all adopt JSON-LD
JSON Entropy
Because JSON is an inherently schemaless message format, applications built from components that use it to serialize and deserialize the data moving between them tend, over time, to create JSON property soup. As requirements evolve, and developers are left to their own devices to modify features and adapt the codebase, properties are introduced into the JSON on an ad-hoc basis. JSON entropy increases. What may have started out as a simple, clean JSON model tends towards mess and complexity.
Adopting JSON-LD as your message format slows this rise in entropy, because :
a) it makes you as a developer think about the schema (structure) of the JSON from the perspective of your consumers, and the wider application as a whole.
b) once we start publishing schemas for our data, we start to think like data architects rather than developers.
c) we are more likely to collaborate on the schema design, and at the very minimum, ask for a peer review of a proposed change.
While this may seem like an overhead, it is a relatively small investment into your data architecture that pays dividends over the long term.
Schema
The JSON-LD context is a schema for your data. It defines not only the property datatypes but also the classes of JSON resources. Via referenced ontologies, it also defines the semantics of, and relationships between, the properties and classes in your data.
The JSON Schema project has attempted to fill this space for JSON, but has not really gained much traction. JSON Schema is a little richer in schema-definition features than a JSON-LD context. However, JSON-LD is an official W3C standard, and it offers a lot more than JSON Schema with respect to linked data and RDF compatibility.
We all can publish linked data
If you are using linked data or RDF, then you will no doubt already be familiar with JSON-LD, and hopefully using it as a message format in your own applications. If you are publishing linked data, it is very likely you are publishing it as JSON-LD. If you are serving public APIs as regular JSON, then publishing JSON-LD instead makes a lot of sense. As discussed, you should be able to transition with little or no change to your data payload; all it takes is to publish a context. For little additional overhead, both you and your consumers benefit. It is essential, however, that we as publishers of JSON-LD make it as consumer-friendly as possible (and this includes Google).
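The "one line added to the HTTP response" in the design goal quoted earlier is, in practice, an HTTP Link header. It points existing JSON at its context and leaves the body untouched. Below is a minimal Node.js sketch; the context URL, handler and payload are illustrative assumptions. Processors only honour this header when the response is served as plain application/json rather than application/ld+json.

```javascript
const http = require("http");

// Hypothetical URL where the context document is published.
const CONTEXT_URL = "https://data.example.com/contexts/organization";

// Headers that let JSON-LD processors interpret plain JSON as JSON-LD.
// The Link header is ignored if the response is served as application/ld+json.
function jsonLdHeaders(contextUrl) {
  return {
    "Content-Type": "application/json",
    Link: `<${contextUrl}>; rel="http://www.w3.org/ns/json-ld#context"; type="application/ld+json"`
  };
}

// The existing JSON payload is unchanged; only the response headers differ.
function handler(req, res) {
  const body = {
    id: "http://data.example.com/organizations/1",
    type: "Organization",
    name: "Apple"
  };
  res.writeHead(200, jsonLdHeaders(CONTEXT_URL));
  res.end(JSON.stringify(body));
}

// Call server.listen(8080) to start serving.
const server = http.createServer(handler);
```

Consumers that know nothing about JSON-LD read the body as ordinary JSON. A JSON-LD processor follows the Link header to the context and can expand the same response into RDF.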