
How to build a knowledge graph: three different ways to get started

July 15, 2024
How do you start designing a knowledge graph model? What are the first steps? In this post we talk through the options when designing a knowledge graph and show that there is no single approach.

There are a number of legitimate routes to get started when designing a knowledge graph. The context of the domain you are working in is key and will dictate which is the most appropriate. Here we look at the starting points through the lens of the knowledge graph tools we work with on a day-to-day basis and how they support the differing development paths.

To evaluate the options we use the Cynefin framework as a way to think about what sort of domain you are in and the implications for how you should act. Without being aware of the type of context, you risk applying the wrong approach and only learning about mistakes further down the road in development, where things can quickly get expensive.

The Cynefin framework distinguishes between five areas in which we operate: clear, complicated, complex, chaotic and confused. Confused refers to when we cannot identify, or have not yet identified, the type of domain we are working in.

For the purposes of this post I am going to talk about the first three:

  • Clear is the category of best practices: there is an intuitive solution to the problem that can be seen by everyone. The right answer is obvious and undisputed within the group.
  • Complicated is the category where good practices can be found. Here there are multiple right answers, and analysis with support of experts is required to figure them out. This domain requires a more systematic approach.
  • Complex is the category where solutions are discovered through exploration and experimentation. As you probe the domain with the team there is a new collective understanding and language that emerges from the process.


Why is this important? Because how you decide to build your knowledge graph should be tied to an awareness of the type of domain you are working in.

Clear

Here we can apply best practice, and it is perfectly reasonable to look for models and standards that already exist. Well-documented models, with documented definitions and active use by others, provide a great advantage.

In Data Graphs you have the option of loading in an existing model from our catalogue of models:

Examples of models we frequently turn to are:


Or load a model in RDF or JSON:
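To give a feel for what loading a model from JSON involves, here is a minimal sketch in Python. The JSON layout, the `PRIMITIVES` set and the `load_model` helper are all assumptions made for illustration; this is not the Data Graphs import format. It shows the kind of basic check that is worth running on any imported model: does every property point at a known type?

```python
import json

# Hypothetical JSON layout for a small model (not the Data Graphs format).
MODEL_JSON = """
{
  "classes": {
    "Person": {"properties": {"name": "string", "worksFor": "Organization"}},
    "Organization": {"properties": {"name": "string"}}
  }
}
"""

# Assumed set of primitive value types for this sketch.
PRIMITIVES = {"string", "integer", "date"}


def load_model(text):
    """Parse the model and check each property targets a primitive or a known class."""
    model = json.loads(text)
    classes = model["classes"]
    for class_name, spec in classes.items():
        for prop, target in spec["properties"].items():
            if target not in PRIMITIVES and target not in classes:
                raise ValueError(
                    f"{class_name}.{prop} points at unknown type {target!r}"
                )
    return model


model = load_model(MODEL_JSON)
print(sorted(model["classes"]))  # ['Organization', 'Person']
```

A model that refers to a class it never defines fails fast with a `ValueError`, which is usually cheaper to find at load time than after data has been added.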



Note that the benefits of standards and interoperability apply equally to all three starting points, and one does not rule out the other, as we can map to a standard model later. Our point here is about starting places for design.

Complicated

In the complicated space we are likely sitting with subject matter experts and facilitating the interpretation of their expertise into a model. This means systematically working through the details of the model: the key classes, their properties and how the classes relate to each other. We often work to a set of user stories or competency questions to focus the scope.
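One way to make competency questions concrete is to record, for each question, the classes and properties needed to answer it, and then check the draft model against that list. The sketch below assumes a hypothetical publishing domain; the class names, property names and the `missing_terms` helper are invented for illustration.

```python
# Draft model: class name -> set of property names (relationships included).
# Hypothetical example domain.
draft_model = {
    "Author": {"name", "wrote"},
    "Book": {"title", "publishedBy"},
    "Publisher": {"name"},
}

# Each competency question lists the (class, property) pairs needed to answer it.
competency_questions = {
    "Which books did a given author write?": [("Author", "wrote"), ("Book", "title")],
    "Who published a given book?": [("Book", "publishedBy"), ("Publisher", "name")],
    "When was a given book published?": [("Book", "publicationDate")],
}


def missing_terms(model, questions):
    """Return, per question, the (class, property) pairs the model cannot yet supply."""
    gaps = {}
    for question, needs in questions.items():
        missing = [(cls, prop) for cls, prop in needs if prop not in model.get(cls, set())]
        if missing:
            gaps[question] = missing
    return gaps


gaps = missing_terms(draft_model, competency_questions)
for question, missing in gaps.items():
    print(question, "->", missing)
```

Here the third question exposes a gap: the draft model has no `publicationDate` on `Book`, which is exactly the sort of finding to take back to the subject matter experts.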

Because the modelling is directly facilitated (there is no expectation that the subject expert should draw back at the modeller), we will often work directly in a model format of choice (for example an ontology), either authoring a file in a text editor or using a dedicated tool, a classic of which is Protégé.

Complex

In a complex domain you will want to start with a conceptual model and use approaches like domain-driven design. This means spending time upfront probing the naming of things, definitions and relationships, then doing this with multiple people from multiple perspectives. It means spending time in front of whiteboards or using virtual domain-modelling tools like Lucidchart or Data Graphs to draw back at each other.

Drawing out a conceptual model in Data Graphs:



The requirement to include as many perspectives as possible means postponing the move to an ontology representation of the model for as long as possible. In fact, talking about ontologies and the representation of a model in RDF can be an impediment to success in this type of domain. It draws attention away from the importance of developing shared understanding and language.

Three ways to start, but how do you know what type of domain you are dealing with? What if you identified it incorrectly? Well, often you don't know for sure until you have taken your draft knowledge graph model and populated it with some sample data. For this reason, even if we think we are dealing with a clear model space, we will often spend a small amount of time up front sketching a conceptual model and then test it with data as soon as possible. Applying data to the model will usually throw up questions which will have you returning to further modelling sessions. Shortening this cycle of design and test with data is often a critical success factor in the early stages of delivering a knowledge graph.
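The design-and-test cycle described above can be approximated with very little code. The following sketch assumes a draft model for a single hypothetical `Book` class and a few rows of made-up sample data in CSV form. The `check_rows` helper is illustrative only, but it shows how sample data quickly raises modelling questions.

```python
import csv
import io

# Draft model for one class: property name -> whether it is required.
# Hypothetical; the names are chosen for illustration only.
book_model = {"title": True, "author": True, "isbn": False}

# Made-up sample data, including a column the model does not know about.
SAMPLE_CSV = """title,author,isbn,edition
Example Book A,Author One,ISBN-A,1st
Example Book B,,ISBN-B,
"""


def check_rows(model, text):
    """Collect the questions that the sample data raises about the draft model."""
    reader = csv.DictReader(io.StringIO(text))
    issues = []
    # Columns in the data that have no matching property in the model.
    for column in reader.fieldnames:
        if column not in model:
            issues.append(f"column '{column}' is not in the model")
    # Required properties left empty in a row (line 1 is the header).
    for line_number, row in enumerate(reader, start=2):
        for prop, required in model.items():
            if required and not row.get(prop):
                issues.append(f"line {line_number}: missing required '{prop}'")
    return issues


issues = check_rows(book_model, SAMPLE_CSV)
for issue in issues:
    print(issue)
```

Each issue is a prompt for the next modelling session: should `edition` become a property, and is `author` really required on every book?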

Where Data Graphs stands out is that as you draw your domain model you are in a position to immediately start populating it with data, whether by manually entering example data, a CSV upload or via the API. Tools like this are going to massively support information professionals' ability to push on with a knowledge graph project without needing to wrestle with additional technical layers.

Three different ways to start designing your knowledge graph

The three ways to approach developing a knowledge graph are each equally valid. Tools like Data Graphs now support each of these approaches, allowing any of them to be a way to start building out a knowledge graph. The critical factor here is that, within the same tool, with a candidate model you can immediately move on to testing the model with data and quickly iterate through design. With a model in place you are then ready to move on to the next critical success factors for developing a knowledge graph.