Data Language has exceptional experience in the design, implementation and deployment of knowledge graphs, linked data, and semantic web technologies. We have been working with graph databases since our inception, when our founders worked on possibly the UK’s first enterprise scale knowledge graph for BBC News and Sport. Since then we have helped global organisations such as Cochrane, Wellcome, and Euromoney implement them too. Our typical approach to implementing a knowledge graph solution is:
Through collaborative domain modelling exercises, we work with stakeholders to gain a broad understanding of the target domain, iteratively refining and validating ontology models to achieve the ideal level of detail to deliver the best outcomes, business value and utility.
Not all graph databases are equal. We are wholly vendor-agnostic, and our deep experience in designing scalable, maintainable, and evolvable knowledge graph solutions means we will always select the right graph technology to deliver your use-cases. We understand the difference between labelled-property graphs and RDF-based graph databases. We have a firm grasp of the practical and pragmatic use of semantics and inference, so your solution is not over-engineered but delivers just the right level of complexity to optimise your total cost of ownership and can evolve as your business evolves.
We can help you integrate the knowledge graph with your business systems and workflows, whether that means supporting your own software engineering team in querying the graph with query languages such as SPARQL or Gremlin, or developing microservices and web interfaces that encapsulate your target use-cases. We are skilled at data engineering, and can help you populate your knowledge graph by transforming and loading your own business data, and by linking to or ingesting open data as required.
We can work with and guide your own infrastructure team, or take on the task of engineering, integration and deployment entirely ourselves. We follow contemporary best practices for deploying infrastructure in the cloud or on-prem. Our DevOps engineers are skilled in the use of infrastructure-as-code for fully automating the deployment and maintenance of the entire knowledge graph solution.
Data Language have helped Cochrane transform from a document centric organisation to a data centric one. Their knowledge of linked data and data strategy is world class. They deliver.
We helped Cochrane develop a knowledge graph for evidence-based health care with some 350k concepts covering populations, interventions and outcomes, linking to industry biomedical vocabularies such as MeSH and SNOMED-CT.
The Linked Open Data Cloud contains some 1,239 datasets, including Wikidata, DBpedia, Geonames, IMDB, BBC Music and data.gov.uk, across domains including media, life sciences, government, social and geography.
The Google knowledge graph contains over 1 billion entities, and 70 billion facts (assertions) about those entities. The majority of graph databases today will comfortably scale to many billions of statements.
Knowledge graphs are becoming an important and integral part of an organisation's data landscape. They provide a human and machine readable database of all the things of interest to the enterprise in their domain. The knowledge graph typically describes the domain entities and the semantic relationships between them. It provides interfaces to query the graph as structured data to drive products and user-experiences both internal and external to the organisation. It is evolvable, and the knowledge stored can grow and mutate as the business itself grows and changes. Its value lies in its ability to allow an organisation to ask questions of the things it cares most about.
A graph database is a NoSQL database based on graph theory that stores data in indexed collections of nodes and edges, where the nodes represent domain entities, and the edges represent the semantic relationships between the entities. Node-edge-node relationships (statements) can be chained to create navigable, arbitrary-length paths through the data. Graph databases can be queried using one of a number of query languages, depending on the database implementation type. They can typically store many billions of assertion statements, and are often used as an implementation technology for knowledge graphs. Graph databases typically form the foundation of a knowledge graph, although a graph database is not a strict requirement for building one.
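As a rough illustration of how node-edge-node statements chain into paths, here is a minimal in-memory sketch in Python. It is not how any particular graph database is implemented; the example data and the function names `build_index` and `find_path` are assumptions made purely for illustration.

```python
from collections import defaultdict, deque

# Hypothetical example data: each statement is (node, edge label, node).
STATEMENTS = [
    ("Alice", "works_for", "Acme"),
    ("Acme", "based_in", "London"),
    ("London", "located_in", "UK"),
]


def build_index(statements):
    """Index statements by source node so outgoing edges are quick to find."""
    index = defaultdict(list)
    for source, edge, target in statements:
        index[source].append((edge, target))
    return index


def find_path(index, start, goal):
    """Breadth-first search: return the chain of statements linking start to goal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for edge, target in index.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, edge, target)]))
    return None  # no path exists


index = build_index(STATEMENTS)
path = find_path(index, "Alice", "UK")
for source, edge, target in path:
    print(f"{source} -[{edge}]-> {target}")
```

A production graph database adds persistence, indexing and a query language on top, but the underlying idea of following edges from node to node is the same.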
Graph databases come in two main archetypes: labelled-property graphs (LPGs), and Resource Description Framework (RDF) based graphs.
With a labelled-property graph, both nodes and edges can have metadata (properties) assigned to them in the form of key-value pairs. They are tuned for querying and analysing paths through the graph, and do not typically support semantic reasoning.
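To make the labelled-property model concrete, the following Python sketch shows nodes and edges that each carry a label and a dictionary of key-value properties. The `Node` and `Edge` classes, the sample data and the `employees_since` helper are all hypothetical; real LPG databases expose this model through their own storage engines and query languages.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


# Hypothetical data: properties sit on the edge as well as on the nodes.
nodes = {
    "p1": Node("p1", "Person", {"name": "Alice"}),
    "c1": Node("c1", "Company", {"name": "Acme", "sector": "Media"}),
}
edges = [
    Edge("p1", "c1", "WORKS_FOR", {"since": 2018, "role": "Editor"}),
]


def employees_since(year):
    """Names of people whose WORKS_FOR edge has a 'since' value of year or earlier."""
    return [
        nodes[e.source].properties["name"]
        for e in edges
        if e.label == "WORKS_FOR"
        and "since" in e.properties
        and e.properties["since"] <= year
    ]


print(employees_since(2020))
```

Note that the `since` and `role` values describe the relationship itself rather than either entity, which is the distinguishing feature of the labelled-property approach.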
RDF-based graph databases are based upon the W3C RDF standard, and store data in the form of triple statements (subject-predicate-object). The predicates (relationships) that join nodes together confer semantic meaning upon the data. RDF graph databases support reasoning, and can infer new assertions (statements) from the semantic data model.
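The idea of inferring new statements can be sketched with a toy example. The Python below stores triples as plain tuples and repeatedly applies two RDFS-style rules (subclass transitivity, and propagating `rdf:type` up the class hierarchy) until nothing new appears. This is only an illustration under simplifying assumptions: the `ex:` data and the `infer` function are hypothetical, and a real RDF database would use a proper reasoner covering far more rules.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical asserted triples, using an illustrative "ex:" prefix.
triples = {
    ("ex:Aspirin", RDF_TYPE, "ex:Analgesic"),
    ("ex:Analgesic", SUBCLASS_OF, "ex:Drug"),
    ("ex:Drug", SUBCLASS_OF, "ex:Intervention"),
}


def infer(asserted):
    """Apply two RDFS-style rules until a fixed point is reached."""
    closure = set(asserted)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # Only combine a triple with a subclass statement about its object.
                if o != s2 or p2 != SUBCLASS_OF:
                    continue
                if p == SUBCLASS_OF:
                    # A subClassOf B, B subClassOf C  =>  A subClassOf C
                    new.add((s, SUBCLASS_OF, o2))
                elif p == RDF_TYPE:
                    # x type B, B subClassOf C  =>  x type C
                    new.add((s, RDF_TYPE, o2))
        if new <= closure:
            return closure
        closure |= new


inferred = infer(triples) - triples
for triple in sorted(inferred):
    print(triple)
```

Here only one type statement about `ex:Aspirin` is asserted, yet the closure also contains the facts that it is an `ex:Drug` and an `ex:Intervention`.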
The method of querying a knowledge graph depends on its implementation, but it is typically done using one or more dedicated query languages. A knowledge graph built upon a labelled-property architecture (e.g. Neo4j) is typically queried with either Cypher or Gremlin. An RDF-based graph (e.g. GraphDB or Stardog) is queried using SPARQL. Graph database vendors are also starting to provide GraphQL interfaces as a query mechanism, but these do not provide the expressiveness and feature set of the native query languages. These query languages are typically aimed at developers and data specialists. For business users, the ability to navigate and ask questions of the graph is typically abstracted into microservices, search interfaces, visual graph explorers, or tailored user-experiences to meet specific use-cases.
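To give a feel for what a graph query does, the sketch below evaluates a two-pattern query against a handful of triples in plain Python, with a comparable SPARQL query shown in the comments. It is a simplified stand-in for a real query engine: the `ex:` data and the helper names `match_pattern` and `query` are assumptions for illustration only.

```python
# A comparable SPARQL query (the ex: prefix and data are hypothetical):
#   PREFIX ex: <http://example.org/>
#   SELECT ?person ?city WHERE {
#     ?person ex:worksFor ?org .
#     ?org    ex:basedIn  ?city .
#   }

TRIPLES = [
    ("ex:Alice", "ex:worksFor", "ex:Acme"),
    ("ex:Bob", "ex:worksFor", "ex:Globex"),
    ("ex:Acme", "ex:basedIn", "ex:London"),
    ("ex:Globex", "ex:basedIn", "ex:Paris"),
]


def match_pattern(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple, or return None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant that does not match.
            return None
    return result


def query(patterns, triples):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


rows = query(
    [("?person", "ex:worksFor", "?org"), ("?org", "ex:basedIn", "?city")],
    TRIPLES,
)
for row in rows:
    print(row["?person"], row["?city"])
```

The shared `?org` variable is what joins the two patterns together, in the same way that a variable reused across triple patterns joins them in SPARQL.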
GraphQL is a contemporary API technology developed by Facebook. Typically it is used for querying, manipulating and serialising JSON data structures. Until recently (as of January 2020) it was not normally associated with the querying of graph databases, but a number of graph database vendors are now providing GraphQL API interfaces alongside native query languages like SPARQL and Gremlin. While it is a highly useful and extremely developer-friendly API technology, it is not as expressive or as fully featured as the native graph query languages.