Data Graphs ultra  query engine patent
Data Graphs

The Fastest Graph Database in the World?

September 24, 2024
3 mins
We have some pretty exciting news! We have filed an international patent on our new graph database query algorithm. We have also done some further benchmarking tests, and the results are pretty astounding.

A few weeks ago, I blogged about our new query engine benchmark. Compared with Neo4j, Data Graphs was fast. Really fast. I demonstrated orders of magnitude faster.

The Benchgraph benchmark data set is a Slovakian social network. It is not a particularly large dataset, but it is large enough to create meaningful timings and comparisons between graph database query engines. In this benchmark (published here) we showed on average we were 110x faster than Neo4j and >500x faster on the deepest and most complex query.

Most of the more complex of these benchmark queries, constrain the query to a single identified node (a person) in the dataset. And each hop in the graph from this first node, asks questions of the connected social network at that level in the graph, e.g. the worst performing query is:

MATCH (s:User)-->(a)-->(b)-->(c)-->(n:User) WHERE s.id=34885 AND n.age >= 18 RETURN DISTINCT n.id

In this query, 4 hops deep, the initial node is constrained to an identified user in the social network.
However, if we un-constrain the first node - i.e. it could be any user in the social network, and then re-run this over increasing “friend-of-a-friend” hops through the graph:

# Hop 1:
MATCH (s:User)-->(n:User) WHERE n.age >= 18 RETURN DISTINCT n.id

# Hop 2:
MATCH (s:User)-->(a)-->(n:User) WHERE n.age >= 18 RETURN DISTINCT n.id

# Hop 3:
MATCH (s:User)-->(a)-->(b)-->(n:User) WHERE n.age >= 18 RETURN DISTINCT n.id

# Hop 4:
MATCH (s:User)-->(a)-->(b)-->(c)-->(n:User) WHERE n.age >= 18 RETURN DISTINCT n.id

# All the way to Hop 8

If we calculate average timings for each of the increasingly complex queries the true story of our query performance emerges.
On the commodity hardware we ran our benchmarks on, the timings for Data Graphs are as follows:

1 hop: 463ms
2 hop: 1,272ms
3 hop: 1,984ms
4 hop: 2,594ms
5 hop: 3,149ms
6 hop: 3,797ms
7 hop: 4,473ms
8 hop: 5,171ms

You can see Data Graphs is totally linear in its performance - O(n) in Big O notation.
When the same queries are run on Neo4j on the same hardware:

1 hop: 1246 ms
2 hop: 30,292 ms (30 secs)
3 hop: 1,462,483 ms (25 mins) - Data Graphs is 840x faster
4 hop: 111,591,149 ms (30.8 hours) - Data Graphs 43,000x faster
5 hop: we could not complete computation

GPT4o calculates the function of the execution time complexity as O(n^50).
This indicates that the algorithm's runtime grows highly exponentially with respect to the number of hops. This is fairly typical of all graph databases. Obviously “not all queries are equal”, but exponential performance degradation with respect to graph depth and breadth is not atypical.
Projecting these execution timings forward across the function/curve for Neo4j, this would give :

the 5 hop at ~ 9.3 weeks
and the 8 hop query in the order of 22,000 years, compared with Data Graphs at 5.2 seconds

This is genuinely ground breaking.