Graph Databases are not Magic Cauldrons

To deliver robust, practical, scalable enterprise applications, you should treat your graph database with the same care, attention, and good data governance as any other database in your data architecture.
04 May 2022

Ear of bat, blood of frog, rabbit’s tail. Stir them all up in your graph database, utter the magic words, and lo, with a sprinkling of SPARQL dust, gold speweth forth.

Having worked with graph databases for 10 years now, I am troubled that this is still how many implementers expect things to work. Graph databases are not magic cauldrons. They really are not.

A graph database is no different from any other database. If you put messy data in, do not expect order to be created from chaos. If you then combine this with over-constrained and over-engineered inference and reasoning, you end up with an even bigger mess. A popcorn machine.

You will not be able to build robust, practical business or enterprise-grade applications by treating your graph as some dirty place where ontologists (who?) can dip their fingers in, mess around ad hoc, and make magic happen. “Let’s just add another inference rule, that will sort it...”

Just as in any other practical, robust data architecture, data governance is essential. You would not expect end users of any other database to interface with it directly via a SQL endpoint or a JDBC connector. No, you would build robust, well-tested APIs and services to interface with your database, and let your application builders interface with these services. End users would then interact with these applications. These API services are the gatekeepers of good data governance. No mess should pass this point. In production, no human hands should *ever* pass this point.
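As a rough illustration of the gatekeeper idea, here is a minimal sketch of a service layer that exposes one fixed, reviewed query instead of a raw endpoint. Everything named here is hypothetical: the `GraphGateway` class, the `ex:` namespace, the `suppliedBy` property, the identifier rule, and the injected `run_query` callable, which stands in for whatever client your graph database actually provides.

```python
import re
from string import Template
from typing import Callable, Dict, List

# Hypothetical identifier rule for this sketch; real rules depend on your data.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")

# A fixed, reviewed query template. Callers never supply query text.
_PRODUCTS_BY_SUPPLIER = Template(
    "PREFIX ex: <http://example.org/> "
    "SELECT ?product WHERE { ?product ex:suppliedBy ex:$supplier_id }"
)


class GraphGateway:
    """The only component allowed to talk to the graph database."""

    def __init__(self, run_query: Callable[[str], List[Dict[str, str]]]):
        # run_query is an assumed adapter around your database client.
        self._run_query = run_query

    def products_by_supplier(self, supplier_id: str) -> List[str]:
        # Validate input before it gets anywhere near the query.
        if not _SAFE_ID.fullmatch(supplier_id):
            raise ValueError("invalid supplier id")
        query = _PRODUCTS_BY_SUPPLIER.substitute(supplier_id=supplier_id)
        return [row["product"] for row in self._run_query(query)]
```

Application builders would call `products_by_supplier("acme")` and never see the query language at all, which leaves one place to test, audit, and change when the model evolves.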

Any change to a model should be made only to meet a business need or requirement, and every change should be tested.
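One way to picture what testing a model change could look like is a small conformance check run against sample data in your build pipeline. The sketch below is plain Python over in-memory triples, with a made-up rule that every `ex:Product` needs `ex:name` and `ex:suppliedBy`. In a real platform a constraint standard such as SHACL might play this role; the names and the rule here are assumptions for illustration only.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical model rule: properties each class instance must carry.
REQUIRED: Dict[str, Set[str]] = {
    "ex:Product": {"ex:name", "ex:suppliedBy"},
}


def find_violations(triples: List[Triple],
                    required: Dict[str, Set[str]]) -> List[str]:
    """Return one message per required property missing from an instance."""
    types: Dict[str, Set[str]] = {}
    props: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
        else:
            props.setdefault(s, set()).add(p)
    violations = []
    for subject in sorted(types):
        for cls in sorted(types[subject]):
            missing = required.get(cls, set()) - props.get(subject, set())
            for prop in sorted(missing):
                violations.append(f"{subject} ({cls}) is missing {prop}")
    return violations


def test_sample_data_conforms_to_model() -> None:
    # Sample data that a model change must not break.
    sample = [
        ("ex:p1", "rdf:type", "ex:Product"),
        ("ex:p1", "ex:name", "Widget"),
        ("ex:p1", "ex:suppliedBy", "ex:acme"),
    ]
    assert find_violations(sample, REQUIRED) == []
```

If someone adds a new required property to the model, this test fails until the sample data and the loading code are updated to match, which is the point.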

Inference and reasoning should be included in your models *only* to meet some practical, value-deriving business need. If the inference does not deliver any meaningful or practical value to your end users, it should not be there.

If you can deliver a business requirement effectively and easily without inference, do it without inference.
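To make that concrete, suppose the requirement is "list every part contained in an assembly, at any depth". You could declare a transitive property and run a reasoner, but a plain traversal at query time may be all you need. The sketch below does this in Python over a hypothetical `contains` mapping. The function name and data are made up for illustration, and in SPARQL 1.1 a property path could serve a similar purpose.

```python
from collections import deque
from typing import Dict, List, Set


def all_parts(contains: Dict[str, List[str]], root: str) -> Set[str]:
    """Collect every direct and indirect part of root, without inference."""
    seen = {root}  # tracking visited nodes also guards against cycles
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in contains.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen - {root}


# Hypothetical bill of materials.
bom = {"car": ["engine", "wheel"], "engine": ["piston"]}
parts = all_parts(bom, "car")
```

Nothing is materialised in the graph, no rule has to be maintained, and the behaviour is easy to test.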

Your models should be as simple as possible to deliver your business requirements. A SQL data architect would not just add additional random tables into their SQL model to deliver their requirements, so when you develop a practical ontology model for your domain, only model what you need to deliver those requirements. You do not need to model extra stuff.

While your domain model may extend well beyond the bounds of your business requirements (in order to better inform requirements and the business landscape), your detailed ontology models should not. By engineering only the minimum of the model needed to deliver your current requirements, you get a platform that is easier to evolve, easier to maintain, and has a much lower total cost of ownership.

Finally, end users should not have access to your production graph database's SPARQL or Cypher endpoint. Never.

Stop pretending you are witches. Start thinking like traditional data architects.