A case study on a recent engagement in the publishing industry using Data Graphs for centrally managing cross-publication content metadata combined with our Text AI NLP service for auto-tagging content to provide a high-utility information backbone.
Tagging content has been a publishing staple for two decades, helping publishers organise content, perform rudimentary content analytics, and deliver tag-based user journeys and content aggregations. In many cases publishers use the tag features built into their CMSs. WordPress and its derivatives are very widely used, yet still have weak content tagging capabilities. This lack of good information management and structured metadata often leads to a tag mess over time: tags cannot be properly disambiguated, duplicates invariably get created, and journalists apply tags inconsistently. Silver Oliver has written an excellent series of articles on this subject that is worth revisiting.
This is where more contemporary information management technology and AI can be a game changer. I outline below how we have used our own Text AI and Knowledge Graph products to solve these issues and drastically improve how structured metadata is used at a large global news publisher.
In this project, we were not dealing with a single CMS but with multiple news publications, each with its own publishing technology stack, CMS, tagging mechanism, and entirely separate tag set. The ambition was to centralise the metadata used for tagging on each publication and use AI to aid the journalists, raising the consistency of tagging across news desks while streamlining and optimising the tagging process. Downstream, this would unlock better user journeys for consumers, better content analytics, and faster cross-publication innovation.
To achieve this, we integrated:
Our two products, Data Graphs and the Text Classification AI, work seamlessly together as an ideal solution for an information backbone in the new generation of publishing technology ecosystems.
We move from the legacy pattern of tagging with flat tags to tagging with well-modelled, organised structured data. Each concept we tag with has a globally unique, persistent and durable identifier (a URI / URN) and its own metadata properties (types, thumbnails, images, descriptions, categories, etc.). These properties give users more information about the concept to help them disambiguate it, and also support better user experiences and journeys built on these concept tags.
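To make the difference between a flat tag and a structured concept concrete, here is a minimal Python sketch. It is an illustration only, not the Data Graphs data model: the `Concept` class, its field names, and the `urn:example:` identifiers are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A structured tag: identity lives in the URN, not in the label."""
    urn: str                 # globally unique, persistent identifier
    label: str               # human-readable name shown to journalists
    concept_type: str        # e.g. "Person", "Organisation", "Planet"
    description: str = ""    # short text that helps disambiguation
    thumbnail_url: str = ""  # optional image for the tagging UI


# Two concepts that a flat tag "Mercury" could not tell apart.
# The URNs below are invented for this example.
planet = Concept("urn:example:concept:0001", "Mercury", "Planet",
                 "The innermost planet of the Solar System")
singer = Concept("urn:example:concept:0002", "Mercury", "Person",
                 "Freddie Mercury, lead singer of Queen")

# An article stores concept URNs rather than free-text labels.
article_tags = {planet.urn}


def describe(concept: Concept) -> str:
    """Render the line a journalist would see when choosing a tag."""
    return f"{concept.label} ({concept.concept_type}): {concept.description}"
```

Because the article references the URN, the label can later be corrected or translated without touching any tagged content.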
Managing this metadata centrally outside of your CMS provides a number of advantages:
To do this, Data Graphs allows the publisher to:
Using machine intelligence to automatically tag your articles with concepts and classifications also provides a number of advantages:
The Data Language Text Classification AI (Tagmatic) is designed for exactly this task. It is a high-performance service that learns how you classify and tag your content, and then applies metadata to new content that is sent to it for prediction. The service is highly scalable, fully private, and designed to adapt to each publisher’s unique tagging style. Its REST API allows for easy integration with your CMS and publishing stack, so that it learns continuously on the fly and reacts quickly to new concepts that arise in the ever-changing news cycle.
Data Graphs and Tagmatic work together through a dedicated connector, so that when a concept is merged or removed in Data Graphs, Tagmatic automatically adjusts its models’ weights to reflect the new state.
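The connector's internals are not described in this article. As a rough illustration of what a merge implies for tagged content and for the labelled examples a classifier learns from, the sketch below remaps every reference to a duplicate concept onto the surviving one. The function name, the dictionary layout, and the URNs are assumptions made for this example.

```python
def merge_concept(article_tags: dict[str, set[str]],
                  duplicate_urn: str,
                  surviving_urn: str) -> dict[str, set[str]]:
    """Return a copy of article_tags where duplicate_urn is replaced by surviving_urn."""
    merged = {}
    for article_id, urns in article_tags.items():
        if duplicate_urn in urns:
            # Drop the duplicate and make sure the surviving concept is present.
            merged[article_id] = (urns - {duplicate_urn}) | {surviving_urn}
        else:
            # Copy untouched sets so the caller's data is never shared or mutated.
            merged[article_id] = set(urns)
    return merged


# Hypothetical data: "urn:example:2" was created as a duplicate of "urn:example:1".
tags = {
    "article-1": {"urn:example:1", "urn:example:2"},
    "article-2": {"urn:example:2", "urn:example:3"},
    "article-3": {"urn:example:3"},
}
cleaned = merge_concept(tags, "urn:example:2", "urn:example:1")
```

Using sets means an article already tagged with both concepts ends up with a single reference after the merge.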
Technically, Data Graphs and our auto-tagging Text AI work together as shown in the diagram below. In this case study, a custom WordPress metadata plugin was created that integrates with both the Data Graphs and Tagmatic APIs to provide a human-in-the-loop tagging UI directly in the publisher’s CMS:
Where:
In the more detailed images below you can see how this WordPress metadata plugin might look, and how a concept is presented to the user with enough information from Data Graphs to correctly disambiguate the tag.
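The human-in-the-loop step can be pictured with a short Python sketch. It does not call the Tagmatic or Data Graphs APIs and does not reflect the actual plugin code: the `Suggestion` class, the `review` and `training_example` helpers, the `min_score` threshold, and the record shape are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    urn: str      # concept identifier
    label: str    # shown to the journalist
    score: float  # model confidence between 0 and 1


def review(suggestions, accepted_urns, added_urns=(), min_score=0.5):
    """Combine machine suggestions with the journalist's decisions.

    Only suggestions at or above min_score are shown. The journalist accepts
    some of them and may add concepts the model missed.
    """
    shown = [s for s in suggestions if s.score >= min_score]
    confirmed = {s.urn for s in shown if s.urn in accepted_urns}
    return confirmed | set(added_urns)


def training_example(article_text, final_urns):
    """Build the record that would be sent back for training (hypothetical shape)."""
    return {"text": article_text, "tags": sorted(final_urns)}


suggestions = [
    Suggestion("urn:example:1", "Elections", 0.9),
    Suggestion("urn:example:2", "Football", 0.2),  # below threshold, never shown
    Suggestion("urn:example:3", "Economy", 0.7),   # shown but rejected below
]
final = review(suggestions,
               accepted_urns={"urn:example:1"},
               added_urns=["urn:example:4"])
record = training_example("Article body goes here", final)
```

The point of the design is that the journalist's final decision, not the raw prediction, is what flows back as training data.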
Once tags are sent back to the training API with the content, Tagmatic makes tagging and classification metrics available via another service API. These include the precision, recall, F1 score and support for each tag, as well as the date the tag was last trained. The metrics can then be used to provide insights, including usage analytics and patterns, to the editorial floor.
Bringing these technologies together creates a high-utility information backbone for the modern newsroom, streamlining some processes, automating others, and unlocking publishing innovation.
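For readers less familiar with these metrics, the sketch below computes them for a single tag using the standard definitions. How Tagmatic calculates them internally is not covered here; the function name and input layout are assumptions for this example.

```python
def tag_metrics(predicted, confirmed, tag):
    """Compute precision, recall, F1 and support for one tag.

    predicted and confirmed are parallel lists of tag sets, one pair per article:
    what the model suggested and what the journalist finally applied.
    """
    tp = fp = fn = 0
    for pred, gold in zip(predicted, confirmed):
        if tag in pred and tag in gold:
            tp += 1  # suggested and kept
        elif tag in pred:
            fp += 1  # suggested but rejected
        elif tag in gold:
            fn += 1  # missed by the model, added by the journalist
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Support is the number of articles that truly carry the tag.
    return {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}


predicted = [{"a"}, {"a", "b"}, {"b"}, set()]
confirmed = [{"a"}, {"b"}, {"a", "b"}, {"a"}]
metrics_a = tag_metrics(predicted, confirmed, "a")
```

A tag with low recall but high precision is one the model rarely suggests yet usually gets right, which is a useful signal that it needs more training examples.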
By centralising and decoupling metadata from your CMS, and introducing machine intelligence, you unlock a number of key capabilities:
Ultimately, this technology makes tag and concept management a first-class citizen in the publishing stack. It is no longer an afterthought inherited from, and tightly coupled with, the CMS of the day, but something that is built to stay even when the authoring tooling changes, unlocking the true value of your content metadata.