An introductory guide to tagging: Part 1
This is the first in a series of short blog posts looking at the relationship between information management and the web.
This post looks at how developments in search technology between the late eighties and today created the perception that good information management practices were redundant.
A conversation I had a few years ago with Edmund Weiner of the Oxford English Dictionary has stayed with me ever since. We talked about the 1980s and how, during that decade, some publishers started to structure print for computers. He recalled how questions began to be asked about what documents were really about and how to represent that knowledge in a database.
Then came the advent of search technologies. Edmund reflected on how this resulted in us throwing away hard-won information management skills in favour of black box technology we didn’t control:
The march of progress swept us in a different direction, one that offered great prizes in a much more immediate way. Our partners at the University of Waterloo had developed one of the first rapid textual search engines, named Pat. It could search simply-structured text of any kind almost instantaneously and with well-tagged text it was miraculously powerful. As was said at the time, we didn’t need to build the intelligence into the database as it could be built into the search engine.
Modelling Lexical Structures in the Oxford English Dictionary by Edmund Weiner 
This situation still prevails today. Intelligence resides within the search engines and not in an organisation's own publishing systems.
The web followed the same path: powerful search engines and poorly described pages. Content expressed its subject to bots through search-engine-optimised copy and links, and the abuse of meta tags through keyword stuffing only helped to give subject tagging a bad name.
These technology developments removed the incentive for organisations to invest in their own information management, and many unintentionally outsourced their business intelligence and value creation to search providers.
However, industry attention has recently shifted back towards subject metadata, tagging and knowledge graphs as an area of investment. Google have long since moved to advocating a more structured approach to publishing on the web using schema.org to provide a baseline data model and Wikidata to provide a baseline knowledge graph.
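To make the structured approach concrete, here is a minimal sketch of what schema.org markup for a blog post like this one might look like, built as the JSON-LD object a publisher would embed in the page. The headline is taken from this post; the `about` entry and the Wikidata item URL are hypothetical placeholders for illustration, not real page data.

```python
import json

# Illustrative schema.org description of a blog post as JSON-LD.
# On a real page this would sit inside a <script type="application/ld+json"> tag.
article = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "An introductory guide to tagging: Part 1",
    "about": {
        # Linking the subject to a Wikidata item grounds the tag in a
        # shared knowledge graph. The item ID below is a placeholder,
        # not a checked Wikidata identifier.
        "@type": "Thing",
        "name": "information management",
        "sameAs": "https://www.wikidata.org/wiki/Q-EXAMPLE",
    },
}

print(json.dumps(article, indent=2))
```

The point is that the subject of the page is stated as data the publisher controls, rather than inferred by a search engine from copy and links.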
What is the reason for the change in perception and the renewed interest in information management? Organisations are waking up to the fact that they no longer want to outsource business intelligence to Google and other third parties. To get the most value out of their content, they need the ability to query and aggregate it themselves.
Refitting the plumbing to tag content and manage controlled vocabularies is potentially a large and complex investment. But there are principles and ideas that information scientists have been practising for years that can help, regardless of the size of the organisation. These challenges will not be solved by technology alone; library skills are needed to fill the gaps in expertise.
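One of those long-standing principles is that tags should be drawn from a managed list rather than free-typed. The sketch below shows the idea in miniature; the vocabulary terms, document ID and function name are invented for illustration, not part of any real system.

```python
# A tiny illustration of tagging against a controlled vocabulary:
# proposed tags are validated against a managed set, and anything
# outside it is flagged for review rather than silently stored.
CONTROLLED_VOCABULARY = {
    "information-management",
    "search",
    "knowledge-graphs",
    "metadata",
}

def tag_document(document_id: str, proposed_tags: list[str]) -> dict:
    """Accept only tags that exist in the controlled vocabulary."""
    accepted = [t for t in proposed_tags if t in CONTROLLED_VOCABULARY]
    rejected = [t for t in proposed_tags if t not in CONTROLLED_VOCABULARY]
    return {"id": document_id, "tags": accepted, "rejected": rejected}

result = tag_document("post-1", ["search", "seo-tricks", "metadata"])
print(result)
```

Here `"seo-tricks"` is not in the vocabulary, so it ends up in the rejected list: the vocabulary, not the individual author, decides what a valid subject term is.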
In the next post I am going to look at the importance of subject tagging.
NOTE: If you want to learn more about library science and its family of associated disciplines, ISKO UK and Taxonomy Bootcamp are great places to start (UK focus). They run a brilliant series of events as well as bringing together the community of practitioners.
Part 2: The importance of tagging when publishing on the web
Part 3: Why good subject tagging is hard
Part 4: How to use domain modelling to improve subject tagging