Are we still talking about Tags or Taxonomies?

Sometimes it's easy to forget that things that are settled in your mind are still unclear to others. This is especially true when you are immersed in an organisation that uses social collaboration as the main form of communication, when a lot of the world still uses email only.

So when I was asked recently whether tags should be standardised, it rather took me aback. But then I remembered that 10 years ago, tagging philosophy provoked much heated debate and excitement. In fact, one of my earliest social bookmarks was this 2005 article on Tagging and Why it Matters by David Weinberger at Harvard in which he argues that:

…the biggest obstacle to KM achieving its vision of making all information available to everyone in the organization was in fact the difficulty of building and maintaining large classification systems. Even then, such systems never represent everyone’s way of thinking about things. Tagging, on the other hand, doesn’t require a team of Information Architects to argue for years over whether the right term is “natural language processing,” “language parsers,” or “nlp.” Users can use whatever term works for them…

The conclusion I have come to over the years is that tags and taxonomies are different things: both have value, and both are good for certain purposes. This is very visible in platforms like WordPress, where a blog post can be assigned both categories (a classification system the blog owner creates) and tags (applied more informally, with greater variety, and changing over time). We also argued interminably about things like whether tags should be case sensitive (is “conservative” the same as “Conservative” as a label, and what will people type when searching?). In fact, I blogged about tagging back in 2012 in a post on the importance of tagging.

Ultimately what sold me on tags as the more important tool was the fact that they can reflect the needs of different user communities and evolve as those needs change (and no-one was even talking about being agile then). So, taking the example of an organisation I came across recently, SEPA means Scottish Environment Protection Agency to one group of people and Single Euro Payments Area to another. Does that mean one of them has to change the terminology they use and the way they label things? Not practical. You can, of course, create a hierarchy (“environment/sepa”, “payments/sepa”), but that puts content in silos – and then if you happen to be in the wrong place when looking for something, you will not find it. In general, promiscuous tagging is recommended to address this issue – so if I tag something with “sepa, single, euro, payments, area, eu, eurozone”, then even if my “sepa” search comes up with a whole load of stuff about payments, refining it with “environment” will get rid of them.
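To make the refinement step concrete, here is a minimal sketch of a tag search that only returns items carrying every tag asked for. The item titles, the `ITEMS` store and the `search` function are all made up for illustration; a real platform's search will work differently.

```python
# Hypothetical in-memory store: item title -> set of lower-case tags.
ITEMS = {
    "SEPA credit transfer guide": {"sepa", "single", "euro", "payments", "area", "eu", "eurozone"},
    "SEPA direct debit FAQ": {"sepa", "payments", "eurozone"},
    "SEPA flood warning update": {"sepa", "scottish", "environment", "protection", "agency"},
}


def search(items, *required_tags):
    """Return the titles of items that carry every one of the required tags."""
    wanted = {tag.lower() for tag in required_tags}
    # An item matches when the wanted tags are a subset of its own tags.
    return sorted(title for title, tags in items.items() if wanted <= tags)


everything_sepa = search(ITEMS, "sepa")               # payments and environment mixed together
just_environment = search(ITEMS, "sepa", "environment")  # refined by one extra tag
```

With this sample data the plain “sepa” search returns all three items, while adding “environment” leaves only the flood warning update.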

One thing to remember is that the flexibility of tagging doesn’t mean standards will not emerge. Displaying a type-ahead list of tags to choose from is one way this happens in enterprise social platforms like IBM Connections. Even more critically, the whole point is that the tags reflect the terminology that users already use to refer to whatever is being tagged – so they naturally use the same terms (or learn to use the same terms by observing what others do). The alternative (forcing everyone to use a specific term) has a significant disadvantage: anyone who doesn’t understand the terminology cannot effectively find anything. Whereas if they can discover a few things that have also been labelled using the terminology they know, they can then be led to the right terminology by seeing the tags on the content they do find.
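As an illustration of how a type-ahead list nudges people towards shared terms, here is a small sketch that suggests existing tags by prefix, most used first. It is not how IBM Connections implements its type-ahead; the `suggest` function and the usage counts are invented for the example.

```python
from collections import Counter

# Hypothetical usage counts: how many times each tag has been applied so far.
TAG_USAGE = Counter({"sepa": 40, "september": 7, "separation": 3, "payments": 25})


def suggest(tag_usage, typed, limit=5):
    """Suggest existing tags that start with what the user has typed so far."""
    prefix = typed.lower()
    matches = [(tag, count) for tag, count in tag_usage.items() if tag.startswith(prefix)]
    # Most used first, alphabetical as a tie-breaker, so popular terms surface early.
    matches.sort(key=lambda match: (-match[1], match[0]))
    return [tag for tag, _ in matches[:limit]]


suggestions = suggest(TAG_USAGE, "sep")
```

Because the most used tag is offered first, someone typing “sep” sees “sepa” before anything else, which is how a common vocabulary can emerge without anyone mandating it.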

Now this doesn’t mean that categorisation is worthless. It has valid uses (just like Folders – which are basically categorisation of files). However, as it turns out, tags can be used to implement categories but not vice versa. Typically a standard prefix is used to indicate a category within tags – and, as discussed before, complex categorisation requires a hierarchy anyway. So using the tags “org.environment.sepa” and “org.payments.sepa” gives you a way of providing very specific access to content – at the cost of intellectual complexity: why would someone interested in European financial settlements know whether to look for “org.payments.sepa” rather than “org.eurozone.sepa”? However, as long as the “sepa” tag is also used, they would pretty quickly realise that the results all contained either “org.environment.sepa” or “org.payments.sepa”, and so know which one to use in future.
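A rough sketch of the idea: structured tags are ordinary tags with a dotted prefix, so a category lookup is just a prefix match, and the plain “sepa” tag lets people discover which structured tags exist. The “org.” prefix follows the example above; the function names and sample data are assumptions made for illustration.

```python
# Hypothetical tag sets on the items returned by a plain "sepa" search.
RESULTS = [
    {"sepa", "org.environment.sepa", "waste"},
    {"sepa", "org.payments.sepa"},
    {"sepa", "org.payments.sepa", "eurozone"},
]


def in_category(tags, category):
    """True if any tag is the category itself or sits beneath it in the hierarchy."""
    return any(tag == category or tag.startswith(category + ".") for tag in tags)


def structured_tags_for(results, leaf, prefix="org."):
    """List the structured tags seen in the results whose last segment is `leaf`."""
    found = {
        tag
        for tags in results
        for tag in tags
        if tag.startswith(prefix) and tag.split(".")[-1] == leaf
    }
    return sorted(found)


discovered = structured_tags_for(RESULTS, "sepa")
payments_items = [tags for tags in RESULTS if in_category(tags, "org.payments")]
```

Here the “sepa” results reveal both “org.environment.sepa” and “org.payments.sepa”, and filtering on “org.payments” keeps the two payments items.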

Once you get used to this approach, you discover that tags are a great way of refining searches (in IBM Connections, by clicking on the tag list on the left-hand side of results). So you can quickly refine a free text search by clicking on a tag: e.g. if “sepa” finds a load of both payments and environmental stuff, clicking on “environment” produces a relevant set of results. Or if you don’t find what you want, you can remove “environment” and add “scottish” or “scotland” instead until you do get what you are looking for. This approach also works well starting with a categorisation search (“org.environment.sepa” to get all content related to SEPA, then click the tag “waste” if you want information related to that topic).
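The clickable tag list beside a set of results can be thought of as a count of the other tags on those results. The sketch below is only an approximation of that idea, with an invented `tag_facets` function and sample data; it says nothing about how IBM Connections builds its own list.

```python
from collections import Counter

# Hypothetical tag sets on the items returned by a free text search for "sepa".
SEPA_RESULTS = [
    {"sepa", "payments", "eurozone"},
    {"sepa", "payments", "eu"},
    {"sepa", "environment", "scotland"},
]


def tag_facets(result_tags, selected):
    """Count the tags on the current results, leaving out tags already selected."""
    counts = Counter(tag for tags in result_tags for tag in tags if tag not in selected)
    # Most frequent first, alphabetical within the same count, for a stable display.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


facets = tag_facets(SEPA_RESULTS, selected={"sepa"})
```

With this data “payments” heads the list with a count of 2, followed by “environment”, “eu”, “eurozone” and “scotland” with 1 each, which is the cue that the results mix two different meanings of SEPA.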

The good thing about using tags is that they can change as the terminology changes (have you ever had your department renamed?). The bad thing about using tags is that they will change as terminology changes. In the short term, that is handled by searching for multiple tags. In the longer term, relevant content either gets retagged as needed (as people realise they had to use the old tag to find it, and so add the new one as well) or gets systematically retagged by the content owner (by searching for the old tag, and then adding the new tag to those objects found which are still relevant) – yes, that requires some effort, but will be done if doing so provides value (and time will hopefully not be wasted on it if it does not).
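The systematic retagging described above might look something like this sketch: find everything carrying the old tag, check it is still relevant, and add the new tag alongside the old one. The department names, the `add_new_tag` function and the relevance check are hypothetical.

```python
def add_new_tag(items, old_tag, new_tag, still_relevant=lambda title: True):
    """Add new_tag to every relevant item tagged old_tag; return the titles changed.

    The old tag is kept, so searches using the old terminology still work.
    """
    changed = []
    for title, tags in items.items():
        if old_tag in tags and new_tag not in tags and still_relevant(title):
            tags.add(new_tag)
            changed.append(title)
    return changed


# Hypothetical content after a department has been renamed.
docs = {
    "Onboarding checklist": {"personnel", "onboarding"},
    "Holiday policy": {"personnel", "people-team"},
    "Archived 2009 newsletter": {"personnel"},
    "Expenses form": {"finance"},
}

retagged = add_new_tag(
    docs,
    old_tag="personnel",
    new_tag="people-team",
    still_relevant=lambda title: not title.startswith("Archived"),
)
```

In this example only the onboarding checklist is changed: the holiday policy already has the new tag, the archived newsletter is judged no longer relevant, and the expenses form never had the old tag.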

Which leads to my last thought, which is a generic response to questions of the form ‘should I use “colour” or “color” as the tag?’ The answer is to ask the question “why are you tagging it?” If the answer is just so you can find it again, use whichever tag you will search on. If you want other people to find it, then try to imagine what they will search for (which often means using both, in this case so that both British and American readers will find your content).

To conclude then, my recommendation is: tag everything promiscuously with the audience for your tags in mind; create taxonomies when it makes sense to formally categorise data, and implement them as structured tags (and then make sure there is effort and governance to manage them going forward).