Are we still talking about Tags or Taxonomies?

Sometimes it’s easy to forget that things that are settled in your mind are still unclear to others. This is especially true when you are immersed in an organisation that uses social collaboration as the main form of communication, when a lot of the world still uses email only.

So when I was asked recently whether tags should be standardised, it rather took me aback. But then I remembered that 10 years ago, much heated debate and excitement was provoked by tagging philosophy. In fact, one of my earliest social bookmarks was this 2005 article on Tagging and Why it Matters by David Weinberger at Harvard in which he argues that:

…the biggest obstacle to KM achieving its vision of making all information available to everyone in the organization was in fact the difficulty of building and maintaining large classification systems. Even then, such systems never represent everyone’s way of thinking about things. Tagging, on the other hand, doesn’t require a team of Information Architects to argue for years over whether the right term is “natural language processing,” “language parsers,” or “nlp.” Users can use whatever term works for them…

The conclusion I have come to over the years is that tags and taxonomies are different things: both have value, and both are good for certain purposes. This is very visible in platforms like WordPress, where a blog post can be assigned both categories (a classification system the blog owner creates) and tags (applied more informally with greater variety and changing over time). We also argued interminably about things like whether they should be case sensitive (is “conservative” the same as “Conservative” as a label, but also what will people type when searching?). In fact, I blogged about tagging back in 2012 in The Importance of Tagging.

Ultimately what sold me on tags as the more important tool was the fact that they can reflect the needs of different user communities and evolve as those needs change (and no-one was even talking about being agile then). So, taking the example of an organisation I came across recently, SEPA means Scottish Environment Protection Agency to one group of people and Single Euro Payments Area to another. Does that mean one of them has to change the terminology they use and the way they label things? Not practical. You can, of course, create a hierarchy (“environment/sepa”, “payments/sepa”), but that puts content in silos – and then if you happen to be in the wrong place when looking for something you will not find it. In general, promiscuous tagging is recommended to address this issue – so if I tag something with “sepa, single, euro, payments, area, eu, eurozone”, then even if my “sepa” search comes up with a whole load of stuff about payments, refining it with “environment” will get rid of them.
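To make the SEPA example concrete, here is a minimal Python sketch of searching by one tag and then refining with a second. The item titles, the `ITEMS` dictionary and the `search` function are all invented for illustration; this is not how any particular platform implements search.

```python
# Illustrative data only: titles and tags are invented for this sketch.
ITEMS = {
    "SEPA credit transfer guide": {"sepa", "single", "euro", "payments", "area", "eu", "eurozone"},
    "SEPA direct debit FAQ": {"sepa", "payments", "eurozone"},
    "SEPA flood warning service": {"sepa", "scottish", "environment", "protection", "agency"},
}


def search(required_tags, items=ITEMS):
    """Return the titles of items that carry every one of the required tags."""
    wanted = {tag.lower() for tag in required_tags}
    # An item matches only if the wanted tags are a subset of its own tags.
    return sorted(title for title, tags in items.items() if wanted <= tags)


everything = search(["sepa"])               # payments and environment content mixed together
refined = search(["sepa", "environment"])   # the extra tag drops the payments items
print(everything)
print(refined)
```

The more tags each item carries, the more ways there are to narrow a crowded result set, which is the practical argument for tagging generously.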

One thing to remember is that the flexibility of tagging doesn’t mean standards will not emerge. Displaying a type-ahead list of tags to choose from is one way this happens in enterprise social platforms like IBM Connections, but even more critically the whole point is that the tags reflect the terminology that the users are used to using to refer to whatever is being tagged – so they naturally use the same terms (or learn to use the same terms by observing what others do). The alternative (forcing everyone to use a specific term) has the significant disadvantage that anyone who doesn’t understand the terminology cannot effectively find anything – whereas if they were able to discover a few things that have also been labelled using the terminology they know, they can then be led to the right terminology by seeing the tags on the content they do find.

Now this doesn’t mean that categorisation is worthless. It has valid uses (just like Folders – which are basically categorisation of files). However, as it turns out, tags can be used to implement categories but not vice versa. Typically a standard prefix is used to indicate a category within tags – and, as discussed before, complex categorisation requires a hierarchy anyway. So using the tags “org.environment.sepa” and “org.payments.sepa” gives you a way of providing very specific access to content – at the cost of intellectual complexity: why would someone interested in European financial settlements know whether to look for “org.payments.sepa” rather than “org.eurozone.sepa”? However, as long as the “sepa” tag is also used, they would pretty quickly realise that the results all contained either “org.environment.sepa” or “org.payments.sepa” and so know which one to use in future.
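As a rough sketch of how prefixed tags can stand in for categories, the Python below treats a dotted tag as a path and matches on its prefix. The item names and helper functions are hypothetical; only the tag values come from the example above.

```python
# Illustrative data only: the item names are invented; the tags follow the text above.
ITEMS = {
    "Waste licence guidance": {"sepa", "org.environment.sepa", "waste"},
    "Flood warning service": {"sepa", "org.environment.sepa"},
    "Credit transfer rulebook": {"sepa", "org.payments.sepa"},
}


def in_category(tag, category):
    """True if the tag is the category itself or sits below it in the dotted hierarchy."""
    return tag == category or tag.startswith(category + ".")


def items_in_category(items, category):
    """Titles of items with at least one structured tag inside the category."""
    return sorted(
        title for title, tags in items.items()
        if any(in_category(tag, category) for tag in tags)
    )


def structured_tags_alongside(items, plain_tag, prefix="org."):
    """Structured tags found on items that also carry the plain tag."""
    found = set()
    for tags in items.values():
        if plain_tag in tags:
            found.update(tag for tag in tags if tag.startswith(prefix))
    return sorted(found)


print(items_in_category(ITEMS, "org.environment"))
print(structured_tags_alongside(ITEMS, "sepa"))
```

The second helper mirrors the discovery path described above: someone who only knows the plain “sepa” tag can see which structured tags sit next to it and pick the right one next time.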

Once you get used to this approach, you discover that tags are a great way of refining searches (in IBM Connections, by clicking on the tag list on the left hand side of results). So you can quickly refine a free text search by clicking on a tag: e.g. if “sepa” finds a load of both payments and environmental stuff, clicking on “environment” produces a relevant set of results. Or if you don’t find what you want, you can remove “environment” and add “scottish” or “scotland” instead until you do get what you are looking for. This approach also works well starting with a categorisation search (“org.environment.sepa” to get all content related to SEPA, then click the tag “waste” if you want information related to that topic).
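The click-to-refine behaviour can be sketched as two small steps: count the tags on the current results to build the list the user sees, then filter by whichever tag they click. This is an assumption-laden Python illustration with invented data and function names, not a description of how IBM Connections works internally.

```python
from collections import Counter

# Illustrative result set: titles and tags are invented for this sketch.
RESULTS = {
    "Waste licence guidance": {"sepa", "environment", "waste", "scotland"},
    "Flood warning service": {"sepa", "environment", "scotland"},
    "Credit transfer rulebook": {"sepa", "payments", "eurozone"},
}


def tag_cloud(results, already_selected=()):
    """Count how many results carry each tag, leaving out tags already in the query."""
    counts = Counter(tag for tags in results.values() for tag in tags)
    for tag in already_selected:
        counts.pop(tag, None)
    return counts


def refine(results, tag):
    """Keep only the results that carry the clicked tag."""
    return {title: tags for title, tags in results.items() if tag in tags}


cloud = tag_cloud(RESULTS, already_selected=["sepa"])
narrowed = refine(RESULTS, "environment")
print(cloud.most_common())
print(sorted(narrowed))
```

Removing a tag is just the reverse: rerun the search without it and rebuild the counts from the wider result set.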

The good thing about using tags is that they can change as the terminology changes (have you ever had your department renamed?). The bad thing about using tags is that they will change as terminology changes. In the short term, that is handled by searching for multiple tags. In the longer term, relevant content either gets retagged as needed (as people realise they had to use the old tag to find it, and so add the new one as well) or gets systematically retagged by the content owner (by searching for the old tag, and then adding the new tag to those objects found which are still relevant) – yes, that requires some effort, but will be done if doing so provides value (and time will hopefully not be wasted on it if it does not).
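The systematic retagging by a content owner could look something like the Python sketch below: find everything with the old tag, decide whether it is still relevant, and add the new tag alongside the old one. The tags, titles and the `still_relevant` check are all made up for the example.

```python
def add_replacement_tag(items, old_tag, new_tag, still_relevant=lambda title: True):
    """Add new_tag to every relevant item that has old_tag; return the titles changed.

    The old tag is kept, so searches using either term keep working.
    """
    updated = []
    for title, tags in items.items():
        if old_tag in tags and new_tag not in tags and still_relevant(title):
            tags.add(new_tag)
            updated.append(title)
    return sorted(updated)


# Illustrative data: a department renamed from "it-services" to "digital-services".
ITEMS = {
    "Team charter": {"it-services", "governance"},
    "2009 office move plan": {"it-services", "facilities"},
    "Onboarding guide": {"digital-services", "hr"},
}

changed = add_replacement_tag(
    ITEMS,
    "it-services",
    "digital-services",
    still_relevant=lambda title: "2009" not in title,  # hypothetical relevance rule
)
print(changed)
```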

Which leads to my last thought, which is a generic response to questions of the form ‘should I use “colour” or “color” as the tag?’ The answer is to ask the question “why are you tagging it?” If the answer is just so you can find it again, use whichever tag you will search on. If you want other people to find it, then try to imagine what they will search for (which often means using both, in this case so that both British and Americans will find your content).

To conclude then, my recommendation is: tag everything promiscuously with the audience for your tags in mind; create taxonomies when it makes sense to formally categorise data, and implement them as structured tags (and then make sure there is effort and governance to manage them going forward).


The Importance of Tagging

I was on a panel discussing Social Business at ICWSM-12 this week (they recorded a video so I hope it will be available for replay soon). We got some great questions. This led to a discussion about what features of the social collaboration platform were most important for finding and leveraging experts in an organisation.

Roja Bandari tweeted a quote from my answer:


It reminded me that I had been intending for some time to blog about some of the essential but underrated features of enterprise social collaboration platforms – and tagging is a great place to start (recommendations is another, that was highlighted by Igor Perisic of LinkedIn in his keynote at the same event – did you know that 50% of new LinkedIn connections come from recommendations?). Many companies think they can successfully implement social collaboration using previous generation collaboration platforms which do not have these essential capabilities – and then wonder why they are not adopted in the way they expect.

So, why is tagging so important? Well first, let’s make it clear what I mean by tagging in this context.

  1. The ability for users to assign free format words to objects. These are not selected from a restricted taxonomy, but rather allow users to associate words that mean most to them in their context. This allows the tags used to change over time as vocabularies, technologies or practices evolve; it makes it possible for different communities to use tags relevant to them (e.g. I might just think of something as a “Daffodil” while a biologist would label it “Narcissus” and add additional tags for its species); and it gives users of different languages the opportunity to create local language tags (perhaps as well as international ones).
  2. Tags can be assigned to any object: a blog post, a shared file, a person, a community, a wiki page, an arbitrary URL, etc.
  3. Tags are not only assigned by the owner when an object is created, but also by any user who finds the object (so they can make it easy to find it again) with the effect of greatly increasing the pool of tags across the organisation (and also ranking how interesting objects are to users based on how many tag them).
  4. When tags are assigned, the system should show suggestions (based on what other users have used to tag this object), using Web 2.0 techniques for type-ahead to make suggestions as the tag is typed (because knowing what tags others have used helps the community to converge on a common set of tags without introducing lots of small variations in spelling, abbreviation, etc.)
  5. Users need to be educated that they shouldn’t try to choose one “correct” tag but rather that “more is better”, and to think about the different ways they might want to find the object in the future.
  6. The system should make the tags as useful as possible to users to encourage their use. It should show tag clouds (not lists of tags) that can be easily searched by users (showing the most popular by default) and should allow easy filtering of large lists of objects or results by simply clicking in the tag cloud.
  7. Enterprise wide search of all tags should be provided (rather than having to search separately for different content types or in different repositories) and results of all kinds should be displayed (blog posts, files, people, communities, wiki pages, forum discussions, web pages, etc., etc.)
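The type-ahead suggestions in point 4 are easy to picture in code. Here is a minimal Python sketch, assuming the platform keeps a count of how often each tag has been used; the `suggest` function and the counts are invented for illustration.

```python
def suggest(typed, tag_counts, limit=5):
    """Suggest existing tags that start with what the user has typed so far.

    Most-used tags come first; ties are broken alphabetically.
    """
    typed = typed.lower()
    matches = [(tag, count) for tag, count in tag_counts.items() if tag.startswith(typed)]
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [tag for tag, _ in matches[:limit]]


# Made-up usage counts for a handful of tags.
TAG_COUNTS = {"tagging": 12, "tags": 9, "taxonomy": 4, "tag-cloud": 2, "search": 7}

print(suggest("ta", TAG_COUNTS))
print(suggest("TAG", TAG_COUNTS, limit=2))
```

Putting popular tags first is what nudges people towards the same spelling and abbreviation without forcing a fixed vocabulary on them.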

Two key points here. First, tagging people as well as content (and allowing other people to tag other people they find, rather than relying on people to tag themselves). This is one cornerstone of expertise location (knowing that X has helped other people with topic Y, even though they may not see it as their area of expertise, and providing a way to compensate for lazy people who do not tag themselves – automatic tag generation, e.g. based on courses people have taken, can also be useful here).

This allows results to show both people and content. In practice, users are often searching for a document containing the answer to their question or, if that does not exist, the person who can help them find it. This reminds me of another underrated capability, the business card. Whenever you see a person’s name associated with content (the author, or someone who commented on it, recommended it or downloaded it) you should be able to hover over the name and immediately find out key information about that person, like who they are or how to contact them – and also what other relevant content they have shared.

The second key point is the tagging of URLs (or web page addresses). Very often you come across useful content that is not inside the enterprise social platform (e.g. a news article on a web page, a file in an enterprise content management system, a profile on LinkedIn or Facebook, perhaps even a Tweet). Most solutions call the process of tagging an arbitrary web page social bookmarking as it is similar to creating a bookmark in your browser, but you are doing it on the social platform and so contributing to the total set of bookmarks (links) available across the whole enterprise (for example, allowing me just now to quickly find the most popular of the 321 web pages that IBMers have tagged with “tagging” – the answer, inevitably, being Wordle).

Providing a simple “bookmarklet” for popular browsers that allows users to quickly and easily tag a page and save it as a public bookmark is a key capability all social collaboration solutions should provide. Users soon realise that this lets them simply tag all the web pages they come across that they might be interested in going back to in the future – and then use the tag cloud to actually find them again with minimal effort (which I certainly couldn’t do if I tried to keep my >3,000 tagged pages as bookmarks in my browser!)

The social bookmarklet implements a couple of key requirements for essential social collaboration. Firstly it generates the maximum social capital with the minimum effort (I see this web page and I would like to tag it so I can find it again – but as a side effect I have contributed to a set of tags which provide even more value to all employees across the company). A well implemented bookmarklet takes this even further – for example by allowing the bookmark to be automatically added to one or more Community spaces the user belongs to (sharing the link more widely and helping keep the community fresh), by letting the user provide a few lines of text and have a blog post automatically created explaining why they find this page to be interesting to all the people reading their blog, and by facilitating the creation of an activity around the link so the user can manage any follow up actions. Creating as much social content as easily as possible is key to effective transmission of discovered knowledge to other people in the organisation who need it.


In addition, the bookmarklet allows a user to generate social content from their browser when viewing a web page – without needing to go somewhere else. This is an example of the overall need of social collaboration solutions to integrate with the users’ desktop applications. If the user had to copy the bookmark, navigate to another page, open a form, paste in the bookmark and add the tags, they are far less likely to do it. In the same way they need to be able to save a file directly from their document editor into the social file sharing repository (or drag and drop an attachment from their mail client or a file from their desktop), post to their blog from their favourite word processor and post a status update from where they are working. Ease of use doesn’t just mean being able to figure out how to navigate an application, it means being able to do what the user wants from the context where they need to do it with the minimum of clicks.

I often talk of social bookmarks (or, more likely, of tagging, since I think it better reflects what we are doing here) as “Indexing the Intranet” (although, in practice, it is indexing the parts of the Internet that your colleagues find useful as well). Most users have a negative perception of enterprise search – and one that is historically justified, since the nature of intranet content and the way it is linked does not offer search engines the context that cross-site links on the public Internet provide to Google, which uses them to build its page ranks and offer users the most relevant responses (alongside those most lucrative to Google). Tagging pages addresses this need since it can let a search engine show at the top of the list the hits that were most often tagged by all the other employees across the company (as long as the enterprise search solution is able to use this information).
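One simple way a search engine could use this information is to order its hits by how many different people have tagged each page. The Python below is only a sketch of that idea under that assumption; the URLs, user names and `rank_by_taggers` function are hypothetical.

```python
def rank_by_taggers(hits, taggers_by_url):
    """Order search hits so that pages tagged by the most people come first.

    Pages with the same number of taggers keep the order the search engine gave them.
    """
    return sorted(hits, key=lambda url: len(taggers_by_url.get(url, set())), reverse=True)


# Hits in the order a plain keyword search returned them (invented URLs).
HITS = ["intranet/expenses-old", "intranet/expenses-policy", "intranet/expenses-form"]

# Who has bookmarked or tagged each page (invented users).
TAGGERS = {
    "intranet/expenses-policy": {"ann", "bob", "cat"},
    "intranet/expenses-form": {"ann"},
}

ranked = rank_by_taggers(HITS, TAGGERS)
print(ranked)
```

Counting distinct taggers rather than total tags stops one enthusiastic user from pushing a page to the top on their own.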

That said, once the social collaboration platform is populated with a rich set of tags, many users stop using enterprise search and instead use the social search capabilities – since social bookmarks provide reach to find content outside of the social platform (including on the broader Intranet since the user searching wants the best answer, irrespective of where it can be found) which are displayed along with the blog entries, wiki pages and files shared on the intranet. Also, social search finds both people and content for the topic, and makes it easy to move between documents, web content and experts until the user has found the information they need to do their job effectively.

Finally, tags are also a key input to the social platform’s recommendation engine. Not just explicitly (e.g. finding new content to recommend because its tag matches the user’s tag) but also implicitly (e.g. understanding that users who are tagged the same way, or who create posts with the same tags, have common interests and so are probably more interested in each other’s content). Meanwhile surfacing the links between users, or between users and communities, enhances other employees’ ability to find alternative experts when the person they want to contact is not available.
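The implicit case can be illustrated by measuring how much two users’ tags overlap. The Python sketch below uses Jaccard similarity (shared tags divided by all tags used by either person), which is a common choice picked here purely for illustration; the post does not say which measure any real recommendation engine uses, and the users, tags and threshold are invented.

```python
def tag_overlap(tags_a, tags_b):
    """Jaccard similarity: shared tags divided by all tags used by either person."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0


def similar_users(user, tags_by_user, minimum=0.25):
    """Other users whose tags overlap enough with this user's, most similar first."""
    mine = tags_by_user[user]
    scored = [
        (tag_overlap(mine, tags), other)
        for other, tags in tags_by_user.items()
        if other != user
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored if score >= minimum]


# Invented users and the tags associated with them.
USERS = {
    "ann": {"tagging", "search", "taxonomy"},
    "bob": {"tagging", "search", "wikis"},
    "cat": {"payments", "eurozone"},
}

print(similar_users("ann", USERS))
```

The same list doubles as a set of alternative experts: if ann is not available, the people whose tags look most like hers are a reasonable next place to ask.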

Hopefully this post has made it clear why I find tagging (both in its explicit form and as an aspect of social bookmarks) to be an essential capability of social collaboration platforms. It also highlights recommendations, business cards, social bookmarking, integration with desktop applications and social search as key capabilities required to deliver not just knowledge sharing, but also knowledge discovery (which, ultimately, is the real objective of a social collaboration platform). IBM Connections provides all of these capabilities – which is why I believe it delivers social collaboration more effectively than most of its competitors today.

I’ll plan on looking at a few more of its differentiating capabilities (from the perspective of the use cases they support) in future blog posts.

In the meantime, let me know what you think is the most useful feature of the social collaboration platform you use.