Friday, March 14, 2008

Achieving findability without taxonomies

Theresa Regli, Analyst at CMS Watch, provides some answers to those who argue that taxonomies are not needed to increase findability because their own taxonomy initiatives have failed for some reason or because new (semantic) search technologies will soon emerge:


"While that may be the case for some future date, it's not the case now for business trying to find information today. Yes, text mining technology is getting better at extracting meaning from content and in turn categorizing or using it in a useful way, and one day my cell phone may just let my doctor know immediately if I'm having a heart attack. The technology exists now to be able to do that. But the car has also existed for over 100 years, and most of the continent of Africa doesn't have roads. Useful technology without infrastructure doesn't go very far.

For now, content is stove-piped in multiple systems, and search has made people lazy. People think the answer should be as easy as a keyword. But the answers to our biggest findability questions are no more easily found by typing in a keyword than a non-French speaker might get a ticket on a working Métro line during a strike. Getting there is no easier than what Amtrak had to do to get the tracks laid down for Acela, and they still couldn't get the train to go as fast as it could have due to organizational and regulatory disarray."

This is classic human behaviour. Instead of climbing the mountain to reach the riches on the other side, we stay put at its foot and wait for some inventor to come by with a machine that will teleport us across. If you think that is a good strategy, then you should probably not bother with taxonomies.

And let's face it - to make content findable, we need to continue (or start?) describing content with both descriptive and structural metadata until the following occurs:

  1. Search engines can actually analyze and understand the semantics of both the query and the content that they index
  2. Search engines know what people are asking for even if they don't know it themselves
  3. People can ask questions in a way that not only other people but also search engines can understand

Tuesday, October 30, 2007

Taxonomies and tagging in MOSS 2007

Although MOSS 2007 has many benefits, two of its most apparent weaknesses are its lack of built-in support for creating taxonomies for document classification and for tagging documents with user-defined tags.

I have been exploring Microsoft’s SharePoint site and the MSDN forums on Social Computing and Enterprise Content Management, and I found virtually nothing there about taxonomies and tagging. The most interesting item I found was a forum post called “Document Classification Taxonomy in MOSS 2007”, which describes a typical business use case that classification taxonomies can support:

“We have approximately 300 different document classifications and we could create a content type for each, but this would require users to scroll through a list of 300 options every time they upload a file. This is not particularly friendly. What I would like to create is a mechanism whereby users, upon uploading a document, are asked the basic nature of the document.”

The only answer to that post recommended either adding 300 content types and using many document libraries, or looking at a third-party tool such as RAPID (described below).

My exploration of Microsoft’s ECM blog did not yield much either. What I found was a single post from early 2007 (by Adri Verlaan, a developer on the ECM team) which introduces a “Tagging Starter Kit for SharePoint Server” including a “lightweight working prototype”. I quote:
“Currently, the kit allows authors to attach tags to content and readers to specify tags in which they are interested. Using this information for a specified content source, a customized Content Query Web Part shows only the items that match a reader’s respective tags.”
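
The matching behaviour described in that quote can be illustrated with a minimal Python sketch. It is not the starter kit's code and does not use any SharePoint API; the `Item` class, the `items_for_reader` function and the sample data are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A piece of content with the tags its author attached (hypothetical model)."""
    title: str
    tags: set = field(default_factory=set)


def items_for_reader(items, reader_tags):
    """Return the items that share at least one tag with the reader's interests."""
    wanted = {t.lower() for t in reader_tags}
    return [item for item in items if wanted & {t.lower() for t in item.tags}]


# Made-up sample content, tagged by its authors.
items = [
    Item("Q3 budget forecast", {"Finance", "Planning"}),
    Item("Holiday schedule", {"HR"}),
    Item("Travel expense policy", {"finance", "HR"}),
]

# A reader who has declared an interest in "finance".
matches = items_for_reader(items, {"finance"})
print([item.title for item in matches])
```

Tags are compared case-insensitively here, which is a design choice of this sketch: free-form tags tend to drift in spelling and casing, and that is one reason user-defined tags benefit from some normalization.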

In other words, there seems to be little about taxonomies and tagging from Microsoft. But are there third-party tools available to make up for this weakness? Well, KWizCom recently released a third-party product for MOSS 2007, called “SharePoint Tagging Feature”:

“KWizCom SharePoint Tagging Feature enables the tagging of SharePoint content such as documents, list items, pictures, forms etc. Furthermore, with the included Tag Cloud Web Part, SharePoint Tagging Feature enables many new capabilities such as presentation of items according to tags, tagging e-mail alerts, and more.”

To add support for taxonomies, there is “RAPID for SharePoint” from Artemis Corporation:
“The RAPID 'Taxonomy' Classification framework allows the application of any taxonomy within SharePoint content. An unlimited number of taxonomies can be created and used within a site to classify documents and list items. Once classified documents and list items can be filtered, indexed and queried using standard WebParts, List Controls and Views. Fully integrated into Microsoft Office SharePoint Server 2007”

It is hard to tell from these descriptions how capable these third-party tools are. If anyone has hands-on experience with these or equivalent tools, please share.

Thursday, October 18, 2007

Allowing Chaos in Content Management

Content management is not about moving from chaos to total order, from anarchy to total control, from ad hoc to highly structured ways of working. For an enterprise to work, you have to mix and match; give a little freedom here, add a little control there.

What I am saying is that there is no paradox in providing users with the freedom and power to create, tag and share their own content and communicate and collaborate with others as they desire while governing and controlling the production and delivery of content assets. It is just a little harder than either giving total freedom or taking total control, because the easy way out of content management challenges is either to do nothing or to put everything under central control.

For example, an organization would probably benefit from having both user-driven metadata (folksonomies/social bookmarking) and centrally developed and managed metadata such as enterprise taxonomies. Why? Because they serve slightly different purposes and often complement each other; where one has its weaknesses, the other one has its strengths.

An organization could probably also benefit from letting employees choose their own tools for ad hoc content-centric collaboration and communication. At the same time, it would probably benefit from providing an infrastructure and structured and governed processes for producing, managing and delivering business-critical content assets.

It just requires a delicate hand to do the right thing with the right content.

Tuesday, October 16, 2007

Why You Need a Concept Model for Integration

"A human being is part of a whole, called by us the 'Universe,' a part limited in time and space. He experiences himself, his thoughts and feelings, as something separated from the rest - a kind of optical delusion of his consciousness. This delusion is a kind of prison for us, restricting us to our personal desires and to affection for a few persons nearest us. Our task must be to free ourselves from this prison by widening our circles of compassion to embrace all living creatures and the whole of nature in its beauty." - Albert Einstein

An Enterprise Content Architecture (ECA) essentially consists of content (elements) and metadata (attributes and relations). By applying the proper types of metadata to describe and organize the content within an enterprise, the ECA makes sure that it can be managed and delivered as needed. So far so good. But, the main challenge when trying to establish an ECA and integrate content from various content sources is to overcome two great barriers related to the metadata that surrounds the content:

  1. The first barrier has to do with the structure of the metadata. If the metadata in one content source is different in structure from the corresponding metadata in another content source, we must make sure that they get the same structure.
  2. The second (and even greater) barrier has to do with the semantics – the meaning – of the metadata. If the metadata in one content source does not mean the same as the corresponding metadata in another content source, then we have a problem that we need to resolve. If we are not aware of semantic differences in metadata between two content sources that are to be integrated, then we might get into serious trouble.
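
The two barriers above can be made concrete with a small Python sketch. Everything in it is assumed for illustration: the two sources (`crm` and `dms`), their field names, and the terms of the shared concept model are hypothetical. The first table handles structure (different field names for the same thing), the second handles semantics (the same word meaning different things).

```python
# Structural mapping: each source's field name -> shared field name (all hypothetical).
FIELD_MAP = {
    "crm": {"cust_name": "customer", "doc_kind": "document_type"},
    "dms": {"client": "customer", "type": "document_type"},
}

# Semantic mapping: each source's local term -> term in the shared concept model.
# Note that "offer" means a price quotation in one source and a marketing offer in the other.
VALUE_MAP = {
    "crm": {"document_type": {"offer": "quotation"}},
    "dms": {"document_type": {"quote": "quotation", "offer": "marketing_offer"}},
}


def harmonize(source, record):
    """Translate one source record into the shared structure and vocabulary."""
    result = {}
    for field_name, value in record.items():
        shared_field = FIELD_MAP[source].get(field_name)
        if shared_field is None:
            continue  # no agreed counterpart for this field; leave it out
        terms = VALUE_MAP[source].get(shared_field, {})
        result[shared_field] = terms.get(value, value)
    return result


a = harmonize("crm", {"cust_name": "Acme", "doc_kind": "offer"})
b = harmonize("dms", {"client": "Acme", "type": "offer"})
print(a)
print(b)
```

Without the semantic table, both records would end up with the document type "offer" and look identical after integration, which is exactly the kind of unnoticed difference the second barrier warns about.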

To develop an ECA is obviously a challenging - but not impossible - task given that most enterprises have lots of content sources where both the structure and semantics of the metadata have been defined in isolation from how they have been defined in other content sources. This phenomenon is commonly referred to as the development of “content silos” or “content islands”, and it is a result of IT systems being developed to support only specific functions, parts of processes or – at best – entire processes within an enterprise, that is, without an enterprise perspective where all processes and IT systems are viewed as being part of a whole.

To be able to merge different content architectures, there is really no other way than to go back and first (re-)define the basic concepts used in the enterprise. Only when you have established an agreed-upon concept model at the enterprise level can you start to map the content models in different content sources to each other. Here is a basic approach to establishing the enterprise concept model:

  1. Make sure you have commitment from top management and then create a team with representatives from all “content islands” (that is, departments and/or processes)
  2. Make an inventory of existing content and of content that is needed but does not yet exist
  3. Identify which content is valuable to the enterprise and therefore should be classified (and managed) as assets
  4. Work out an enterprise concept model in modeling workshops

The enterprise concept model can then guide the creation of enterprise metadata standards, enterprise taxonomy development and (structural and semantic) harmonization of metadata in different content sources throughout the enterprise.

Friday, September 28, 2007

Insights about Content Management challenges

"Culling Content Management" by Alan Pelz-Sharpe:

"It's long been a gripe of mine that ECM systems were designed to reduce the amount of content you needed to manage to essentials. Yet instead they often just manage everything - trash along with diamonds...//...It's really not that hard to reduce the volumes of content you manage dramatically - a simple content audit can clear out 70-80% without in anyway impacting your RM policies or frankly even being noticable to the end users. Most everything sitting there anyway is a duplication, is redundant or should never have been there in the first place."

"There once was a firm in Nantucket..." by Bob Larrivee at AIIM:

"If your information management and gathering effort is called into question, you may be asked to prove that you have policies and procedures in place that are followed by your employees, using a consistent structure and taxonomy for storing and managing information that is also tracked for purposes of auditing and reporting."

"Strategies for Improving Enterprise Search - Beyond the Out-of-the-Box Experience" by John Ferrara:

"It’s common for enterprise website developers to implement search engines with out-of-the-box functionality, point it at their content repositories, and then just leave it at that. Search is becoming something of a neglected orphan, in part because packaged search products are relatively easy to implement, and then even more easily forgotten...//...Quality search results only come about through applied effort, requiring in particular the skills of an information architect. And IAs must be ready to go well beyond their traditional front-end role, digging into the functional backend and source data of the search engine."

Thursday, March 8, 2007

The Power Of The Enterprise Taxonomy - Part II

With an enterprise taxonomy, the organization gets a tool for increasing the findability of its content through unified access and improved searching and browsing. It also simplifies integration, maintenance, reuse, translation, exchange and syndication of content. So, the business benefits of an enterprise taxonomy should be pretty clear. But how do you actually develop and implement it?

The main challenge in developing and implementing an enterprise taxonomy is of course political. The different units within the organization will need to cooperate and agree upon a common taxonomy and vocabulary. This is by far the trickiest part of the taxonomy development and implementation process, and it should not be underestimated. Otherwise, the process is pretty straightforward. I have tried to sketch it out below:

1. Define & Research

Developing an enterprise taxonomy should of course start with determining the objectives, scope and requirements for the enterprise taxonomy. The scope should be easy to define, since it should be an enterprise taxonomy. But for what is it needed? How will it be used? How will it be maintained, by whom, how often and with what resources? How will it need to scale? And so on.

You should also perform a content inventory – produce a complete list of all the content that currently exists in the content landscape. To be able to do that you need to go to those who are developing and maintaining the content. What content do they develop or maintain? What is it about? Where is it located? Who needs it? What do they need it for? Once you have the list of all content with questions like these answered, you can start analyzing the content and designing the taxonomy.

2. Analyze & Design

Analysis means trying to understand the semantic relationships and patterns between existing content. As in software development, analysis and design are two intertwined activities, two sides of the same coin. You really cannot do one without the other. So, when you start analyzing you will also start designing the taxonomy.

However, it is important to select an architecture that is suited to its purpose and that is scalable, i.e. can accommodate new content. As in software development, establishing the type of architecture should be done as early as possible. Otherwise, there will be problems later. A taxonomy is often envisioned as a hierarchic tree structure, but it doesn't need to be. A taxonomy could also have a flat, network, or faceted architecture. It can also be a mix of two or more of these architectural types.
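
To make the difference between these architectures more tangible, here is a minimal Python sketch of a hierarchical and a faceted taxonomy. The terms, facet names and helper functions are hypothetical examples, not taken from any real taxonomy or tool.

```python
# Hierarchical: each term points to its parent (None marks the root).
PARENT = {
    "Documents": None,
    "Contracts": "Documents",
    "Supplier contracts": "Contracts",
    "Reports": "Documents",
}


def path_to_root(term):
    """Return the term and its ancestors, ordered from the root down."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return list(reversed(path))


# Faceted: several independent flat lists; an item gets one value per facet.
FACETS = {
    "document_type": {"contract", "report"},
    "region": {"emea", "apac"},
}


def is_valid_classification(classification):
    """Check that every facet used exists and that its value is allowed."""
    return all(
        facet in FACETS and value in FACETS[facet]
        for facet, value in classification.items()
    )


print(path_to_root("Supplier contracts"))
print(is_valid_classification({"document_type": "contract", "region": "emea"}))
```

In the hierarchy, adding new content may force a decision about where in the tree a new term belongs. In the faceted variant, a new value is simply added to one flat list. That difference is one reason to settle the architecture early.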

3. Validate

Now it is time to test and evaluate the taxonomy with the appropriate validation techniques. You can use qualitative validation, quantitative validation, or a combination. In any case, by this stage you should have a first version of the taxonomy so that you can test it on users and stakeholders.

4. Deploy & Implement

Deploying and implementing the taxonomy can be expressed with simpler words - making it ready to use and putting it into use. Deploying the taxonomy means that it is available and can be used by content management systems, search engines, and so on. Implementing a taxonomy means actually attaching its attributes to the existing and new content. The taxonomy attributes can be terms from a controlled vocabulary, a list of standardized terms that describe concepts within the domain. Using a controlled vocabulary with agreed upon and carefully defined terms ensures consistency in content metadata and also sets a common language for the organization that reduces the potential for misinterpretation. So, using a controlled vocabulary for the taxonomy has its clear benefits.
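
As a rough illustration of how a controlled vocabulary keeps metadata consistent, the following Python sketch maps free-text terms to preferred terms and rejects anything outside the vocabulary. The vocabulary, the synonym list and the function names are made up for the example.

```python
# A tiny controlled vocabulary (hypothetical terms).
PREFERRED_TERMS = {"invoice", "purchase order", "quotation"}

# Known synonyms and abbreviations, each pointing to its preferred term.
SYNONYMS = {"bill": "invoice", "po": "purchase order", "quote": "quotation"}


def normalize_term(raw):
    """Map a free-text term to its preferred term, or None if it is not in the vocabulary."""
    term = " ".join(raw.lower().split())  # lower-case and collapse whitespace
    term = SYNONYMS.get(term, term)
    return term if term in PREFERRED_TERMS else None


def tag_content(raw_terms):
    """Split raw terms into accepted preferred terms (deduplicated) and rejected input."""
    accepted, rejected = [], []
    for raw in raw_terms:
        term = normalize_term(raw)
        if term is None:
            rejected.append(raw)
        elif term not in accepted:
            accepted.append(term)
    return accepted, rejected


accepted, rejected = tag_content(["Bill", "invoice", " PO ", "memo"])
print(accepted)
print(rejected)
```

The rejected terms are worth keeping rather than discarding: they show which concepts people try to use that the vocabulary does not yet cover.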

Again, as in software development, there is no such thing as a big bang approach to implementation. Instead, start with a pilot and then start implementing the taxonomy throughout the organization according to a realistic roadmap.

5. Evaluate & Revise

Once the enterprise taxonomy is deployed, you need to maintain it. Two important activities in the maintenance are to evaluate how the taxonomy is performing and revise it when needed.

But how to govern and maintain an enterprise taxonomy is basically a subject of its own, which might also be the subject for a post or two later on.

Tuesday, March 6, 2007

The Power Of The Enterprise Taxonomy - Part I

Large organizations are always grouped into smaller units, sub-organizations which inevitably develop their own vocabularies and their own conceptual model of the enterprise. This creates barriers to communication, collaboration and knowledge exchange. Each unit sooner or later creates its own content silo.

An enterprise taxonomy – a system for naming and organizing the content into groups that share similar characteristics – can help to tear down these barriers. It can facilitate access, exploration and understanding of the digital content that exists within an organization. The taxonomy can make it possible to understand the organization at-a-glance by providing a high-level view of the organization as a whole.

As previously argued by Henrik, one must make a distinction between content architecture and information architecture, between the enterprise taxonomy and navigation taxonomies.

In Information Architecture, taxonomies are developed to facilitate search, navigation and presentation. Search engines look for keywords or words in content sources that match a search query, but people actually look for and explore concepts. This is where taxonomies come into the picture in Information Architecture. Every navigation scheme is based on one or several taxonomies. The dilemma is that there is no single taxonomy that will organize content the way all users expect it to be organized. So a single taxonomy will in most cases not be sufficient for creating a successful Information Architecture.

However, developing a single enterprise taxonomy is fundamental in Content Architecture. Managing content begins with organizing it, and the enterprise taxonomy is a key organization tool. Once the content is organized in a consistent manner, any type of content from any content source can (theoretically) be integrated and made accessible throughout the organization. To be usable, the enterprise taxonomy cannot be invented. Instead, it must be derived from the content that already exists in the organization. So before it can be developed, a content inventory has to be performed. Such an inventory will also create a better overview and understanding of the organization and each unit’s content.

My coming posts will tell you more about how to perform a content inventory and develop an enterprise taxonomy.

Wednesday, February 21, 2007

Metadata, The DNA Of The Content Enterprise



Metadata is commonly defined as data about data. But I would rather define it as content about content, additional content that is intended to help the user to interpret main content by adding context to it, by putting it in the context of other content. Metadata can provide answers to questions the user might have about the content, such as who created it and when or what subject it is about. Content without any metadata is more or less useless. It is hard to find it, hard to understand where it came from, hard to determine if it is accurate and up-to-date or not, and so on.

“Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.” (GIGA)

Metadata is a key ingredient in helping users find the content they are looking for. One of the big problems with finding content on the web and in private networks is that most content is textual, written in natural language, which is hard for computers to effectively manipulate and manage. Computers cannot understand the meaning of a piece of content (the semantics), only determine the structure of it (the syntax). So, when you search for content, the search engine will look for content that contains the same words and has the same syntax as your query. But it cannot understand what the words mean. This makes searching quite inefficient.

But this is also where metadata comes into the picture. By tagging the content with words (keywords) that tell more what the content is about than most of the words within the content itself do, the search engine can look for content that contains keywords that are the same as the words you provided in the query. This increases the efficiency of search, enabling more relevant search results.
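
The difference between matching words in the text and matching keyword metadata can be shown with a toy Python example. The two documents and their keywords are invented, and real search engines do far more than this. The sketch only illustrates the principle.

```python
import re

# Two made-up documents, each with a body and author-assigned keywords.
documents = [
    {
        "title": "Spring campaign recap",
        "body": "We sold more cars than last year.",
        "keywords": {"automobile", "sales"},
    },
    {
        "title": "Parking notice",
        "body": "Please do not leave your automobile in the visitor lot.",
        "keywords": {"facilities", "parking"},
    },
]


def full_text_search(query):
    """Return titles of documents whose body contains the query word."""
    word = query.lower()
    return [
        d["title"]
        for d in documents
        if word in re.findall(r"\w+", d["body"].lower())
    ]


def keyword_search(query):
    """Return titles of documents tagged with the query word."""
    word = query.lower()
    return [d["title"] for d in documents if word in d["keywords"]]


print(full_text_search("automobile"))
print(keyword_search("automobile"))
```

The full-text match finds the parking notice, which merely mentions the word, while the keyword match finds the campaign recap, which is about automobile sales but never uses the word in its body.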


Once the right content can be easily found and retrieved, metadata can also help with many other things with the content, such as reusing it for other purposes, preserving it, making sure only the right users can access it, and so on.

In other words, metadata is the DNA of a content enterprise, vital for its growth, survival and success. The metadata has to be good, and it has to be the right metadata. Equally important is that there is an enterprise taxonomy that organizes all content semantically. But more about that later.