Sunday, December 9, 2007

Bridging Data And Content For Enterprise Information Management (EIM)

Enterprise Information Management (EIM) is a concept with potential for improving business management and operations. Its key purpose is to manage enterprise information assets across enterprise and application domains.

Information assets are based on data and content which means that successful EIM needs to bridge the traditionally separated areas of Data Management and Content Management. Both areas have been oriented to the production side of data and content including techniques for creation, integration, administration, access and delivery. Data and Content Management also work with e.g. security, quality assurance and consolidation into master sources.

But there are some differences, as plainly described below:

  • Data Management has its roots in managing data for process workers and users. Data Management has followed the evolution of Enterprise Applications which have required high-volume transactional data.
  • Content Management has its roots in managing documents for information workers and users. Content Management has followed the evolution of the web which has required the aggregation of multiple types of media.

There is also a difference on the consumption side of data and content. Data Management is often complemented with Business Intelligence, which concerns gathering and analysing data for business decisions. Content Management goes well with Enterprise Portals that offer multi-channel delivery of digital content.

The gap between Data and Content Management is inefficient from both an IT and a worker/user perspective. So, Data Management and Business Intelligence must be better combined with Content Management and Enterprise Portals to realize the potential of EIM.

What is missing? Where should we start?

Enterprise information assets are fundamental to EIM, according to the definition above. Therefore an EIM approach should include the following key activities:

  • Creation of a strategy that provides a unified direction for utilizing information assets (data and content)
  • Definition of an architecture with a common blueprint for information assets (data and content)
  • Establishment of an administration that continuously governs and improves information assets (data and content)

What is your experience?

Sunday, November 11, 2007

A real world case of semantic dissonance

My previous post stated that semantic dissonance is the real challenge of content integration and I will try to illustrate this in this post by using a real world case.

Semantic dissonance between two content sources is a typical silo effect, something that happens when two or more information systems with more or less overlapping content have been developed in isolation from each other, typically to support specific functions within an enterprise. Semantic (as well as structural) dissonance can be expected when information systems developed in two organizations are to be integrated after a merger or an acquisition. But it is also a common scenario within a single organization.

A couple of years ago, I was preparing a new version of an e-commerce site for a global corporation. The site had been left unchanged for a couple of years and was now to be redesigned and equipped with a new user experience with more interactive features and richer content. The site was fed with product information such as product structures, descriptions, images, prices and stock quotes from a couple of back-end systems, one of which provided the basic product information. The integration with the back-end systems had always been a problem child and had not worked properly since it was first designed and implemented several years earlier. The back-end systems were custom developed and the e-commerce platform used for the web site was a standard product.

When we dug a bit deeper into the design and implementation of the current site and how it used the product catalog that was part of the e-commerce platform, it turned out that there was semantic dissonance between the product catalog and the back-end system that provided master product information. In short, the definition of a product in the e-commerce platform did not match the definition of a product in the back-end systems. Despite the dissonance, the e-commerce site was fully functioning, but mostly due to a number of workarounds. What didn’t work was administering the products and related content in the administration tool that came with the e-commerce platform. Nor was it possible to use the features that came out-of-the-box with the e-commerce platform – we would have to build them from scratch. Interestingly, the first version of the e-commerce site was designed and implemented by the same company that provided the e-commerce platform.

This insight put us on a long journey where we had to change not only the public e-commerce front-end and how we used the product catalog, but also the integration solution to the back-end systems. We had to convince the business people that this was absolutely necessary to be able to develop the e-commerce sites in the direction they wanted.

The tool that helped us succeed was an information model that helped us distance ourselves from the data models and identify and resolve the dissonance. We (re)defined the term "product" and identified the matching entities in each of the two systems. Looking only at the implementation, it now seemed as if we had mapped two different entities to each other. But what differed were only the terms, not the concepts behind them. The information model clarified that the two different terms actually meant the same thing and thus guided the redesign of the e-commerce site and the integration solution.
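
To make the idea concrete, here is a minimal sketch in Python of an information model sitting between two systems. It is an illustration only: the system names (`catalog`, `backend`), the entity names (`Product`, `SalesItem`) and the definition text are all hypothetical, since the actual terms from the project are not given here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A business concept defined once, independent of any system."""
    name: str
    definition: str
    # Maps a system name to the entity that implements the concept there.
    implemented_by: dict[str, str] = field(default_factory=dict)


def counterpart(concepts: list[Concept], system: str, entity: str, target: str) -> str | None:
    """Find the entity in `target` that means the same as `entity` in `system`."""
    for concept in concepts:
        if concept.implemented_by.get(system) == entity:
            return concept.implemented_by.get(target)
    return None


# Hypothetical model: the two systems use different terms for one concept.
model = [
    Concept(
        name="product",
        definition="Something a customer can order as a single unit.",
        implemented_by={"catalog": "Product", "backend": "SalesItem"},
    ),
]

print(counterpart(model, "catalog", "Product", "backend"))
```

The point of the sketch is that the mapping goes through the agreed concept, not directly from one data model to the other, so differently named entities can be matched once people have agreed that they mean the same thing.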

The trick is how we created the information model. We did not bang the drums and announce to each and every one that we would now need to redefine the term product and some related terms. That kind of approach would certainly have met resistance because everybody naturally protects “their” definitions. To suggest re-defining a term such as “product” in a successful business that was entirely built on its products would probably have been political suicide. Instead, we started off by discussing the existing and somewhat inconsistent and confusing product definitions. We did not attack the flaws. We simply said that we needed to understand these terms and their definitions better. So, we modeled and discussed them together in a series of workshops. And behind the scenes, we created the information model piece by piece. It might seem cunning, and it was.

Friday, November 9, 2007

Thinking of building an electronic archive?

From a governmental perspective, more and more information is created by or sent to our governments electronically, and the share of content exchanged between governments and citizens on paper is decreasing. Content that is created or received by a Swedish government body is a public record, so the need for stable archive solutions is increasing as well. How do we make sure that the information stays "in shape" for all time to come? There are many issues to take into consideration, and there are many models to turn to for help.

During my participation in projects concerning electronic archives I have found, though, that there are few good implementations to look at for help. Archive solutions show their full functionality and benefits only after some time has passed and the theories have been tested in real life, so we are still at the beginning of the lifecycle of these solutions. Good practice isn't really in place yet, so to speak. There are, however, some findings that should be considered if you are planning to start a project for e-archiving your information. In my experience, these are the three most important findings for a project to take into consideration.

1. Run the archive project "by the book"
I mean this in every conceivable way: documentation, modelling, requirements specification, project staffing, budgeting etc. For example, there has to be a solid design model as a foundation on which the information model and data model are defined. Information and data will change over time, whereas the foundations of the business and the business rules are less often subject to change. An example of the difficulties is the entity "client". Often, different departments within the same organisation have different views on what a client is. For the design model to be solid, it should only take into consideration the relations to other entities. Skipping this in order to cut a corner will create problems further on.

2. Develop and adopt a method for how you manage new content
In order to keep the information model and the data model as solid as possible, you should develop a method for how to take new content into the archive. From the example above, how do we manage the entity "client" for this new system since it has a different view on what a client is than any of our other systems? Well, with a solid design model and a method of defining the properties of the "new" client as a part of our already developed information model, the new system should be easier to fit into your solution. The thing here however is to realize that there is no perfect model, only ways of putting cubes into round holes without using too much violence. But remember though, the design model has to be solid.

3. Implement a cross functional information process and a process owner
Since the archiving process begins when content is created or received by an organisation, the archive function has to be part of a function that looks at the information through its lifecycle and not through organisational constructs. The archive function has to be regarded as a keystone in the information lifecycle and the information process. Today, however, archives are often regarded as what happens to the information when the business doesn't need it anymore. Hence archiving issues are often considered only when it's too late.

The Real Challenge of Content Integration


Many integration problems cannot be solved by designing computer algorithms. For structural dissonance it is possible, but not if there is some kind of semantic dissonance between (what appears to be) the same content in two different content sources. The term might be the same, but the meaning of it might differ. Dissonance typically occurs:

  • When translating real world observations or abstract concepts to information (creating the message)

  • When encoding the information into digital content such as text and images (creating the content)

  • When transferring content between one source and another where information models do not match (integrating content)

Experience shows that semantic dissonance is common in most enterprises (a silo effect). Still, many choose to put their trust and hopes into integration software, believing that IT alone will solve their integration problems. The reality is of course that any semantic dissonance needs to be resolved first, before content sources are integrated technology-wise.

Much more is to be said about this subject, and I will return with discussions and real-world examples on how information modeling can help enterprises overcome semantic dissonance.

Tuesday, October 30, 2007

Taxonomies and tagging in MOSS 2007

Although MOSS 2007 has many benefits, two of its most apparent weaknesses are its lack of built-in support for creating taxonomies for document classification and for tagging documents with user-defined tags.

I have been exploring Microsoft’s SharePoint site and MSDN forums about Social Computing and Enterprise Content Management and can conclude that I found virtually nothing there about taxonomies and tagging. The most interesting information I found was a forum post called “Document Classification Taxonomy in MOSS 2007” which describes a typical business use case that can be supported by classification taxonomies:

“We have approximately 300 different document classifications and we could create a content type for each, but this would require users to scroll through a list of 300 options every time they upload a file. This is not particularly friendly. What I would like to create is a mechanism whereby users, upon uploading a document, are asked the basic nature of the document.”

The only answer to that post recommended either adding 300 content types and using many document libraries, or looking at a third-party tool such as RAPID (described below).
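
The mechanism the forum poster asks for amounts to walking a classification taxonomy one level at a time instead of showing all leaves at once. Here is a minimal sketch of that idea in plain Python. It does not use any SharePoint API, and the taxonomy content and the function name `options` are made up for illustration.

```python
# Hypothetical two-level taxonomy; a real one would hold ~300 leaf classifications.
TAXONOMY = {
    "Contract": {
        "Customer": ["Sales agreement", "Service agreement"],
        "Supplier": ["Purchase agreement", "Non-disclosure agreement"],
    },
    "Report": {
        "Financial": ["Annual report", "Quarterly report"],
        "Project": ["Status report", "Final report"],
    },
}


def options(path):
    """Return the choices to show next, given the answers so far."""
    node = TAXONOMY
    for answer in path:
        node = node[answer]
    # Inner nodes are dicts (show their keys); the last level is a list of leaves.
    return sorted(node) if isinstance(node, dict) else list(node)


print(options([]))                        # first question: basic nature of the document
print(options(["Contract"]))              # second question
print(options(["Contract", "Supplier"]))  # final classification choices
```

With a structure like this, a user answers a few short questions and only ever sees a handful of choices per step.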

My exploration of Microsoft’s ECM blog did not yield much either. What I found was a single post from early 2007 (by Adri Verlaan, a developer on the ECM team) which introduces a “Tagging Starter Kit for SharePoint Server” including a “lightweight working prototype”. I quote:
“Currently, the kit allows authors to attach tags to content and readers to specify tags in which they are interested. Using this information for a specified content source, a customized Content Query Web Part shows only the items that match a reader’s respective tags.”

In other words, there seems to be little about taxonomies and tagging from Microsoft. But, are there 3rd party tools available to make up for this weakness? Well, KWizCom recently released a third party product for MOSS 2007, called “SharePoint Tagging Feature”:

“KWizCom SharePoint Tagging Feature enables the tagging of SharePoint content such as documents, list items, pictures, forms etc. Furthermore, with the included Tag Cloud Web Part, SharePoint Tagging Feature enables many new capabilities such as presentation of items according to tags, tagging e-mail alerts, and more.”

To add support for taxonomies, there is “RAPID for SharePoint” from Artemis Corporation:
“The RAPID 'Taxonomy' Classification framework allows the application of any taxonomy within SharePoint content. An unlimited number of taxonomies can be created and used within a site to classify documents and list items. Once classified documents and list items can be filtered, indexed and queried using standard WebParts, List Controls and Views. Fully integrated into Microsoft Office SharePoint Server 2007”

It is hard to tell from these descriptions how capable these 3rd party tools are. If anyone has had hands-on experience of these or equivalent tools, please share.

Tuesday, October 16, 2007

Why You Need a Concept Model for Integration

"A human being is part of a whole, called by us the 'Universe,' a part limited in time and space. He experiences himself, his thoughts and feelings, as something separated from the rest - a kind of optical delusion of his consciousness. This delusion is a kind of prison for us, restricting us to our personal desires and to affection for a few persons nearest us. Our task must be to free ourselves from this prison by widening our circles of compassion to embrace all living creatures and the whole of nature in its beauty." - Albert Einstein

An Enterprise Content Architecture (ECA) essentially consists of content (elements) and metadata (attributes and relations). By applying the proper types of metadata to describe and organize the content within an enterprise, the ECA makes sure that it can be managed and delivered as needed. So far so good. But, the main challenge when trying to establish an ECA and integrate content from various content sources is to overcome two great barriers related to the metadata that surrounds the content:

  1. The first barrier has to do with the structure of the metadata. If the metadata in one content source is different in structure from the corresponding metadata in another content source, we must make sure that they get the same structure.
  2. The second (and even greater) barrier has to do with the semantics – the meaning – of the metadata. If the metadata in one content source does not mean the same as the corresponding metadata in another content source, then we have a problem that we need to resolve. If we are not aware of semantic differences in metadata between two content sources that are to be integrated, then we might get into serious trouble.
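
The difference between the two barriers can be sketched in a few lines of Python. Everything here is hypothetical (the two sources, their field names and the documented meanings); the sketch only shows that a structural mismatch can be resolved with a mapping table, while a semantic mismatch only becomes visible once people have written down what each field means.

```python
# Hypothetical metadata records for the same document in two content sources.
source_a = {"title": "Price list 2007", "owner": "Anna Berg"}
source_b = {"doc_title": "Price list 2007", "doc_owner": "Sales department"}

# Structural barrier: different field names. A mapping table resolves it mechanically.
FIELD_MAP_B_TO_A = {"doc_title": "title", "doc_owner": "owner"}

# Semantic barrier: what each field *means*. This has to be written down by people.
MEANING = {
    ("a", "title"): "name of the document",
    ("b", "doc_title"): "name of the document",
    ("a", "owner"): "person who wrote the document",
    ("b", "doc_owner"): "organisational unit responsible for the document",
}


def restructure(record, field_map):
    """Rename fields so the record has the target structure."""
    return {field_map[key]: value for key, value in record.items()}


def semantic_conflicts(field_map):
    """List the mapped field pairs whose documented meanings differ."""
    return [
        (b_field, a_field)
        for b_field, a_field in field_map.items()
        if MEANING[("b", b_field)] != MEANING[("a", a_field)]
    ]


print(restructure(source_b, FIELD_MAP_B_TO_A))
print(semantic_conflicts(FIELD_MAP_B_TO_A))
```

After restructuring, the record from the second source looks perfectly valid, which is exactly why the semantic conflict on the owner field is easy to miss.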

To develop an ECA is obviously a challenging - but not impossible - task given that most enterprises have lots of content sources where both the structure and semantics of the metadata have been defined in isolation from how they have been defined in other content sources. This phenomenon is commonly referred to as the development of “content silos” or “content islands” and it is a result of IT systems being developed to support only specific functions, parts of processes or – at best – entire processes within an enterprise. That is, without an enterprise perspective where all processes and IT systems are viewed as being part of a whole.

To be able to merge different content architectures, there is really no other way than to go back and first (re-)define the basic concepts used in the enterprise. Only when you have established an agreed-upon concept model at the enterprise level can you start to map the content models in different content sources to each other. Here is a basic approach to establishing the enterprise concept model:

  1. Make sure you have commitment from top management and then create a team with representatives from all “content islands” (that is, departments and/or processes)
  2. Make an inventory of existing content and of content that is needed but does not exist
  3. Identify which content is valuable to the enterprise and therefore should be classified (and managed) as assets
  4. Work out an enterprise concept model in modeling workshops

The enterprise concept model can then guide the creation of enterprise metadata standards, enterprise taxonomy development and (structural and semantic) harmonization of metadata in different content sources throughout the enterprise.

Friday, October 12, 2007

The Role of an Enterprise Content Architecture

The role of an Enterprise Content Architecture (ECA) is to structure, describe, organize and harmonize content resources within an enterprise so that they can be managed and delivered as content products to end users according to business needs and requirements.

One of the main drivers for establishing an ECA is to reduce the costs for producing and managing content. Simply put, an ECA will bring valuable content resources to the surface so that they can be accessed and found. The reverse scenario is that you don’t find the content resources you are looking for and need to re-produce them. Furthermore, if a content product needs to be delivered in different ways – such as via different channels that require different format and structure for the content product - the ECA makes sure that the content resources from which the content product is built can be reused.

What is even more important than reducing costs is that the ECA provides the foundation that enables users (humans as well as machines) to exchange and share information and knowledge. It does so by semantically integrating content resources of different formats, structures and types which are otherwise living on their own islands (or kept in silos) somewhere in the enterprise. This is also where the ECA meets the Enterprise Information Architecture (EIA).



While the ECA is primarily focused on supporting the efficiency of enterprise content management processes, the EIA is primarily focused on supporting the information needs within the enterprise - to provide the right information at the right time to the right user. It does so by defining, organizing and describing content products in ways that support how different users in different usage contexts look for information and how they want / need the information delivered to them. It goes without saying that you need to have both an ECA and an EIA and that they need to harmonize while still being allowed to be different.

An Enterprise Content Architecture semantically organizes content resources that may be of different granularity and be more or less structured. The architecture – relationships between content following certain rules – is created with the use of metadata, such as taxonomies. The ECA also addresses how to structure, describe and store content resources for optimal production, management and delivery of content products to content workers and end users in the business.

In an upcoming post I will look at what kind of questions you need to address when defining and designing an Enterprise Content Architecture.

Tuesday, October 9, 2007

ECM Illustrated – Where Supply Meets Demand

Hungry for three-letter abbreviations? Here is an attempt to (re)define what ECM, ECA, EIM and EIA stand for and how they relate to each other.



Enterprise Content Management (ECM) is a collection of processes (supported with technologies) for managing any kind of digital content throughout its entire life cycle within the context of an enterprise.

The Enterprise Content Architecture (ECA) semantically organizes the content resources within an enterprise with the aim of supporting efficient and secure management of digital content throughout its lifecycle. The ECA provides the structures needed for efficient ECM processes.

Enterprise Information Management (EIM) is a collection of processes for identifying what information the organization needs to function efficiently, and making sure it gets it, i.e. providing the right information to the right user (human or machine) at the right time. To be able to do this, the enterprise needs to have efficient ECM processes in place (make sure to clean up at home before you invite any guests).

The Enterprise Information Architecture (EIA) semantically organizes the information resources (content that is intended to inform users) within an enterprise with the aim to support its information needs by providing a structure that connects users (human or machine) with the information they need. The EIA provides the structures needed for efficient EIM processes.

ECM and ECA work on the supply side (provider) while EIM and EIA work on the demand side (user).

Friday, October 5, 2007

I call for e-notifications

The number of e-mails in my inbox that are just notifications – short and automatically generated messages that inform me about something – has long since surpassed the number of regular e-mails. Most of them I don’t open and read. The information in the subject is usually enough and then I delete them. Some of them I have to open and read, just to view and click on a link to another location.

One of the “truths” in content management is that all content needs to be managed in some way or another. Depending on the purpose and intended (and actual) use of the content, it might need to have a different structure and format and be managed in a different way than other types of content…that is why we define and work with “content types”.

What strikes me every so often when I delete notifications received by e-mail is that I would really like to manage these in a different way than how I manage my other e-mails. For example, I don’t want them to be mixed up with other e-mails. I would like to view and manage them separately. I would also like to have them automatically deleted after some period of time. And, maybe it would be an idea to have all the information in the subject so that I don’t have to bother to read the body of the message?
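
As a rough illustration of what such handling could look like, here is a minimal Python sketch using the standard `email` module. It assumes that notifications carry the `Auto-Submitted` header (defined in RFC 3834 for automatically generated mail), which many real notification senders do not set, and it assumes a retention period of 14 days. The sample message and the function names are made up.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

RETENTION = timedelta(days=14)  # assumed retention period


def is_notification(msg):
    """Treat any message marked as automatically generated as a notification."""
    value = msg.get("Auto-Submitted", "no")
    return value.strip().lower() != "no"


def should_delete(msg, now):
    """A notification expires once it is older than the retention period."""
    sent = parsedate_to_datetime(msg["Date"])
    return is_notification(msg) and now - sent > RETENTION


# Made-up sample notification.
raw = (
    "From: build@example.com\n"
    "Subject: Build 112 succeeded\n"
    "Date: Fri, 05 Oct 2007 08:00:00 +0000\n"
    "Auto-Submitted: auto-generated\n"
    "\n"
    "See http://example.com/builds/112\n"
)
msg = message_from_string(raw)
now = datetime(2007, 10, 30, tzinfo=timezone.utc)
print(is_notification(msg), should_delete(msg, now))
```

A mail client could use the first check to keep notifications in a separate view and the second to clean them up automatically.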

Much of what I ask for I can achieve - after some effort - by configuring an e-mail client such as Outlook. But should I really have to do this? Aren’t there more people than me who are interested in viewing, using and managing notifications in a different way than regular e-mails? If so, then I would not be the only one to welcome an initiative that defined, specified and enforced a new standard for notifications. Maybe there even is one on the way?

Tuesday, August 21, 2007

Back to Basics - Defining Data, Content, Experience, Information And Knowledge

The emerging fields of Content Management, Enterprise 2.0 and others introduce new concepts as well as modifications (new interpretations) of already existing concepts. There are often logical inconsistencies between key concepts such as data, content, information and knowledge, which cause confusion and complicate discussions and analysis. We find that it often helps to go back to basic definitions and to try to sort them out.


Some of the most basic concepts dealt with in our blog are outlined below (Philosophers and epistemologists must excuse our simplified but practical approach).

  • Data: Data is content that has been structured so hard (in order to be stored and accessed in an efficient way) that it does not provide enough context to the user to be usable on its own. It needs to be aggregated, formatted and described to be usable.
  • Content: Content is something that is intended to communicate a message from a sender to one or several receivers, e.g. a diagram, a document or a digital asset such as a picture or movie. The purpose of the message (i.e. of the communication process) can be to inform the receiver about something or to create an experience. Digitized content is formatted and described in a way that it can easily be managed and delivered to the user with information technology.
  • Experience: The receiver (user) always gets some kind of experience when he/she interacts with digital content via some kind of device and software user interface. The sender might see the experience as a means to communicate the message to the user more efficiently, or the experience might be the actual message.
  • Information: When perceiving and interpreting content that is intended to inform the user about something, the user will hopefully understand the message. In other words, the content is transformed into meaningful information by cognitive processes in the user's head.
  • Knowledge: When the user reflects and applies the information, it can be transformed into knowledge.

The definitions above are intended to show that it is one thing to manage data and content, and another thing to manage information and knowledge. The key point is that data and content can be managed by means of (information) technology, but that we cannot manage information and knowledge with technology alone since information and knowledge are created and exist only in the heads of humans.

We can try to conceptualize knowledge into information and capture it as digital content and then deliver it to the audience, but we cannot guarantee that the audience will understand it as we intended.

There are many discussions to be had around these definitions. What, for example, is your view on the commonly used terms "information worker" and "knowledge worker"? ;-)

Thursday, April 26, 2007

Define Before You Integrate

Many organizations believe they have a clear and exact picture of what they mean when they talk about concepts such as product, service and customer. But if you ask around how people in the organization define "product" you will probably get almost as many definitions as the number of persons you ask. Why? Because the seemingly obvious often obscures misunderstandings and inconsistencies. Not many persons dare to question if, for example, a global Telecom company actually has a clear definition of what a product or service means to them. So the concept is left to be defined by anyone who needs it to be defined, without establishing a common definition that is communicated to everyone within the organization.

Many people would get offended if you start asking questions like this. So, is it a stupid question? Possibly. And possibly not. If they can give you a clear definition and show that everybody uses this definition, then it is fine. If not, you have a (political) problem to deal with.

Semantic integration is what integration is about first and foremost, with technical integration coming in second place. You first need to ensure that "customer" in one business system means the same thing as "customer" in another business system if you are to integrate them, not just make sure that you map attributes in the correct way. And to be able to ensure that, you need a common definition of the customer concept at the enterprise level (which might not be exactly the same as in any of the systems to be integrated).

Thursday, April 19, 2007

BI + ECM = True?

Rich Cohen writes in DM Review:

"The next few years are going to provide a bumpy ride for the IT function at many companies. The rise of unstructured content will require new ways of thinking about how information is used and managed enterprise-wide - indeed it will require a new definition of the term "information." The goal is simple: treat all information - unstructured and structured - as if it is one of the most valuable assets your company possesses. Learn all you can about it, manage it properly, and use it to help grow the company. "

Jacques Surveyer argues why ECM will become a necessity for BI:

"Enterprise content management (ECM) has always been a bit of a wallflower -- acknowledged as important, but emerging slowly and flourishing most in professional and service-oriented firms where collaboration and knowledge management are paramount. But ECM offers a lot that's beyond the capabilities of most business intelligence systems -- the ability to handle semi-structured data in diverse document formats, team collaboration, support for ad hoc working groups, and knowledge management. All are of growing interest in the world of BI."

TDWI uses a similar reasoning when promoting their 2007 TDWI World Conference:

"Business intelligence is a major beneficiary of ECM. As the volume of information in data warehouses and BI systems continues to grow, it becomes increasingly difficult for users to find relevant information and make the right decisions. Traditional BI and analytics are good at telling you the “what” of business performance, but they often leave out the “why.” ECM search capabilities fill in the blanks by making related information in memos, e-mails, and policies findable, reaching further into the heart of a business. "

Dave Kellogg gets to wrap up this post:

"So ECM seems to be heading in a new direction. If things evolve similarly in BI, you can expect to see cannibalism (such as Business Objects buying Hyperion) and incursion (such as IBM buying Cognos) in the future. People have speculated about such things for years. If BI continues to evolve in parallel fashion to ECM, then perhaps soon the speculators will be proven right."

Friday, April 13, 2007

Virtual Content Repositories as Content Integration Approach

In essence, content integration is about providing users within an enterprise with a single point of access to all content they need in their daily work.

There are basically two approaches to content integration – consolidation or federation (point-to-point integration is basically a swearword today). The consolidation approach, consolidating all content sources into one single enterprise content repository, must however seem like a utopian dream to most large enterprises. Instead, a common way to provide unified access to heterogeneous content in dispersed content sources is to implement an enterprise portal solution with content integration taking place at the presentation level. But this integration approach is in my eyes not "real" content integration since it does not offer the possibility to describe, access and search the content in a unified way. Instead, using a federation approach with a virtual content repository could provide these possibilities “behind the scenes” of the portal solution, with the action taking place in the middleware. The promise of virtual content repositories is to do just that - to provide unified access to disparate content sources within an enterprise without any point-to-point integrations or repository consolidation. The possibility to map metadata between different content sources is especially essential for content integration. Just like content, metadata is usually stored and managed in isolated islands. As Bruce Silver writes “the user needs to be presented with a unified list of attributes independent of the attribute structure of the underlying systems.”
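
The federation idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any vendor's product or the JSR 170 API works: the class `VirtualRepository`, the two sources and all attribute names are hypothetical, and the sources are plain in-memory lists where a real solution would query live systems through adapters.

```python
class VirtualRepository:
    """Presents several content sources through one set of attribute names."""

    def __init__(self):
        self._sources = {}  # source name -> (items, attribute map)

    def register(self, name, items, attribute_map):
        """attribute_map translates unified attribute names to the source's own."""
        self._sources[name] = (items, attribute_map)

    def search(self, **criteria):
        """Query every source using unified names; return unified records."""
        hits = []
        for name, (items, amap) in self._sources.items():
            for item in items:
                if all(item.get(amap[attr]) == value for attr, value in criteria.items()):
                    unified = {attr: item.get(native) for attr, native in amap.items()}
                    unified["source"] = name
                    hits.append(unified)
        return hits


# Two hypothetical sources with different attribute structures.
dms = [{"Title": "Travel policy", "Author": "HR"}, {"Title": "Budget", "Author": "Finance"}]
web = [{"headline": "Travel policy FAQ", "editor": "HR"}]

repo = VirtualRepository()
repo.register("dms", dms, {"title": "Title", "owner": "Author"})
repo.register("web", web, {"title": "headline", "owner": "editor"})

for hit in repo.search(owner="HR"):
    print(hit)
```

Note that mapping both `Author` and `editor` to a unified `owner` attribute silently assumes that they mean the same thing; the attribute map solves the structural problem only.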

The major benefits of a virtual content repository approach would be that it is relatively cheap and fast compared to consolidation, and that it will still integrate content so that it can be described, accessed and searched in a unified way. In addition, "…virtual repositories can simplify the task of compliance by virtue of containing a single set of business processes applicable to all content in all repositories…//...virtual repositories mean organizations can stop debating whether to go with a single or multiple data stores, and instead concentrate on the critical factors that make for a good repository of any size" (R Dukart).

Major ECM / ECI vendors such as Oracle, IBM, EMC and BEA seem to believe in virtual content repositories for the federation approach, with content being federated to a single virtual repository from any existing content source via a standardized API. Obviously, the key for virtual content repositories to succeed is the use of standardized APIs to access the repository and underlying content sources. JSR 170, the Content Repository API for Java Technology specification developed under the lead of Day Software, was the first adopted content repository API standard. The goal with the standard was to “produce a content repository API that provides an implementation independent way to access content bi-directionally on a granular level.” (Day Software). Hence, repositories supporting this standard can be accessed in the same way and the repositories are not tied to any one application. The follow-up to JSR 170, JSR 283, was proposed in October 2005 by Day Software, which leads the specification (and also formed a strategic technology partnership with Oracle in November 2006).

Although they are still in an early adoption phase, I believe that virtual content repositories have a future as a content integration approach. Maybe not for "deeper" integration of data in relational databases, but certainly for integrating content such as Office documents, web pages, graphics and e-mails.

Wednesday, March 21, 2007

The Content Services Landscape

One of the main reasons that many enterprises need a strategic approach to enterprise content management is the diversity and proliferation of content services. Today it is relevant to talk about a landscape of content services that needs to be governed by the enterprise.

What is included in such a landscape? A high-level representation may consist of:

  • Interaction services: Creation, delivery and access services and also business specific content solutions
  • Collaboration services: Everything from ad-hoc teaming to structured workflow and process orchestration
  • Management services: Core services for managing webs, documents, records, digital assets, e-mail etc
  • Integration services: Support for different back-end applications, repositories, archives etc

A high-level picture of the landscape and its services can be used as a communication tool to:

  • Clarify the purpose of the services and their relationships
  • Present service management responsibilities
  • Portray the flow and processing of content
  • Illustrate how the services support business processes
  • Highlight strategic choices and the evolution of the landscape
  • Position providers and platforms
  • Etc etc

The landscape as described above does not claim to be the final view of enterprise content management services but has in many cases proven to be a good starting point. It should always be adjusted and detailed to suit different needs and situations.

Thursday, March 8, 2007

The Power Of The Enterprise Taxonomy - Part II

With an enterprise taxonomy, the organization gets a tool for increasing the findability of its content through unified access and improved searching and browsing. It also simplifies integration, maintenance, reuse, translation, exchange and syndication of content. So, the business benefits of an enterprise taxonomy should be pretty clear. But how do you actually develop and implement it?

The main challenge in the development and implementation of an enterprise taxonomy is of course political. The different units within the organization will need to cooperate and agree upon a common taxonomy and vocabulary. This is by far the trickiest part of the taxonomy development and implementation process, and it should not be underestimated. Otherwise, the process is pretty straightforward. I have tried to sketch it out below:

1. Define & Research

Developing an enterprise taxonomy should of course start with determining the objectives, scope and requirements for the enterprise taxonomy. The scope should be easy to define, since it should be an enterprise taxonomy. But for what is it needed? How will it be used? How will it be maintained, by whom, how often and with what resources? How will it need to scale? And so on.

You should also perform a content inventory – produce a complete list of all the content that currently exists in the content landscape. To be able to do that you need to go to those who are developing and maintaining the content. What content do they develop or maintain? What is it about? Where is it located? Who needs it? What do they need it for? Once you have the list of all content with questions like these answered, you can start analyzing the content and designing the taxonomy.
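The inventory does not need fancy tooling. As a rough sketch in Python, each entry can simply record the answers to the questions above. The field names and the sample entries are made up for illustration; any spreadsheet with the same columns would do the job equally well.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class InventoryEntry:
    """One row of the content inventory (field names are illustrative)."""
    name: str      # what content is developed or maintained
    subject: str   # what it is about
    location: str  # where it is located
    audience: str  # who needs it
    purpose: str   # what they need it for


# Hypothetical sample entries gathered from the content owners.
inventory = [
    InventoryEntry("Welcome guide", "onboarding", "intranet", "new employees", "getting started"),
    InventoryEntry("IT account checklist", "onboarding", "file share", "managers", "setting up new hires"),
    InventoryEntry("Price list", "pricing", "sales portal", "sales staff", "preparing quotes"),
]

# A first, crude input to the analysis step: which subjects recur, and where.
subjects = Counter(entry.subject for entry in inventory)
locations_by_subject = {}
for entry in inventory:
    locations_by_subject.setdefault(entry.subject, set()).add(entry.location)

print(subjects.most_common())
print(locations_by_subject)
```

Subjects that show up in several locations are often the first hint of overlapping content silos.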

2. Analyze & Design

Analysis means trying to understand the semantic relationships and patterns between existing content. As in software development, analysis and design are two intertwined activities, two sides of the same coin. You really cannot do one without the other. So, when you start analyzing you will also start designing the taxonomy.

However, it is important to select an architecture that is suited to its purpose and that is scalable, i.e. can accommodate new content. As in software development, establishing the type of architecture should be done as early as possible. Otherwise, there will be problems later. A taxonomy is often envisioned as a hierarchic tree structure, but it doesn't need to be. A taxonomy could also have a flat, network, or faceted architecture. It can also be a mix of two or more of these architectural types.
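The difference between two of these architectures can be shown with a small Python sketch. The terms and facet names are invented examples, and the two helper functions are only one possible way to work with such structures: a hierarchic taxonomy gives every term a single place in a tree, while a faceted taxonomy describes content along several independent dimensions.

```python
# Hierarchic: each term has at most one parent (child -> parent).
hierarchy = {
    "Products": None,
    "Software": "Products",
    "Hardware": "Products",
    "Portals": "Software",
}


def path_to_root(term, parents):
    """Return the term followed by its ancestors, ending at the root."""
    path = []
    while term is not None:
        path.append(term)
        term = parents[term]
    return path


# Faceted: independent dimensions, each with its own flat set of terms.
facets = {
    "content_type": {"manual", "press release", "contract"},
    "audience": {"customers", "employees", "partners"},
}


def classify(facet_terms, **choices):
    """Check that each chosen term exists in its facet, then return the tags."""
    for facet, term in choices.items():
        if term not in facet_terms.get(facet, set()):
            raise ValueError(f"{term!r} is not a term in facet {facet!r}")
    return choices


print(path_to_root("Portals", hierarchy))
print(classify(facets, content_type="manual", audience="employees"))
```

Adding a new facet does not disturb the existing ones, which is one reason faceted and mixed architectures tend to scale well as new content arrives.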

3. Validate

Now it is time to test and evaluate the taxonomy with the appropriate validation techniques. You can use qualitative validation, quantitative validation, or a combination. In any case, by this stage you should have a first version of the taxonomy so that you can test it on users and stakeholders.

4. Deploy & Implement

Deploying and implementing the taxonomy can be expressed with simpler words - making it ready to use and putting it into use. Deploying the taxonomy means that it is available and can be used by content management systems, search engines, and so on. Implementing a taxonomy means actually attaching its attributes to the existing and new content. The taxonomy attributes can be terms from a controlled vocabulary, a list of standardized terms that describe concepts within the domain. Using a controlled vocabulary with agreed upon and carefully defined terms ensures consistency in content metadata and also sets a common language for the organization that reduces the potential for misinterpretation. So, using a controlled vocabulary for the taxonomy has its clear benefits.
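As a small illustration of what a controlled vocabulary buys you, here is a Python sketch. The vocabulary, its variant terms and the function names are assumptions made up for this example. Free-text tags are mapped to the agreed preferred terms, and anything outside the vocabulary is set aside for review instead of silently becoming new metadata.

```python
# Preferred term -> variants that should be mapped to it (hypothetical terms).
CONTROLLED_VOCABULARY = {
    "invoice": {"bill", "billing statement"},
    "employee": {"staff member", "co-worker"},
}


def build_lookup(vocabulary):
    """Map every preferred term and every variant to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred] = preferred
        for variant in variants:
            lookup[variant] = preferred
    return lookup


def normalize_tags(raw_tags, lookup):
    """Split raw tags into accepted preferred terms and rejected tags."""
    accepted, rejected = [], []
    for tag in raw_tags:
        key = tag.strip().lower()
        if key in lookup:
            term = lookup[key]
            if term not in accepted:  # avoid duplicate preferred terms
                accepted.append(term)
        else:
            rejected.append(tag)
    return accepted, rejected


lookup = build_lookup(CONTROLLED_VOCABULARY)
accepted, rejected = normalize_tags(["Bill", "invoice", "memo"], lookup)
print(accepted, rejected)
```

The rejected tags are useful too: they are candidates for new terms or variants when the taxonomy is evaluated and revised.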

Again, as in software development, there is no such thing as a big bang approach to implementation. Instead, start with a pilot and then start implementing the taxonomy throughout the organization according to a realistic roadmap.

5. Evaluate & Revise

Once the enterprise taxonomy is deployed, you need to maintain it. Two important activities in the maintenance are to evaluate how the taxonomy is performing and revise it when needed.

But how to govern and maintain an enterprise taxonomy is basically a subject of its own, which might also be the subject for a post or two later on.

Tuesday, March 6, 2007

The Power Of The Enterprise Taxonomy - Part I

Large organizations are always grouped into smaller units, sub-organizations that inevitably develop their own vocabularies and their own conceptual model of the enterprise. This creates barriers to communication, collaboration and knowledge exchange. Each unit sooner or later creates its own content silo.

An enterprise taxonomy – a system for naming and organizing the content into groups that share similar characteristics – can help to tear down these barriers. It can facilitate access, exploration and understanding of the digital content that exists within an organization. The taxonomy can make it possible to understand the organization at-a-glance by providing a high-level view of the organization as a whole.

As previously argued by Henrik, one must make a distinction between content architecture and information architecture, between the enterprise taxonomy and navigation taxonomies.

In Information Architecture, taxonomies are developed to facilitate search, navigation and presentation. Search engines look for keywords or words in content sources that match a search query, but people actually look for and explore concepts. This is where taxonomies come into the picture in Information Architecture. Every navigation scheme is based on one or several taxonomies. The dilemma is that there is no single taxonomy that will organize content the way all users expect it to be organized. So a single taxonomy will in most cases not be sufficient for creating a successful Information Architecture.

However, developing a single enterprise taxonomy is fundamental in Content Architecture. Managing content begins with organizing it, and the enterprise taxonomy is a key organization tool. Once the content is organized in a consistent manner, any type of content from any content source can (theoretically) be integrated and made accessible throughout the organization. To be usable, the enterprise taxonomy cannot be invented. Instead, it must be derived from the content that already exists in the organization. So before it can be developed, a content inventory has to be performed. Such an inventory will also create a better overview and understanding of the organization and each unit’s content.

My coming posts will tell you more about how to perform a content inventory and develop an enterprise taxonomy.

Sunday, February 25, 2007

Information and Content Architecture

Generally, people use the terms “information architecture” and “content architecture” interchangeably. I would say they represent different perspectives on content management.

Professionals working with presentation of content more often use the term information architecture. Professionals working with production of content often use the term content architecture.

Information architecture professionals organize information and navigation mechanisms for documents, web sites and other content-rich products so that users may find and use information effectively.

Information architecture work can include:

  • analyze the user experience and requirements
  • organize the information and features into a logical structure of the document, website etc
  • define page structures for individual pages or templates
  • develop search and navigation mechanisms to facilitate users' access to information and functionality
  • specify guidelines and standards for information usage

Content architecture professionals organize content and storage mechanisms for documents, web sites and other content-rich products so that content creators, editors and managers can produce and deliver content effectively.

Content architecture work can include:

  • establish an inventory of content repositories and sources
  • describe the content and relationships in a taxonomy or schema
  • identify production processes and content flows
  • explore possibilities to re-use and re-purpose content
  • specify guidelines and standards for content production

In accordance with the above, information architecture and content architecture can be seen as two sides of the same coin.