Data Mobility Group, LLC - High Definition Analytics and Technology Market Insight

Archive for the ‘Information Management’ Category

De-duplication as part of efficient information and storage management

Thursday, July 9th, 2009

Tim over at Storage Monkeys wanted to know, “Is de-duplication a strategy or a finger in the dike?”

I wrote:

“In response to the title of Tim’s post, and [as he requested] in the context of backups alone, the concept of de-duplication is an extremely important consideration for any data protection strategy. We can all agree that we’d like to store as little as possible, preferably in the least amount of space, and still meet or beat our day-to-day operational requirements. De-duplication is all about keeping the physical amount of stored data to a minimum. And, faced with a future filled with mind-boggling amounts of new data, de-duplication is a good thing.

What is important to understand is that de-duplication (which appears to be a term born in the storage industry in the past decade) goes by many names and is best visualized as a spectrum of solutions designed to take the redundancy out of data. At one end of the spectrum we find file formats such as JPG, GIF, MP3, MPG, GZIP, TAR and SIT. These are examples of intra-file data reduction (a.k.a. file compression).

Further along the spectrum we find single instance storage, a method of inter-file data reduction that has existed in many business applications since at least the early-to-mid 90s, possibly earlier. It’s a simple implementation that identifies whole [byte for byte] duplicate files and stores a single copy. A lightweight system of pointers or stubs ensures that applications are unaware of the underlying data reduction.

As we continue to move along the spectrum we encounter even more efficient methods of data reduction such as data chunking (at the block or sub-file level) and delta encoding. And storage vendors have, in recent years, added a new wrinkle to de-duplication: timing. Should we de-duplicate before or after moving our data over the network from point A to point B?

Commercial implementations of de-duplication typically combine multiple methods, and all of them make trade-offs between complexity, efficiency and performance. There is no single universally superior method or commercial implementation of de-duplication. You guessed it – it all depends on what you’re trying to accomplish.

And, it really doesn’t matter whether we’re talking about primary or secondary storage, old backup technology or new, near-line, off-line, local, remote, backup or archival storage. They can all benefit from de-duplication whether it’s embedded or bolted-on.

De-duplication isn’t a patch, it’s an integral part of efficient information and storage management.”
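To make two points on that spectrum concrete, here is a minimal Python sketch of single instance storage (whole-file) next to fixed-size chunking (sub-file). It is an illustration only, not any vendor's implementation: the class names, the in-memory dictionaries, the choice of SHA-256, and the tiny 4-byte chunk size are all assumptions made for the example. Commercial products often use variable-size chunking and persistent indexes instead.

```python
import hashlib


class SingleInstanceStore:
    """Whole-file de-duplication: one stored copy per unique byte sequence.
    (Hypothetical class, for illustration only.)"""

    def __init__(self):
        self.blobs = {}     # content digest -> file bytes, stored once
        self.pointers = {}  # file name -> content digest (the "stub")

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # keep only the first copy
        self.pointers[name] = digest

    def get(self, name):
        # Callers see the full file; the pointer lookup is hidden from them.
        return self.blobs[self.pointers[name]]

    def stored_bytes(self):
        return sum(len(b) for b in self.blobs.values())


class ChunkStore:
    """Sub-file de-duplication using fixed-size chunks.
    (Hypothetical class; chunk_size=4 is unrealistically small on purpose.)"""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}   # chunk digest -> chunk bytes, stored once
        self.recipes = {}  # file name -> ordered list of chunk digests

    def put(self, name, data):
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.recipes[name] = recipe

    def get(self, name):
        # Reassemble the file from its chunks, in order.
        return b"".join(self.chunks[d] for d in self.recipes[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


# Three 12-byte files: two identical, one differing only in its last 4 bytes.
files = {
    "a.txt": b"AAAABBBBCCCC",
    "b.txt": b"AAAABBBBCCCC",
    "c.txt": b"AAAABBBBDDDD",
}

whole = SingleInstanceStore()
chunked = ChunkStore(chunk_size=4)
for name, data in files.items():
    whole.put(name, data)
    chunked.put(name, data)

print("logical bytes:        ", sum(len(d) for d in files.values()))
print("single instance bytes:", whole.stored_bytes())
print("chunk-level bytes:    ", chunked.stored_bytes())
```

With these sample files, 36 logical bytes shrink to 24 under single instance storage and to 16 under chunking, because the near-duplicate file shares two of its three chunks with the others. The trade-off is the one the comment describes: the chunk store saves more space but has to maintain a recipe per file and a larger index.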

Do you have an opinion about the role of de-duplication in your organization? Join the conversation over at Storage Monkeys.

Outcomes and the Value of Information

Wednesday, November 12th, 2008

More than four years have elapsed since I first wrote about what I called the Structured-Unstructured Information Continuum, and the Data-Information-Knowledge Continuum. Both articles described the nature of information and how humans consume it.

This is a long-awaited discussion about the value of information. I shall cover the key points here, and you are more than welcome to contact me to discuss the topic further.

The value of information is, and always will be, determined by outcomes. Information’s value is not intrinsic, as some people seem to believe. Rather, its value is extrinsic, derived from its use or misuse, as the case may be.


Along the road toward ILM

Monday, March 22nd, 2004

I’m not going to bore you with another introduction to Information Lifecycle Management. Odds are you’ve already read one of the dozens of articles that have appeared in major storage industry publications.

While the term Information Lifecycle Management is relatively new, many of the underlying concepts have been with us for quite some time. Most notable is the idea of managing information from inception to deletion, an idea central to the ILM philosophy, which has existed as part of the document/content/digital asset management mantra for well over ten years. (more…)

The Structured-Unstructured Information Continuum

Sunday, March 7th, 2004

If you believe databases contain “structured” information, and files contain “unstructured” information, you’re not alone.

The idea that information is either structured or unstructured can be attributed to countless presentations, research reports, and magazine articles prepared by people who simply do not understand the characteristics and nuance of information in its many forms. (more…)

The rise of the CPO

Monday, February 23rd, 2004


Just when you thought you’d heard every CxO title imaginable, along comes another one – the Chief Preservation Officer (CPO).

Actually the position isn’t new – CPOs currently direct the preservation efforts of a small number of libraries, film studios, government agencies, and academic institutions around the world. But that’s about to change. (more…)

Forward compatibility – the real, long-term data retention and recovery issue

Monday, February 2nd, 2004

Tired of reading the same data retention and data backup/recovery articles rehashed seven ways from Sunday?

Good. So am I.

If I see one more tape- versus disk-based backup article I’m going to scream.

Instead, I’m going to discuss an issue that seldom, if ever, makes its way to the front page of today’s industry publications. Don’t let the lack of media and analyst attention fool you – it’s the single most important long-term data retention and recovery issue you’ll ever face. I’ll cover it briefly here, and expand the discussion in a coming research report.

The issue is the forward compatibility of data AND data structures. Few viable solutions exist today. But there’s still time to do something about it. (more…)

Reference This!

Thursday, January 1st, 2004

[letter to the editor, InfoStor, February 2004]

When I picked up the December issue of InfoStor, I noticed the front-page article titled “New Approaches to Managing Reference Information” – also available online. I’ve seen the term “reference information” floating around the storage industry for about a year now, so the title immediately caught my attention. Now that I’ve read the article, I have a few comments and clarifications.

The “reference information” issues raised in this article and elsewhere – e.g., searching, indexing, and long-term retention – may seem new to some people, but anyone familiar with content management will recognize them as issues that the content management industry began to address years ago (that is to say, the *boom* began years ago). After all, “reference information” is nothing more than content. (more…)

© 2002-2009 Data Mobility Group, LLC. All Rights Reserved.