Data Mobility Group, LLC - High Definition Analytics and Technology Market Insight

De-duplication as part of efficient information and storage management

Tim over at Storage Monkeys wanted to know, “Is de-duplication a strategy or a finger in the dike?”

I wrote:

“In response to the title of Tim’s post, and [as he requested] in the context of backups alone, the concept of de-duplication is an extremely important consideration for any data protection strategy. We can all agree that we’d like to store as little as possible, preferably in the least amount of space, and still meet or beat our day-to-day operational requirements. De-duplication is all about keeping the physical amount of stored data to a minimum. And, faced with a future filled with mind-boggling amounts of new data, de-duplication is a good thing.

What is important to understand is that de-duplication (which appears to be a term born in the storage industry in the past decade) goes by many names and is best visualized as a spectrum of solutions designed to take the redundancy out of data. At one end of the spectrum we find file formats such as JPG, GIF, MP3, MPG, GZIP and SIT. These are examples of intra-file data reduction (a.k.a. file compression).
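To make intra-file reduction concrete, here is a minimal sketch using Python's standard zlib module. It only illustrates lossless compression of redundant data within a single buffer (formats such as JPG and MP3 are lossy and work differently); the helper name compression_ratio and the sample data are made up for illustration.

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


# Hypothetical sample: highly repetitive content, as often found in logs or backups.
redundant = b"backup " * 1000

ratio = compression_ratio(redundant)
restored = zlib.decompress(zlib.compress(redundant))
print(f"compressed to {ratio:.1%} of original size")
```

The more repetition a file contains, the smaller the ratio; data that is already compressed or random will barely shrink at all.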

Further along the spectrum we find single instance storage, a method of inter-file data reduction that has existed in many business applications since at least the early-to-mid 1990s, possibly earlier. It’s a simple implementation that identifies whole [byte for byte] duplicate files and stores a single copy. A lightweight system of pointers or stubs ensures that applications are unaware of the underlying data reduction.
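A rough sketch of single instance storage might look like the following. It is a toy, in-memory illustration rather than any vendor's implementation: the class name SingleInstanceStore and the file names are hypothetical, and it assumes a SHA-256 digest is an acceptable stand-in for a byte-for-byte comparison (a cautious system could also compare the bytes before treating two files as identical).

```python
import hashlib


class SingleInstanceStore:
    """Toy in-memory store that keeps one copy of each unique file content."""

    def __init__(self):
        self.blobs = {}     # content digest -> the single stored copy
        self.pointers = {}  # file name -> content digest (the "stub")

    def put(self, name: str, content: bytes) -> None:
        digest = hashlib.sha256(content).hexdigest()
        # Store the content only if this digest has not been seen before.
        self.blobs.setdefault(digest, content)
        self.pointers[name] = digest

    def get(self, name: str) -> bytes:
        # Callers ask by name and never see the shared copy underneath.
        return self.blobs[self.pointers[name]]


store = SingleInstanceStore()
store.put("a/report.doc", b"quarterly numbers")
store.put("b/report-copy.doc", b"quarterly numbers")  # whole-file duplicate
store.put("c/notes.txt", b"meeting notes")
print(len(store.pointers), "files,", len(store.blobs), "stored copies")
```

Three files go in, but only two copies are kept, and each name still resolves to its full content.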

As we continue to move along the spectrum we encounter even more efficient methods of data reduction such as data chunking (at the block or sub-file level) and delta encoding. And storage vendors have, in recent years, added a new wrinkle to de-duplication: timing. Should we de-duplicate before or after moving our data over the network from point A to point B?
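Block-level chunking can be sketched in the same spirit. The example below assumes fixed-size chunks and SHA-256 digests, which is only one of several possible approaches (commercial products may use variable-size chunking or delta encoding instead); CHUNK_SIZE, dedupe_chunks and rebuild are illustrative names, and the four-byte chunk size is unrealistically small so the effect is easy to see.

```python
import hashlib

CHUNK_SIZE = 4  # illustrative only; real systems use much larger chunks


def dedupe_chunks(data: bytes, chunk_store: dict) -> list:
    """Split data into fixed-size chunks, store each unique chunk once,
    and return the ordered list of digests needed to rebuild the data."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)
        recipe.append(digest)
    return recipe


def rebuild(recipe: list, chunk_store: dict) -> bytes:
    """Reassemble the original data from its list of chunk digests."""
    return b"".join(chunk_store[d] for d in recipe)


chunk_store = {}
monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAABBBBXXXXDDDD"  # same file with one chunk changed

monday_recipe = dedupe_chunks(monday, chunk_store)
tuesday_recipe = dedupe_chunks(tuesday, chunk_store)
print(len(chunk_store), "unique chunks stored for 8 chunks written")
```

Unlike single instance storage, the second version of the file costs only one new chunk, because the three unchanged chunks are already in the store.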

Commercial implementations of de-duplication typically combine multiple methods, and all of them make trade-offs between complexity, efficiency and performance. There is no single universally superior method or commercial implementation of de-duplication. You guessed it – it all depends on what you’re trying to accomplish.

And, it really doesn’t matter whether we’re talking about primary or secondary storage, old backup technology or new, near-line, off-line, local, remote, backup or archival storage. They can all benefit from de-duplication whether it’s embedded or bolted-on.

De-duplication isn’t a patch; it’s an integral part of efficient information and storage management.”

Do you have an opinion about the role of de-duplication in your organization? Join the conversation over at Storage Monkeys.

© 2002-2009 Data Mobility Group, LLC. All Rights Reserved.