Archive for the ‘Information Management’ Category
Thursday, July 9th, 2009
Tim over at Storage Monkeys wanted to know, “Is de-duplication a strategy or a finger in the dike?”
I wrote:
“In response to the title of Tim’s post, and [as he requested] in the context of backups alone, the concept of de-duplication is an extremely important consideration for any data protection strategy. We can all agree that we’d like to store as little as possible, preferably in the least amount of space, and still meet or beat our day-to-day operational requirements. De-duplication is all about keeping the physical amount of stored data to a minimum. And, faced with a future filled with mind-boggling amounts of new data, de-duplication is a good thing.
What is important to understand is that de-duplication (which appears to be a term born in the storage industry in the past decade) goes by many names and is best visualized as a spectrum of solutions designed to take the redundancy out of data. At one end of the spectrum we find file formats such as JPG, GIF, MP3, MPG, GZIP and SIT. These are examples of intra-file data reduction (a.k.a. file compression).
Further along the spectrum we find single instance storage, a method of inter-file data reduction that has existed in many business applications since at least the early-to-mid 90s, possibly earlier. It’s a simple implementation that identifies whole [byte for byte] duplicate files and stores a single copy. A lightweight system of pointers or stubs ensures that applications are unaware of the underlying data reduction.
As we continue to move along the spectrum we encounter even more efficient methods of data reduction such as data chunking (at the block or sub-file level) and delta encoding. And storage vendors have, in recent years, added a new wrinkle to de-duplication: timing. Should we de-duplicate before or after moving our data over the network from point A to point B?
Commercial implementations of de-duplication typically combine multiple methods, and all of them make trade-offs between complexity, efficiency and performance. There is no single universally superior method or commercial implementation of de-duplication. You guessed it - it all depends on what you’re trying to accomplish.
And, it really doesn’t matter whether we’re talking about primary or secondary storage, old backup technology or new, near-line, off-line, local, remote, backup or archival storage. They can all benefit from de-duplication whether it’s embedded or bolted-on.
De-duplication isn’t a patch, it’s an integral part of efficient information and storage management.”
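To make two points on that spectrum concrete, here is a minimal Python sketch of single instance storage and of fixed-size chunking. It is an illustration only, not any vendor's implementation. The class names, the use of SHA-256 digests to spot duplicates, and the tiny chunk size are all assumptions made for the example; a production system would typically also guard against hash collisions (for instance with a byte-for-byte comparison) and would often use variable-size chunks.

```python
import hashlib


class SingleInstanceStore:
    """Whole-file de-duplication: identical files share one stored copy.

    Hypothetical class for illustration; duplicates are detected by
    SHA-256 digest, which is an assumption of this sketch.
    """

    def __init__(self):
        self.blobs = {}  # content digest -> file bytes (stored once)
        self.names = {}  # file name -> content digest (the "pointer" or stub)

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.names[name] = digest

    def get(self, name):
        # Callers ask by name and never see the underlying data reduction.
        return self.blobs[self.names[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


class ChunkStore:
    """Sub-file de-duplication using fixed-size chunks (hypothetical)."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size  # unrealistically small, for the demo
        self.chunks = {}   # chunk digest -> chunk bytes (stored once)
        self.recipes = {}  # file name -> ordered list of chunk digests

    def put(self, name, data):
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.recipes[name] = recipe

    def get(self, name):
        # Rebuild the file by concatenating its chunks in order.
        return b"".join(self.chunks[d] for d in self.recipes[name])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    files = SingleInstanceStore()
    files.put("a.txt", b"hello world")
    files.put("b.txt", b"hello world")  # exact duplicate, no new bytes stored
    files.put("c.txt", b"hello there")
    print("single instance bytes:", files.stored_bytes())

    blocks = ChunkStore(chunk_size=4)
    blocks.put("v1", b"AAAABBBBCCCC")
    blocks.put("v2", b"AAAABBBBDDDD")  # shares its first two chunks with v1
    print("chunk store bytes:", blocks.stored_bytes())
```

Note the trade-off the sketch exposes: whole-file matching is simple but saves nothing when two files differ by even one byte, while chunking recovers the shared portions at the cost of more bookkeeping per file.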
Do you have an opinion about the role of de-duplication in your organization? Join the conversation over at Storage Monkeys.
Posted in De-duplication, Information Management |
Wednesday, November 12th, 2008
More than four years have elapsed since I first wrote about what I called the Structured-Unstructured Information Continuum, and the Data-Information-Knowledge Continuum. Both articles described the nature of information and how humans consume it.
This is a long-awaited discussion about the value of information. I shall cover the key points here, and readers are more than welcome to contact me to discuss the topic further.
The value of information is, and always will be, determined by outcomes. Information’s value is not intrinsic as some pundits seem to believe. Rather, its value is extrinsic and derived from its application, or misapplication as the case may be.
(more…)
Posted in Information Management |
Monday, March 22nd, 2004
I’m not going to bore you with another introduction to Information Lifecycle Management. Odds are you’ve already read one of the dozens of articles that have appeared in major storage industry publications.
While the term Information Lifecycle Management is relatively new, many of the underlying concepts have been with us for quite some time. Most notable is the idea of managing information from inception to deletion, an idea central to the ILM philosophy, which has existed as part of the document/content/digital asset management mantra for well over ten years. (more…)
Posted in Information Management, Storage |
Sunday, March 7th, 2004
If you’ve fallen into the trap of thinking of databases as “structured” information, and files as “unstructured” information, you’re not alone.
That misleading binary categorization—structured or unstructured information—can be attributed to hundreds of presentations, research reports, and magazine articles prepared by people who simply do not understand the complexity of today’s information assets. (more…)
Posted in Information Management, Storage |
Monday, February 23rd, 2004
Client Services Director & Partner
Just when you thought you’d heard every CxO title imaginable, along comes another one - the Chief Preservation Officer (CPO).
Actually the position isn’t new - CPOs currently direct the preservation efforts of a small number of libraries, film studios, government agencies, and academic institutions around the world. But that’s about to change. (more…)
Posted in Information Management |
Monday, February 2nd, 2004
Tired of reading the same data retention and data backup/recovery articles rehashed seven ways from Sunday?
Good. So am I.
If I see one more tape- versus disk-based backup article I’m going to scream.
Instead, I’m going to discuss an issue that seldom, if ever, makes its way to the front page of today’s industry publications. Don’t let the lack of media and analyst attention fool you - it’s the single most important long-term data retention and recovery issue you’ll ever face. I’ll cover it briefly here, and expand the discussion in a coming research report.
The issue is the forward compatibility of data AND data structures. Few viable solutions exist today. But there’s still time to do something about it. (more…)
Posted in Information Management |
Thursday, January 1st, 2004
[letter to the editor, InfoStor, February 2004]
When I picked up the December issue of InfoStor, I noticed the front-page article titled “New Approaches to Managing Reference Information” - also available online. I’ve seen the term “reference information” floating around the storage industry for about a year now, so the title immediately caught my attention. Now that I’ve read the article, I have a few comments and clarifications.
The “reference information” issues raised in this article and elsewhere (e.g., searching, indexing, and long-term retention) may seem new to some people, but to anyone familiar with content management, these are issues that the content management industry began to address years ago (that is to say, the *boom* began years ago). After all, “reference information” is nothing more than content. (more…)
Posted in Information Management |