Data Mobility Group, LLC - High Definition Analytics and Technology Market Insight

Forward compatibility – the real, long-term data retention and recovery issue

Tired of reading the same data retention and data backup/recovery articles, rehashed seven ways from Sunday?

Good. So am I.

If I see one more tape- versus disk-based backup article I’m going to scream.

Instead, I’m going to discuss an issue that seldom, if ever, makes its way to the front page of today’s industry publications. Don’t let the lack of media and analyst attention fool you – it’s the single most important long-term data retention and recovery issue you’ll ever face. I’ll cover it briefly here, and expand the discussion in a coming research report.

The issue is the forward compatibility of data AND data structures. Few viable solutions exist today. But there’s still time to do something about it.

I’ll begin with a scenario.

Let’s assume you run a hospital. Further, let’s assume you follow HIPAA data retention regulations to the letter. In 10, 20, or even 50 years – assuming your media of choice has withstood the test of time and you’ve successfully restored the data – how do you plan to read it?

Think about that for a moment. Which applications were used to write the original data? Are those applications even around anymore? What about the hardware and OSs the apps ran on?

Unless you plan to store and maintain the hardware and software used to write every piece of content stored by your organization over the next several decades, eventually you’re going to find yourself up a creek without a paddle (probably sooner than you think).

HIPAA is not going to help you address that issue. Neither is SarbOx, or the slew of other regulations flowing out of Washington. Legislators left the dirty details for the rest of us to figure out.

So what are our options? Well, I already mentioned one of them (storing and maintaining the original hardware and software), and obviously that’s just not a feasible long-term solution. What are some of the others?

  • Emulation – not advisable. It’s a band-aid, and more likely than not you’ll open another can of worms. For one, you’d have to periodically update your emulators as old platforms are phased out. It’s also typically a reactive measure, done after the fact.
  • Long-term standardized formats – an improvement over emulation, but limited because the so-called standards aren’t “standard”, and they continue to evolve (though a few have managed to survive for the time being; ASCII comes to mind). Worse, there are thousands of file types and data structures in use today. Many people would suggest XML as a storage format. While certainly more flexible [long-term] than, say, a .doc file, XML carries its own baggage (e.g., maintaining and version-managing the XSLT and schemas necessary to interpret the files), and XML is primarily intended for text-based data. For now, you can forget about using it for long-term audio, video, and raster-based graphic storage – it simply wouldn’t make sense.
  • Periodic conversion – this is the most promising of the options. In this scenario, you would periodically convert older content, or migrate data in older data structures, into new formats and structures. Obviously, the period could be as short or as long as you like. In my opinion, a rolling two- to five-year FIFO conversion makes the most sense, because you can spread the cost over time. But therein lies another problem – data integrity. How can you guarantee data has not been altered, or in any way mishandled or misinterpreted, during the conversion process? And what about lossiness due to compression or conversion (e.g., with audio, video, and graphics)?
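
To make the data integrity question under periodic conversion a little more concrete, here is a minimal sketch in Python of one possible check for a lossless text conversion. It is an illustration, not a recommendation: the function names, the sample record, and the choice of SHA-256 are all assumptions of mine. The idea is to hash the decoded content rather than the raw bytes, so the fingerprint survives a legitimate change of encoding but not a change of content.

```python
import hashlib


def content_digest(raw: bytes, encoding: str) -> str:
    """Fingerprint the decoded text, not the stored bytes.

    Hypothetical helper: the text is re-encoded to UTF-8 before hashing
    so that the same characters always produce the same digest,
    whatever encoding they were stored in.
    """
    text = raw.decode(encoding)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def conversion_preserved_content(
    original: bytes, old_encoding: str, converted: bytes, new_encoding: str
) -> bool:
    """Return True if both byte strings decode to identical text."""
    return content_digest(original, old_encoding) == content_digest(
        converted, new_encoding
    )


# Made-up sample record stored in an older single-byte encoding.
original = "Patient: José, 37.5°C".encode("latin-1")

# A faithful conversion to a newer encoding.
good = original.decode("latin-1").encode("utf-8")

# A careless conversion that silently replaces non-ASCII characters with "?".
bad = original.decode("latin-1").encode("ascii", errors="replace")

print(conversion_preserved_content(original, "latin-1", good, "utf-8"))
print(conversion_preserved_content(original, "latin-1", bad, "ascii"))
```

A check like this only applies where the conversion is meant to be lossless. It says nothing about lossy audio, video, or graphics conversions, where the content is expected to change and "close enough" has to be defined some other way.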

I haven’t even begun to discuss the impact on more complex, interdependent data: for example, migrating from one content management system to the next, say every 2 to 5 years, while preserving the data structures and metadata that link all of your individual content elements together. That is a discussion in itself.

Suffice it to say, the forward compatibility of data and data structures is THE long-term data retention and recovery problem to solve. And I’ll share more of my thoughts on this topic in future research.

Vendors, I hope you’re listening.

[As an aside, I often smile when I listen to conversations about media longevity, with companies speaking of increasing the life expectancy of a particular medium (disk, tape, optical, holographic, etc.) from 10 to 20 years, or 30 to 50, or 100 years and beyond. I view it as a huge waste of R&D resources. Given that the content stored on the media will have to be read, converted, and rewritten into a newer format every so many years, and that the technology for a given medium will likely have improved severalfold each time, why would one care about the 30-100+ year data retention capabilities of the media? Perhaps these vendors can give us a hand with the forward compatibility problem instead?]

This post was originally published in Data Mobility Group’s first blog, “Perspectives on Storage”, on February 2nd, 2004.


  © 2002-2009 Data Mobility Group, LLC. All Rights Reserved.