Big backup? Make up your mind!

Some time ago, we wrote in our blog about precision medicine: medication that can be precisely tailored to the individual patient based on big data. We explained then that it is useful to distinguish between active data and the large bulk of passive data. But this distinction can be applied to many more areas of work. In this blog, we tell you more about it.

To start with just one example: in pathology, data from X-rays and laboratory results, among other sources, are assessed with the support of artificial intelligence (AI) to detect and combat cancer. AI can quickly recognize an abnormality and at least signal the physician to look at it with extra attention. Note that this method only works if the AI can keep learning, and that requires a large amount of data that must be quickly accessible. This data must be kept for a long time and processed as needed, and when it is processed, access should also be fast. So again, we need to distinguish between quickly accessible active data and more or less resting data that does not need to be super fast.

Compare it to an iceberg. The small tip that sticks out above the water is the primary data. But the great bulk of data below sea level can also be divided into active and resting data. We can make the same subdivision for all kinds of large-scale backup environments. If you have a lot of data to back up, some of that data is very important and must be restorable quickly when you need it. But much of it you will most likely only need if data has to be restored after a disaster.
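The active/resting split above can be sketched in a few lines of code. This is a hypothetical illustration, not a real backup product's policy: the 90-day threshold and the function names are assumptions chosen for the example.

```python
from datetime import datetime, timedelta

# Assumed cutoff: items untouched for more than 90 days count as "resting".
ACTIVE_THRESHOLD = timedelta(days=90)

def classify(last_accessed: datetime, now: datetime) -> str:
    """Return the storage tier for a backup item based on last access time."""
    return "active" if now - last_accessed <= ACTIVE_THRESHOLD else "resting"

now = datetime(2024, 1, 1)
print(classify(datetime(2023, 12, 20), now))  # recently used -> active
print(classify(datetime(2022, 6, 1), now))    # long untouched -> resting
```

In practice such a rule would be one input among several (business criticality, compliance requirements), but it captures the iceberg idea: a small, recently touched tip stays fast, the rest can sink to cheaper storage.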

Other situation

All of this is moving us toward a situation where we view our data, our backups and their value differently. Previously, a backup mainly served as a spare copy in case you accidentally lost a file. Having to restore a complete environment, or even rebuild an entire server after a hardware problem, was actually a rare occurrence. But times have changed. We no longer back up just to recover individual files, but for data protection and to restore infrastructure very quickly. This shift has been prompted in part by cyber attacks that take down entire servers, something that was a rarity a decade ago. A good portion of the backup must therefore be fast, especially on restore, because most businesses and organizations cannot afford to be down for days while hundreds of terabytes of data come back. That has to happen à la minute. Below that sits the larger bulk of older data that must also be backed up and restorable if necessary, but with a little less urgency.
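The "down for days" claim is easy to check with back-of-the-envelope arithmetic. The throughput figures below are illustrative assumptions, not measurements of any particular system:

```python
# Sketch: how long does restoring N terabytes take at a given
# sustained restore throughput? Decimal units (1 TB = 1000 GB).
def restore_hours(terabytes: float, gb_per_second: float) -> float:
    gigabytes = terabytes * 1000
    seconds = gigabytes / gb_per_second
    return seconds / 3600

# 200 TB at a sustained 1 GB/s: about 55.6 hours -- more than two days.
print(round(restore_hours(200, 1.0), 1))
# The same 200 TB at 10 GB/s: about 5.6 hours.
print(round(restore_hours(200, 10.0), 1))
```

This is why restore throughput, not just backup capacity, drives the design of the fast tier.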

Performance versus capacity

And that in turn requires careful consideration of performance versus capacity. Which data do you need super fast in the event of a restore, and thus warrants high, relatively expensive performance? And which data is subject to different considerations and can rest a little more? These tradeoffs come into play not only in the world of AI, but in every company and organization with a lot of data and decisions to make about backup. Certainly, you can store your entire backup on a high-performance system. That is more expensive, and for a limited amount of data it can be financially justifiable. But it often becomes a different story when data volumes really take on great proportions. Then the choice between high-performance and capacity-oriented backup can be a matter of millions of dollars.
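To make the cost tradeoff concrete, here is a minimal sketch. The prices per terabyte and the 10% "fast fraction" are assumptions for illustration only; real storage pricing varies widely:

```python
# Assumed monthly prices per terabyte (illustrative, not real quotes).
FAST_PER_TB = 25.0      # high-performance storage, $/TB/month
CAPACITY_PER_TB = 4.0   # capacity-oriented storage, $/TB/month

def monthly_cost(total_tb: float, fast_fraction: float) -> float:
    """Cost of keeping fast_fraction of the data on the fast tier."""
    fast_tb = total_tb * fast_fraction
    return fast_tb * FAST_PER_TB + (total_tb - fast_tb) * CAPACITY_PER_TB

total = 5000  # 5 PB of backup data
all_fast = monthly_cost(total, 1.0)   # everything on the fast tier
tiered = monthly_cost(total, 0.1)     # only 10% needs to be fast
print(all_fast, tiered, all_fast - tiered)
```

With these assumed numbers, tiering saves roughly $94,500 per month at 5 PB; over several years and at larger scales, that difference does indeed run into the millions.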

Subscribe for tips and info

We regularly write blogs on current topics from the world of digital storage technology. Sign up here to be notified about new blogs.