dots

This is the third and last post in our series about data warehouses. If you missed the previous ones, check them out here:

Data warehouses – what they are and how to classify them (Part 1)

Data warehouses – what they are and how to classify them (Part 2)

 

In our series, we have showed you:

What data warehouses are
Who needs a data warehouse
Tips about implementing and designing data warehouses

But bow to make sure that your data warehouse is safe? Here’s the answer.

 

Security in data warehouses

They say there are two types of people dealing with databases: those who do backups and those who will do them. This well-known phrase is still circulating because it’s actually quite accurate. One of the most significant aspects of data warehouses that often gets overlooked or downplayed is the management of historical data and backups.
In this part of our series about data warehouses, I wanted to share some security essentials.
But before I explain how to take care of security in data warehouses, there’s one thing we need to make clear first: never delete anything permanently. Just don’t. In theory, everyone who has some experience with Big Data knows that, but you may come across cases where reconstructing the current state of the database or analyzing the process of data creation is impossible.

 

Data warehousing disasters

One of the most common scenarios that can lead to a disaster is a loss of some or all of the data. That might happen for many different reasons, from a disk array failure to a poorly designed functionality or an error made by a developer.
It may also happen that data in our warehouse loses coherence. The recovery of data integrity without any historical data, a changelog, or at least a backup will be challenging – in some cases it might not even be possible!
The data we store in our warehouse is going to be constantly updated. In time, we might find ourselves in need of historical data that no longer exists. When creating a data warehouse, remember that the data stored there may be used in the future for other purposes than those we planned initially.
It may happen that we want to generate analytical data showing the changes that took place over several months – if we don’t record information about such changes, this won’t be possible.
Consider the following examples:

We have a website that manages sports bets and we would like to check how users behave in relation to the information published about a team or player.
Another example is where we need to check the details of invoices issued during a given moth or when it turns out that something isn’t right in our quarterly settlement.
It might happen that we need information about users who logged into our system to verify and resolve any ambiguities that may arise because of their activity.

 

So how do we protect our data warehouses?

There are basically two good solutions you can choose from when ensuring the safety of data stored in your data warehouse.

Regular backups (Dumps)

Backup copies are something that should be part of the definition of a data warehouse. Depending on how often our database is updated, we should make automatic backups. It’s common to perform a database dump once a day and kept for several months. But in some situations, we may need to perform it more often and keep backups for a more extended period. There are many tools available for creating and storing database dumps – most of them contain compression features that help to save space on the hard drive.

Changelog

In addition to regular backups, it’s a good idea to keep a record of changes for individual atomic information. That allows making corrections to our system and then reproducing the current state of the database with changes taken into account. The downside of this solution is the need to store quite a large amount of data redundancy. Also, the implementation of such a mechanism is not easy, so before we decide to create a changelog, we should ask ourselves whether we really need it.
I hope this guide to data warehousing helps you to make the right decision regarding the methods for storing data at your organization.

Have you got any questions about data warehouses? Give us a shout out in the comments; we’re looking forward to hearing from you.
Got a data science project on your mind? Hire us to get top expertise in the field.

Sławomir Sawicki

Sławomir Sawicki

Backend Engineer

Sławek is a full stack developer mostly focused on backend side. He's a graduate of West Pomeranian University of Technology in Szczecin which lead him to become a software developer. At Sunscrapers, Sławek is a backend and frontend developer who deliver high-quality code, helps to solve problems and cooperate with clients. After work, Sławek is involved in creating music, aquaristics and recently he has been breeding ants.

dots