03 Classifying warehouses – first step in data warehouse design
Curious about data warehouses? You've reached the right place.
In this article series, we'll show you:
- What data warehouses are
- Who needs a data warehouse
- Tips about implementing and designing data warehouses
- How to make sure that your data warehouse is safe
Read to explore the world of data warehousing? Let's dive right into the topic.
What are data warehouses?
The concept of a data warehouse (DW) is one of the oldest ideas for storing collected data which dates back to the 1980s when two IBM researchers built the first “business data warehouse.”
“Data warehouse” is a rather general concept; that way storing data can be used in different places depending on specific needs. However, when we think about data warehouses, we often associate them with large companies and corporations – hence the term “Enterprise Data Warehouse” (EDW).
In the simplest terms, a data warehouse is a centralized database of integrated data, most often used as the central element of Business Intelligence solutions. To learn more about what makes BI solutions effective, have a look at our article: Why is clean data so important? The data in a data warehouse is usually unified to a standard form and updated as regularly as possible. A data warehouse also stores historical data – that's something worth remembering when designing a data warehouse.
Why do organizations implement such an architecture in data warehouses? The reason is simple: they rely on information coming from similarly-formatted, current and/or historical data through many different tools. Creating a data warehouse is a smart move because it allows companies to gain a source of the most reliable and current information.
Who needs a data warehouse?
Data warehouses can be used in many different ways. But these are the people who are particularly appreciative of this solution:
- Decision-makers who need data for analysis that helps in planning and decision-making.
- People involved in the optimization of the production process of tangible or intangible goods.
- Accountants and other finance professionals.
- Logistics professionals and people responsible for inventory.
- Staff involved in the monitoring of extensive systems, as well as reporting on and reacting to real-time events.
- Those involved in the field of data science.
- Professionals who have to deal with a lot of data from many different sources.
Before we start to build a data warehouse, it's worth asking: Do we really need it?
Storing and managing large data sets can be challenging and maintaining data to match the desired quality is a process that requires continuous effort. When implementing a data warehouse, we should always take costs into account – they may sometimes exceed the benefits of such a solution.
Classifying warehouses – first step in data warehouse design
When designing our data warehouse, we should think about what for and how the data located there will be used. On that basis, we can distinguish two (or rather, three) types of data warehouses:
Enterprise Data Warehouse (EDW)
EDW is a centralized collection of unified, current and often historical data of the enterprise. Its primary role is supporting business decision-making processes. The ETL process (or its newer ELT paradigm) is used to build this type of data warehouse. Read one of my previous articles to learn more about the ETL process. It's also worth noting here that even if the data is current, it's not real-time – for that, we need the second type of data warehouse.
Operational Data Store
This type of data warehouse provides data in real time. Its most well-known application is reporting systems, but it can be used for many different things that require data refreshed in real time.
A Data Mart is usually a subset of data obtained from a central data warehouse that gathers targeted information, e.g., sales, employee information, or financial. These types of data sets can create their mini data warehouses and get information from external sources, but they usually rely on information taken from their own databases and then transformed accordingly, depending on what they'll be used for.
To help you understand this division of data warehouses, here's an example:
Example data warehousing scenario
Let's imagine that we're an owner of a large chain of grocery stores. To know what's happening in our company, we decide to build a centralized Enterprise Data Warehouse (EDW) that will be used by many different departments at our organization.
For this purpose, we decide to implement a system based on the ETL process that will download information from each store, process it, and then save it to our central database. To simplify that already complicated system, we add an Operational Data Store that contains data that have been unified, but not processed yet.
It's worth noting that in the ETL process, the most time-consuming stage is the processing of extracted data. Downloading and unification of data can be performed in real time – and then used in systems reporting on current status and alarming us when there's an urgent need for interference.
To use data contained in the centralized database efficiently, each department, person or service can create its own small data warehouse (Data Mart) storing data from our central database and external sources.
Want to learn more about data warehouses? Stay tuned for the next part of this article series where we continue talking about data warehouse implementations.
If you need additional information about data warehouses, check our other articles from the "What Are Data Warehouses & How To Classify them" series:
- Data warehouses - what they are and how to classify them (Part 1)
- Data warehouses - what they are and how to classify them (Part 2)
- Data warehouses - what they are and how to classify them (Part 3)