Recently, Hans Michiels wrote a nice article on how to use temporal tables, which were introduced in SQL Server 2016, in a data warehouse environment. I really recommend reading it first. In his case, temporal tables performed better than solutions built in a "traditional way".
Knowing that temporal tables in SQL Server are just normal tables and there is no “magic” going on underneath, I was curious to know why they were performing so much better.
Before starting to talk about Data Modeling in a Big Data World, let's talk about Data Modeling in a Data World. Different people often talk about Data Modeling and yet mean different things.
To some people, Data Modeling is the set of tables, columns, primary keys and other objects that have been implemented in a database (physical model). Others see it as a set of entities, attributes, relationships, etc. (ER modeling); for some it is a set of facts (fact-oriented modeling); and so on.
In this exercise we are going to evolve our data warehouse from one model to the next based on some simple requirement changes.
To make things more concrete we are going to use the very simple model below.
Our simple model keeps information about products and their categorization. Every category belongs to a cluster. Because we do not keep any information about a cluster beyond its name, the cluster is modeled as a column on the category table.
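To make the model concrete, here is a minimal sketch of it in SQLite (the table and column names are illustrative, not taken from the original article). Note how the cluster appears only as a text column on `category`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Category carries the cluster name as a plain column, since we store
# nothing about a cluster except its name.
conn.execute("""
    CREATE TABLE category (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT NOT NULL,
        cluster_name  TEXT NOT NULL
    )""")

# Every product references exactly one category.
conn.execute("""
    CREATE TABLE product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category_id  INTEGER NOT NULL REFERENCES category(category_id)
    )""")

conn.execute("INSERT INTO category VALUES (1, 'Laptops', 'Computers')")
conn.execute("INSERT INTO product VALUES (10, 'UltraBook 13', 1)")

row = conn.execute("""
    SELECT p.product_name, c.category_name, c.cluster_name
    FROM product p JOIN category c ON p.category_id = c.category_id
""").fetchone()
print(row)  # ('UltraBook 13', 'Laptops', 'Computers')
```

If we ever needed to store more than the cluster's name, the natural refactoring would be to promote the cluster to its own table and reference it from `category`.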
Here we skip many of the implementation details of the individual methods and focus on the big differences.
In a 3NF model, tables reference each other, and depending on the model these references might go very deep. Enforcing referential integrity means that the tables must be loaded in a certain order; in fact, that is exactly why we enforce referential integrity. In an OLTP system, data arrives incrementally and in a certain order designed around specific business processes. For instance, a product should be registered in the warehouse before it can be sold. In other words, the product must first exist in the system before it can be sold.
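We can see this load-order constraint in action with a small SQLite sketch (the schema is a simplified stand-in for our product/category model): with foreign keys enforced, inserting the child row before its parent fails, while parent-first succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity

conn.execute(
    "CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT)")
conn.execute("""
    CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES category(category_id)
    )""")

# Loading a product before its category violates the foreign key:
try:
    conn.execute("INSERT INTO product VALUES (10, 1)")
    out_of_order_ok = True
except sqlite3.IntegrityError:
    out_of_order_ok = False

# Parent first, then child, is accepted:
conn.execute("INSERT INTO category VALUES (1, 'Laptops')")
conn.execute("INSERT INTO product VALUES (10, 1)")

print(out_of_order_ok)  # False
```

The deeper the chain of references, the longer the mandatory loading sequence becomes, which is precisely what the next paragraph calls into question for a data warehouse.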
While the order of inserting data in an OLTP system ensures that users do not make undesirable insertions or transactions, it can be an unnecessary barrier in a DWH, where the data is usually loaded in bulk by an ETL tool. With an ETL tool doing the loading, the margin for error becomes almost nonexistent compared with OLTP systems, in which users enter the data themselves.
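The following sketch shows the DWH side of the argument, again in SQLite for illustration: with foreign-key enforcement switched off (SQLite's default; in SQL Server a comparable effect is achieved by disabling constraints before a bulk load), the ETL tool can load the tables in any order it likes, because the tool, not the database, guarantees consistency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# foreign_keys is left OFF (SQLite's default), so the declared
# REFERENCES clause documents the relationship but is not enforced.
conn.execute(
    "CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT)")
conn.execute("""
    CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES category(category_id)
    )""")

# Bulk-load child rows first, parent rows afterwards -- no error:
conn.executemany("INSERT INTO product VALUES (?, ?)", [(10, 1), (11, 1)])
conn.execute("INSERT INTO category VALUES (1, 'Laptops')")

count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
print(count)  # 2
```

The trade-off is that any integrity check now has to live in the ETL process itself, which is acceptable exactly because the ETL tool is a controlled, repeatable loader rather than an unpredictable human user.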
There are already a couple of data warehouse generation tools on the market which do an amazing job of simplifying the work of building and maintaining a data warehouse. If you have ever wondered how these tools work, then this post is for you. Here we are going to explain how a data warehouse generator works. The algorithm described in this post is based on the one implemented by the open source data warehouse generator Quipu, but it is not a one-to-one copy of the algorithm implemented in the tool. For example, this algorithm does not deal with reference tables and does not have any option to group relationships in a link.
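As a first intuition for what such a generator does, here is a deliberately tiny sketch (my own illustration, not Quipu's actual code): given a description of a source schema's primary and foreign keys, it derives one hub-style table per source table and one link-style table per foreign key. The `schema` structure and the `HUB_`/`LNK_` naming are assumptions made for this example.

```python
def generate_dv_tables(schema):
    """Derive target table names from a source schema.

    schema: {table_name: {"pk": [pk_columns], "fks": [(fk_column, referenced_table)]}}
    Returns a list of generated table signatures as strings.
    """
    generated = []
    # One hub per source table, keyed by its primary key columns.
    for table, meta in schema.items():
        generated.append(f"HUB_{table}({', '.join(meta['pk'])})")
    # One link per foreign key, connecting the two hubs.
    for table, meta in schema.items():
        for fk_column, referenced in meta["fks"]:
            generated.append(f"LNK_{table}_{referenced}({fk_column})")
    return generated

# Our running example: product references category via category_id.
schema = {
    "category": {"pk": ["category_id"], "fks": []},
    "product":  {"pk": ["product_id"], "fks": [("category_id", "category")]},
}
tables = generate_dv_tables(schema)
print(tables)
```

A real generator does far more (historization, hash keys, satellites for the non-key attributes, and so on), but the essential move is the same: walk the source schema's keys and emit target structures mechanically.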