The bad habit I chose is about the representation of dates in a data warehouse (or any other database). In the course of my career I have seen many formats:
The worst are by far the string representation, as there is no actual check on the contents. It can literally contain everything. And is ’01/02/2018′ the first of February 2018 (like any sane person would read, because days come before months), or the 2nd of January? So if you have to store dates in your data warehouse, avoid strings at all costs. No excuses.
The integer representation – e.g. 20171208 – is really popular. If I recall Kimball correctly, he said it’s the one exception where you can use smart keys, aka surrogate keys that have a meaning embedded into them. I used them for quite some time, but I believe I have found a better alternative: using the actual date data type. Some of the reasons why:
People will resist this change of course, crying resentment using slogans like “but what about the dummies?”. Dummy records are artificial records in a dimensions that hold placeholder values like “Missing”, “Unknown” or “Not Applicable”. Typically they get values like (0,-1,-2) or (-999,-998,-997). Easy to handle with integers. But what about the dummies when using actual dates? Let’s try some DATEADD magic:
Problem solved. You can also use 9999-12-31 as a placeholder for “dates in the future”. Or you can use actual surrogate keys with no meaning, like you do for any other dimension. It makes debugging a bit harder though.
If you build your date table in Tabular, the table needs to have continuous values, meaning no gaps in the sequence, for the dates. So placeholders need to be filtered out, even if you use integers. In that case, just storing NULL for missing dates might be an alternative.
Conclusion: avoid using strings as dates in your database system. I hope with this blog post I have convinced you to use dates as surrogate keys for the date dimension instead of the smart integer.
I recently read the book Agile Data Warehouse Design - Collaborative Dimensional Modeling, from Whiteboard…
You can find the slides for the session Building the €100 data warehouse with the…
I was asked to do a review of the book Microsoft Power BI Performance Best…
This is a quick blog post, mainly so I have the code available if I…
Praise whatever deity you believe in, because it's finally here, a tenant switcher for Microsoft…
This book was making its rounds on social media, and the concept seems interesting enough…
View Comments
I loved your post especially the statement "like any sane person would read, because days come before months"...I never understood the format month/day/year.
Regarding Data Warehouses:
As far as I recall from my Kimball courses, smart integer dates are permissible as a tool for partitioning data. But you shouldn't do any querying and conversion on them and rather join to the dimension table. Putting dates as key has some charm...however in my data warehouse I stick with integer keys. However as I am handling datetimes (with separate dimensions time and date) I carry along the actual datetime values in my fact table for reference.