I recently read the book Agile Data Warehouse Design – Collaborative Dimensional Modeling, from Whiteboard to Star Schema (quite the title) by Lawrence Corr and Jim Stagnitto. The book was recommended to me years ago by a former colleague (the book is from 2011, with the latest revision from 2014) and it has been sitting on my “to-read” list all this time. Quite frankly, I had forgotten about the book, but Johnny Winter mentioned it at a precon workshop, so I decided to finally read it.
TL;DR: my only regret after reading this book is that I didn’t read it sooner.
The book is amazing. It’s very clearly written, and its goal is to help you become a better data warehouse modeler, in the context of designing a DWH using an agile methodology. The book has planted itself firmly in my top 5 technology books (alongside The Data Warehouse Toolkit and Star Schema The Complete Reference).
The book is divided into two parts. The first part covers modelstorming, a brainstorming technique for data warehouse modeling that is introduced in this book, alongside the BEAM* framework. BEAM stands for Business Event Analysis & Modeling. Using 7 questions (who, what, when, where, how many, why and how: the 7Ws), you work together with IT and business stakeholders to define, analyze, model and document business processes. The result is a BEAM table (which looks a bit like a fact or dimension table, but without surrogate keys). Each row contains an example of what an event (for example, a customer buys a product) looks like. Because you use examples, the data becomes clearer for the business stakeholders. The BEAM table can be used as documentation, and it serves as a good foundation for designing the logical and physical layers of your data warehouse. This technique is well suited to an agile context, as you can model one type of event (which probably corresponds to one star schema) at a time, within a single sprint for example.
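To make the idea of a BEAM table a bit more concrete, here is a minimal sketch in Python of the "customer buys a product" event, with one column per question and a couple of example rows. The column names, the sample data and the `render` helper are all made up for illustration; they are not taken from the book, which uses its own notation.

```python
from dataclasses import dataclass, fields, astuple


@dataclass
class PurchaseEvent:
    """One example row of a BEAM-style table (hypothetical column names)."""
    customer: str    # who
    product: str     # what
    order_date: str  # when
    store: str       # where
    quantity: int    # how many
    reason: str      # why
    channel: str     # how


# Invented example rows; in a workshop these would come from the stakeholders.
examples = [
    PurchaseEvent("Alice Jones", "Road bike", "2024-03-18", "Antwerp store",
                  1, "Spring promotion", "In store"),
    PurchaseEvent("Bob Peeters", "Helmet", "2024-03-19", "Web shop",
                  2, "No promotion", "Online"),
]


def render(rows):
    """Return the rows as a plain-text table with a header line."""
    headers = [f.name for f in fields(rows[0])]
    table = [headers] + [[str(value) for value in astuple(row)] for row in rows]
    widths = [max(len(line[i]) for line in table) for i in range(len(headers))]
    return "\n".join(
        " | ".join(cell.ljust(width) for cell, width in zip(line, widths))
        for line in table
    )


if __name__ == "__main__":
    print(render(examples))
```

Note there are no surrogate keys anywhere: the rows hold readable example values, which is what makes the table easy to discuss with business stakeholders.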
In short, the first part of the book will help you to do better and more efficient “requirements gathering & design workshops” with your stakeholders.
The second part covers dimensional design patterns, and dives deeper into how you can design your dimensions and fact tables better (roughly one or two questions from the 7Ws correspond to one chapter). Some of the content can probably be found in the Kimball books as well, but it was great to have a refresher. I definitely picked up a few nice design techniques, such as the hierarchy map pattern for dealing with parent-child relationships. Quite a few exotic use cases are dealt with in this part of the book, and it will make you a better data warehouse developer by the time you’ve finished it.
In conclusion: if you want to be a better, more agile data warehouse developer, then I absolutely recommend this book. I do recommend that you already have a couple of years of experience under your belt, as it will make you appreciate the book more.
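As a rough illustration of the general idea behind flattening a parent-child relationship, the sketch below builds a table with one row per ancestor-descendant pair. This is an assumption about the general shape of such a table, not the book's exact layout; the employee names and the `build_hierarchy_map` function are hypothetical.

```python
# Hypothetical parent-child data: employee -> manager (None marks the top).
parent_of = {"Ann": None, "Bob": "Ann", "Cleo": "Ann", "Dan": "Bob"}


def build_hierarchy_map(parent_of):
    """Return (ancestor, descendant, levels_apart) rows, including self-rows.

    Assumes the parent-child data contains no cycles.
    """
    rows = []
    for node in parent_of:
        ancestor, depth = node, 0
        # Walk up from the node to the top, emitting one row per ancestor.
        while ancestor is not None:
            rows.append((ancestor, node, depth))
            ancestor = parent_of[ancestor]
            depth += 1
    return rows


hierarchy_map = build_hierarchy_map(parent_of)

# Everyone under Ann (including Ann herself) is now a simple filter,
# with no recursive query needed.
under_ann = sorted(d for a, d, _ in hierarchy_map if a == "Ann")
```

With a table like this, rolling facts up to any level of the hierarchy becomes a regular join on the ancestor column.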