I recently purchased and read the book Deciphering Data Architectures – Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh by James Serra. James, who has worked for Microsoft for quite some time now, has an interesting blog on data architecture and data warehousing, and I was really looking forward to reading this book once it was announced. I have to tell you, it did not disappoint.
The power of this book is that it is quite short. At 242 pages, it is definitely not the biggest book in my library. But James succeeded in giving a decent overview of the relevant data architectures, explaining them in just enough detail that you can understand what their purpose is, what their advantages and disadvantages are, and when to use them. Of course, you won’t get the same detail as you would in other data architecture books, for example the Data Warehouse Toolkit by Ralph Kimball, but as stated before, that’s not the point of the book. He does go to great lengths, though, to say that data mesh probably isn’t useful for most companies 🙂
In my opinion, everyone working with data should read this book, especially if you’re working with analytical data (which is the main focus), but also if you’re, for example, a DBA supporting an operational system. It will broaden your perspective and you’ll learn what all those buzzwords mean that the business (or the consultants) are throwing around. Even non-technical people can read this book to familiarize themselves with the concepts. The book doesn’t go into specific technology (aside from a few sections in the last chapter that talk about Hadoop, Snowflake and Databricks). Even though James works for Microsoft, it’s not a book praising the Azure ecosystem.
The book does a great job of describing the different data architectures, but it also talks about data modelling (e.g. normalization and denormalization) and people & processes, and it has a great chapter on architecture design sessions. I don’t agree 100% with everything (and James explicitly states this is OK: nothing is set in stone and discussion is always possible). For example, James states that Inmon is more commonly used than Kimball, and I tend to disagree. This might be a geographical thing or something more anecdotal, but I’ve seen way more Kimball-style data warehouse implementations than Inmon-style ones. There’s only one error (that I’m aware of), and that is in the description of a type 3 slowly changing dimension. James says this is a dimension where changes are kept for every attribute. That is still a type 2 dimension though, where every change results in a new row. A type 3 dimension tracks a change by adding a new column rather than a new row. For example, one column stores the current e-mail address and another column stores the previous e-mail address.
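To make the type 2 versus type 3 distinction concrete, here is a minimal Python sketch. It is an illustration only: the class names, field names (`valid_from`, `valid_to`, `previous_email`) and e-mail addresses are hypothetical and not taken from the book, and a real data warehouse would do this in the database rather than in memory.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerType2:
    """Type 2: one row per version of the customer (hypothetical layout)."""
    customer_id: int
    email: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current row


@dataclass(frozen=True)
class CustomerType3:
    """Type 3: one row per customer, with an extra 'previous' column."""
    customer_id: int
    current_email: str
    previous_email: Optional[str] = None


def apply_type2(rows: List[CustomerType2], customer_id: int,
                new_email: str, change_date: date) -> List[CustomerType2]:
    """Close the current row and append a new one: history grows by a row."""
    updated = []
    for row in rows:
        if row.customer_id == customer_id and row.valid_to is None:
            row = replace(row, valid_to=change_date)
        updated.append(row)
    updated.append(CustomerType2(customer_id, new_email, change_date))
    return updated


def apply_type3(row: CustomerType3, new_email: str) -> CustomerType3:
    """Update in place: the old value moves to the 'previous' column."""
    return replace(row, current_email=new_email,
                   previous_email=row.current_email)


# The same e-mail change applied to both styles.
t2 = apply_type2([CustomerType2(1, "old@example.com", date(2023, 1, 1))],
                 1, "new@example.com", date(2024, 1, 1))
t3 = apply_type3(CustomerType3(1, "old@example.com"), "new@example.com")

for row in t2:
    print(row)
print(t3)
```

After the change, the type 2 version holds two rows for the customer (full history), while the type 3 version still holds a single row that remembers only the one previous value.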
The only disadvantage of this book is, in my opinion, the price. It has a list price of $79.99, which seems a bit excessive for 242 pages of content. The figures aren’t even printed in color. This reminds me of the data vault book, which also had a ridiculously high price (though it was printed in color). However, don’t let this stop you from buying this book: it is definitely an asset, and you might get it at a discount anyway.
Conclusion: read this book. It will make you a better data practitioner.
Hi Koen,
Thanks so much for the review! It is greatly appreciated. Glad you enjoyed the book and I appreciate your honesty. Just a few things of note:
- My experience is that most companies have a central physical DW (Inmon) instead of a logical DW (Kimball). Hence my comment about seeing more Inmon. However, most also have star schemas (Kimball). And most are really a combination of Inmon and Kimball, so it's hard to really know which is technically the most popular.
- I agree with your definition of SCD type 3, and I described in my book that you would create a new record that contains new fields (meaning old and new columns), but I see where the confusion is, and I will clear that up with an example in the next edition :-)
- I wish I had control of the list price, but it's all up to the publisher, and you would be shocked at how little authors get per book. Fortunately most sites such as Amazon have it at $51 for the paperback and $48 for the Kindle. Then there are sites that have specials, such as Humble https://www.humblebundle.com/books/pipelines-and-nosql-oreilly-books that has the Kindle version for $25 along with 13 other books. And Amazon just started a sale today where you "Get 3 for the price of 2" at https://www.amazon.com/promotion/psp/A2B6T4FK4NAK8A?ref=psp_pc_cart_collapse&redirectAsin=1098150767&redirectMerchantId=ATVPDKIKX0DER
Hope this helps!
Hi James,
I'm well aware of how little authors make. My comments about the price are directed towards the publisher, not you :)
I haven't seen any logical DW in my career; they're all physical but almost always denormalized into star schemas. I've not encountered many normalized DWs. Again, this might be purely anecdotal :)
I really enjoyed the book and I'm recommending it where I can.
Looking forward to the next edition.