Yet another book review! I just finished the book Pro Serverless Data Handling with Microsoft Azure: Architecting ETL and Data-Driven Applications in the Cloud (what a title :), written by German MVPs (and metalheads) Benjamin Kettner and Frank Geisler.
The book handles the topic of “serverless” computing in the Microsoft Data Platform, where serverless is any service in Azure that abstracts away the concept of a server. So yes, technically, there are still servers, you just don’t control any of them. Examples of such services are Azure Functions, Azure Logic Apps, Azure Data Factory or Azure Synapse Serverless SQL Pools.
Once I learned this book came out, I purchased it immediately because it’s such an interesting concept. I believe serverless services are the future of the data platform, as they can scale up and down automatically, they provide great resiliency and they make implementation of some architectures easier. It seems the authors feel the same way 🙂
The book is divided into 4 parts. Part 1 (3 chapters) covers the basics: what is Azure, what is serverless, what is ETL, stuff like that. If you’ve already done some data warehousing projects or worked with Azure, most of the content here will already be familiar. In the chapter about serverless the authors state that databases typically are not a great fit for serverless patterns, since they have long startup times (which is true in Azure SQL DB serverless, and the first connection always fails, which isn’t great either). But I guess the authors haven’t worked with Snowflake yet (is that a true serverless service? Up for debate :), because Snowflake can auto-resume lightning fast. But I suppose Snowflake is not a part of the “Azure serverless” offering, so it’s excluded from the book.
Part 2 (6 chapters) introduces all the serverless data services in a bit more detail. They cover Azure Functions, Azure Logic Apps, Azure Data Factory, data storage (Azure Blob/Data Lake & Queue & Table Storage, Azure SQL DB serverless and Azure Cosmos DB), streaming services (IoT Hub, Event Hubs, Service Bus and Stream Analytics) and finally Power BI. Some chapters have great examples and demonstrate the functionality really well, especially the chapter about Azure Functions, maybe because Azure Functions often play a key role in the architectures discussed later in the book. But for the other services the examples are really short. The book is not meant to teach you all the intricate details of those services; its purpose is rather to make you acquainted with them. Look at the book as a starting point for your serverless journey.
Part 3 (4 chapters) talks about design practices and it’s in my opinion the most interesting part of the book, as it deals with serverless more from an architectural point of view. It has a great chapter about resiliency and then follows up with a chapter about queues and messages/commands to achieve said resiliency. There’s a great example with Azure Functions and Queues there, but that’s where I found the only “negative point” in the book: they refer to an ARM template on their GitHub to create the resources, but the GitHub repo is, at the time of writing, empty.
I contacted the authors and they are aware of the issue and they promised they’re going to fix it. But all in all, it’s not that bad: if you take a look at the screenshot of the result and you’ve read the previous part of the book, you can create all the necessary resources yourself. I struggled a bit with the Azure Function in the example (mainly because I don’t write them that often); I had some trouble getting the connection string to the queues working. Maybe the book could have given a bit more explanation there. The part ends with a chapter on processing streams and a chapter about monitoring serverless. Those two chapters were not nearly as detailed as the previous two, and they could have used more code examples. A couple of Kusto queries to analyze some log data would have been great, for example.
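To make the queue idea from the resiliency chapters a bit more concrete, here is a minimal sketch in plain Python. It is not the book’s example and it doesn’t talk to Azure at all: it just simulates, with an in-memory queue, the retry-then-poison behaviour you get from a queue-triggered Azure Function. The names `process_queue` and `make_flaky_handler` are made up for illustration, and the limit of 5 attempts is an assumption that mirrors the default dequeue count of the Azure Functions queue trigger.

```python
from collections import deque

# Assumption: mirrors the default max dequeue count of the Azure Functions queue trigger.
MAX_DEQUEUE_COUNT = 5


def process_queue(messages, handler, max_dequeue_count=MAX_DEQUEUE_COUNT):
    """Drain an in-memory queue, retrying messages whose handler raises.

    Returns (processed, poison): the results of successful handler calls and
    the messages that failed max_dequeue_count times.
    """
    queue = deque((msg, 0) for msg in messages)  # (message, dequeue count so far)
    processed, poison = [], []
    while queue:
        msg, count = queue.popleft()
        count += 1
        try:
            processed.append(handler(msg))
        except Exception:
            if count >= max_dequeue_count:
                poison.append(msg)          # give up and park it for inspection
            else:
                queue.append((msg, count))  # put it back for another attempt
    return processed, poison


def make_flaky_handler(failures_before_success):
    """Hypothetical handler: every message fails a few times, "bad" fails forever."""
    attempts = {}

    def handler(msg):
        if msg == "bad":
            raise ValueError("cannot parse message")
        attempts[msg] = attempts.get(msg, 0) + 1
        if attempts[msg] <= failures_before_success:
            raise ConnectionError("transient failure")
        return msg.upper()

    return handler


processed, poison = process_queue(["a", "bad", "b"], make_flaky_handler(2))
print(processed, poison)
```

The point is that the producer never has to care whether the consumer is temporarily down: transient failures are absorbed by retries, and the one message that can never be processed ends up in a separate list (in Azure, a separate poison queue) instead of blocking everything else.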
The final part (4 chapters) brings it all together. The first chapter is about all the tools you can use when dealing with serverless. The next one is about data loading practices (e.g. how to deal with flat files, how to deal with REST, how to deal with databases as a source), while the subsequent one is about data storage patterns (aka the destination). It discusses relational stores, storage accounts and non-relational stores. The final chapter discusses a use case where the authors propose a serverless architecture. The last part is quite theoretical and the chapters are quite short. A bit more detail here and there would have been nice.
I really enjoyed reading the book. It’s not a detailed step-by-step book on how all those serverless services work, but it’s rather a book that will make you think about how you would design and architect your data solutions on Azure using serverless services. For example, the chapter about resiliency and queues made me really think about how I could perhaps redesign an existing architecture with queues and Azure Functions to achieve better performance, higher resiliency and lower cost. But because the book is all about serverless, all of the proposed solutions/architectures are serverless only. Sometimes this feels a little forced (for example, as relational storage choice it’s always Azure SQL DB serverless, while there might be better options out there). This is a conscious choice of the authors though, and they mention this at the start of the book. But in real life, you probably would choose a dedicated SKU of Azure SQL DB, or you would go for Azure Synapse Analytics. In the book, Azure Synapse Serverless SQL Pools really didn’t get any love: it’s mentioned, and then dismissed as not a great fit for the proposed architecture. I really would have liked more detail on this service. Or you would add Azure Analysis Services, for example.
Conclusion
I definitely recommend this book for anyone (BI professional, data architect, data engineer …) who wants to know a bit more about the serverless data offerings in Azure and their design practices.
------------------------------------------------
Hi Koen
Kindly suggest a book for same subject (Data Handling) in on-premise for best practices.
What are you looking for? Info about SQL Server? How to write SQL? Or rather ETL development with SSIS?