Yes, you’re reading that right: a book that is not about the Microsoft BI stack. On a project, Pentaho Data Integration (PDI) was used as the ETL tool. To get to know this tool a little better, I bought the book Learning Pentaho Data Integration 8 CE (Third Edition) by Maria Carina Roldán. For those of you who are not familiar with Pentaho (like me three months back), it’s an open-source suite of BI tools. There’s also a commercial Enterprise product available. The ETL tool of the suite is Pentaho Data Integration, PDI for short, which is also known in the community as Kettle and Spoon (yes, they have fun code names; there’s also Kitchen, etc.). The tool itself is written in Java.
In short, I very much liked the book. It’s to the point, and the author certainly knows her stuff. There’s an abundance of chapters that explain almost everything you need to know about PDI. The book starts easy with how to install the product, how to do some basic transformations and how to read/write files. Then the more intermediate and advanced topics are explained, such as how to do data cleansing and validation, and how to load databases and data warehouses. At the end, some best practices are shared on how to create reusable transformations and jobs, and how to design and deploy your projects.
The book is well written, but sometimes the explanation is a bit short. However, the book is already 487 pages, so cramming in more might not have been a good idea. Although the book is about Pentaho 8 (and 8.1 is the current version), some screenshots are already out-of-date. Maybe they are left-overs from previous editions of the book? Also, there’s a short section on how to load files to AWS S3, and the authentication process is completely different in the current version. One small annoyance I had was when installing the product. The book literally says: “Download the latest version, extract it, run a batch file to start PDI. That’s it, you’re done.” Except that on a lot of desktops the Java home directory hasn’t been added to the PATH environment variable, so when you run the batch file it can’t find java.exe and exits. Just a minor gripe here 🙂
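If you want to check for this before running the batch file, something like the Python sketch below might help. It is not from the book, and it assumes that a `JAVA_HOME` environment variable or a `java` executable on the `PATH` is what matters; the function names `find_java` and `java_available` are made up for illustration, and the PDI start script may well look at other variables too.

```python
import os
import shutil


def find_java(env=None):
    """Report where Java might be found, based on a dict of environment variables.

    Hypothetical helper: it only looks at JAVA_HOME and PATH.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    # shutil.which searches the given PATH string for an executable named "java"
    java_on_path = shutil.which("java", path=env.get("PATH", ""))
    return {"java_home": java_home, "java_on_path": java_on_path}


def java_available(env=None):
    """Return True if JAVA_HOME is set or a java executable is on the PATH."""
    info = find_java(env)
    return bool(info["java_home"]) or info["java_on_path"] is not None


if __name__ == "__main__":
    info = find_java()
    print("JAVA_HOME:", info["java_home"] or "(not set)")
    print("java on PATH:", info["java_on_path"] or "(not found)")
    if not java_available():
        print("Java was not found; the start script will probably exit.")
```

If both lines come back empty, installing a Java runtime or setting `JAVA_HOME` before starting PDI is the likely fix.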
Conclusion: this book is certainly a good choice to get you up to speed with PDI. You can also use it as a reference work later on. Definitely recommended.