Book Review – Learning Pentaho Data Integration 8 CE

Yes, you’re reading that right. A book not about the Microsoft BI stack. On a recent project, Pentaho Data Integration (PDI) was used as the ETL tool. To get to know this tool a little better, I bought the book Learning Pentaho Data Integration 8 CE (Third Edition) by María Carina Roldán. For those of you who are not familiar with Pentaho (like me three months ago), it’s an open-source suite of BI tools. There’s also a commercial Enterprise product available. The ETL tool of the suite is Pentaho Data Integration (PDI for short), which is also known in the community as kettle and spoon. Yes, they have fun code names; there’s also kitchen, among others. The tool itself is written in Java.

In short, I very much liked the book. It’s to the point, and the author certainly knows her stuff. There’s an abundance of chapters that explain almost everything you need to know about PDI. The book starts easy with how to install the product, how to do some basic transformations and how to read and write files. Then the more intermediate and advanced material is explained, such as how to do data cleansing and validation, and how to load databases and data warehouses. At the end, some best practices are shared on how to create reusable transformations and jobs, and how to design and deploy your projects.

The book is well written, but sometimes the explanation is a bit short. However, the book is already 487 pages, so cramming in more might not have been a good idea. Although the book is about Pentaho 8 (and 8.1 is the current version), some screenshots are already out of date. Maybe they are leftovers from previous editions of the book? Also, there’s a short section on how to load files to AWS S3, and the authentication process is completely different in the current version. One small annoyance I had was when installing the product. The book literally says: “Download the latest version, extract it, run a batch file to start PDI. That’s it, you’re done.” Except that on a lot of desktops the Java home hasn’t been added to the environment path, so when you run the batch file it can’t find java.exe and it exits. Just a minor gripe here 🙂
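If you want to check up front whether Java can be found at all, something like the following could help. It's only a minimal sketch in Python, not something from the book: the function name `find_java` is made up for illustration, and it assumes that looking under `JAVA_HOME` and then on the `PATH` is a reasonable approximation of what a launcher script would do. The actual lookup in the PDI batch file may differ.

```python
import os
import shutil
from pathlib import Path


def find_java(env=None):
    """Return the path to a Java executable, or None if none is found.

    Hypothetical helper: checks JAVA_HOME first, then the PATH.
    """
    env = os.environ if env is None else env
    exe = "java.exe" if os.name == "nt" else "java"

    # First look under JAVA_HOME/bin, if JAVA_HOME is set.
    java_home = env.get("JAVA_HOME")
    if java_home:
        candidate = Path(java_home) / "bin" / exe
        if candidate.is_file():
            return str(candidate)

    # Otherwise fall back to searching the directories on the PATH.
    path = env.get("PATH")
    return shutil.which("java", path=path) if path else None


if __name__ == "__main__":
    location = find_java()
    if location:
        print(f"Java found at: {location}")
    else:
        print("Java not found: set JAVA_HOME or add Java's bin folder to the PATH.")
```

If this reports that Java is not found, fixing the environment variables first will probably save you the confusion of a batch file that just exits.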

Conclusion: this book is certainly a good choice to get you up to speed with PDI. You can also use it as a reference work later on. Definitely recommended.


Koen Verbeeck

Koen Verbeeck is a Microsoft Business Intelligence consultant at AE, helping clients gain insight into their data. Koen has comprehensive knowledge of the SQL Server BI stack, with a particular love for Integration Services. He's also a speaker at various conferences.