Okay, the title of this blog post could also have been “SUMX returns unexpected results with duplicates”. The results only seem incorrect because they rest on an assumption that doesn’t hold. Let’s dive into the issue with an example.
Suppose we have employees entering timesheet data. An employee can work on multiple projects in the same day, and each employee has a cost assigned to him. Here’s some sample data:
Employee A has worked on two different projects on two different days. On the first day, he works 2 hours for project X and 6 hours for project A. The day after, the split is the same. This time however, the employee made a mistake when entering the timesheet: he entered 1 hour instead of 2 hours for project X. Instead of correcting the line, he simply added another line, also with 1 hour. This results in “duplicate” rows in the table, although they are perfectly legitimate. The CostPerDay column indicates what the employee costs the company per day. You can compare it with a Unit Price column, for example.
All right, now we want a simple report showing us the average cost per project for the employee. We add the data into Power BI Desktop and create the following measure for the number of hours worked (because explicit measures are better than implicit measures!):
Worked Hours = SUM(Timesheet[HoursWorked])
We also create a measure where we multiply the number of hours worked with the cost per day divided by 8 (the standard number of work hours in a single work day). This gives us the following formula:
Avg Cost = SUMX(Timesheet,Timesheet[CostPerDay] / 8 * [Worked Hours])
When we add this to a table, we get this:
Hold on, the result for the 29th is different from the result for the 28th? This is what the title means by incorrect (or unexpected) results. Even though the data in the table looks exactly the same for both days, the results are not.
It is not a bug in the DAX formula language. The problem lies with the duplicates in the table. Because [Worked Hours] is a measure, referencing it inside SUMX triggers a context transition: for each row being iterated, the row context is converted into a filter context on the column values of that row. With two identical rows, that filter matches both of them, so [Worked Hours] returns 2 instead of 1 for each of the duplicate rows, and the result for the 29th is higher. It’s a bit hard to follow, but luckily Marco Russo has an entire blog post on this, where he also warns about duplicates. You can only rely on context transition isolating a single row if the table has a primary key, which in this case it does not. I definitely recommend you read that article.
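To make the mechanics a bit more tangible, here is a rough Python simulation of what happens for project X. It is only a sketch of the idea, not how the DAX engine is implemented. The cost of 400 per day, the day numbers stored as plain integers, and all function names are assumptions made for the illustration.

```python
from collections import namedtuple

# One timesheet line; field names loosely mirror the columns in the post.
Row = namedtuple("Row", "employee day project hours_worked cost_per_day")

COST = 400.0  # hypothetical cost per day, not taken from the post

timesheet = [
    Row("A", 28, "X", 2, COST),
    Row("A", 28, "A", 6, COST),
    Row("A", 29, "X", 1, COST),  # first entry of 1 hour
    Row("A", 29, "X", 1, COST),  # second, identical entry of 1 hour
    Row("A", 29, "A", 6, COST),
]


def worked_hours(rows):
    """Stand-in for the [Worked Hours] measure: a plain sum of hours."""
    return sum(r.hours_worked for r in rows)


def sumx_with_context_transition(visible_rows, table):
    """Iterate the visible rows; for each one, evaluate the measure over
    every table row whose column values equal the current row."""
    total = 0.0
    for current in visible_rows:
        # Simulated context transition: identical rows all pass the filter.
        matching = [r for r in table if r == current]
        total += current.cost_per_day / 8 * worked_hours(matching)
    return total


def sumx_row_by_row(visible_rows):
    """Same iteration, but reading the hours of the current row only."""
    return sum(r.cost_per_day / 8 * r.hours_worked for r in visible_rows)


def project_x(day):
    """Rows a table visual would show for project X on the given day."""
    return [r for r in timesheet if r.day == day and r.project == "X"]


print(sumx_with_context_transition(project_x(28), timesheet))  # 100.0
print(sumx_with_context_transition(project_x(29), timesheet))  # 200.0
print(sumx_row_by_row(project_x(29)))                          # 100.0
```

On the 29th, each of the two identical rows sees 2 hours instead of 1, so the cost for project X is counted twice. Reading the hours from the current row only gives the same amount as on the 28th.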
There are three ways to solve this issue:
= [CostPerDay] / 8 * [HoursWorked]
When we use this measure in the table, we can see it returns the expected results:
The last option is my preferred one, as it removes the need for a SUMX measure (which is an iterator and might have bad performance) and because it follows the best practice of doing calculations as early as possible.
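The idea of calculating the cost per row up front and then using a plain sum can be sketched outside Power BI as well. The snippet below is a minimal Python illustration under the same assumptions as before: a hypothetical cost of 400 per day, day numbers as integers, and a made-up column name RowCost.

```python
COST = 400.0  # hypothetical cost per day, not taken from the post

# Timesheet lines as dictionaries; keys mirror the column names in the post.
rows = [
    {"Day": 28, "Project": "X", "HoursWorked": 2, "CostPerDay": COST},
    {"Day": 28, "Project": "A", "HoursWorked": 6, "CostPerDay": COST},
    {"Day": 29, "Project": "X", "HoursWorked": 1, "CostPerDay": COST},
    {"Day": 29, "Project": "X", "HoursWorked": 1, "CostPerDay": COST},
    {"Day": 29, "Project": "A", "HoursWorked": 6, "CostPerDay": COST},
]

# Precompute the cost on every row, like a column added while loading data.
for row in rows:
    row["RowCost"] = row["CostPerDay"] / 8 * row["HoursWorked"]


def total_cost(day, project):
    """Plain sum of the precomputed column for one day and project."""
    return sum(
        r["RowCost"] for r in rows
        if r["Day"] == day and r["Project"] == project
    )


print(total_cost(28, "X"))  # 100.0
print(total_cost(29, "X"))  # 100.0
print(total_cost(29, "A"))  # 300.0
```

Because each row carries its own cost, duplicate rows are simply added once each and both days end up with the same totals.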
Hi Koen,
I created a Power query calendar and have issue when I have data listed on the same day.
How do make sure new data is added under with the existing data.
Hi Patrick,
I'm not sure what you mean. Can you give an example?