DataOps Principle #11 – Make it reproducible
I am panicked… it has been a long week at work and a small group of Power BI reports are showing the dreaded “Fix this” error. First piece of advice, the “Fix this” button is not an “Easy button” and pressing it will blow away the offending issue and obliterate all the formatting you have painstakingly put in place. Click on the “See details” first and see if you can fix the measure or calculated column (often the issue) and only press the “Fix this” button if you’re ready to take on the rework.
The dreaded Fix this button.
So, I ask my project team what the “See details” is indicating. They say that a core measure in the Power BI dataset has gone missing and several measures in our reports rely on it, ergo the error message. What is worse is that we don’t know when it went missing! So, each member goes hunting where most Power BI reports reside… the Downloads folder. We scour the numerous copies of Power BI files suffixed by a parenthesis and number (ex. File (2).pbix) and try to find the lost measure.
Does your Download folder keep your versions of Power BI?
After 30 minutes we find multiple versions… now which one is right? After another 30 minutes of trial-and-error, we find the right version and fix the model. Crisis averted, but to me this was not acceptable and quite embarrassing.
DataOps Principle #11 - Make it reproducible: Reproducible results are required and therefore we version everything: data, low-level hardware and software configurations, and the code and configuration specific to each tool in the toolchain.
I share this story because I have heard similar stories from other Power BI teams, so if you have had a similar experience, you’re not alone. For me, the Power BI product poses a serious challenge to the DataOps principle “Make it reproducible” because versioning our Power BI faces the following challenges:
The Challenges
1) The Power BI format – A Power BI file is a zip file of many files, so unlike other programming languages (ex. C#, Java) where you can use version control to merge changes within a file, Power BI doesn’t have that capability. And if you use third party tools to edit these files (like the lovely Tabular Editor) you operate under your own risk and should you corrupt the pbix file, Microsoft Support will not help you. This is a common pattern with low-code solutions, so Power BI is not alone here, but it is vexing nonetheless.
A pbix file under the hood
2) The Coupling of Code, Data, and Visualizations – The custom code we develop in Power BI is often DAX (Calculated Columns, Calculated Tables, and Measures) or M language (Power Query) and it is tightly coupled with the underlying data AND the visualizations of that data. It is convenient as a low-code solution, but DataOps emphasizes decoupling each component and versioning each.
Best Practices
First and foremost, my advice for best practices is not a panacea. It will however mitigate the risk of version issues and make it much easier to manage a team working with Power BI.
1) Get Version Control – This shouldn’t shock you, but if you don’t use Git to store your Power BI files you really need to use it now. Tools like Azure DevOps (free for up to 5 users), Bitbucket, and GitHub mean you shouldn’t have an excuse to version your Power BI code. If you’re unfamiliar with Git, GitHub Desktop makes it super simple to commit and push your Power BI file changes to a repo. It also forces you and your team to provide a description of the changes you make and allows you to recover prior versions should the latest version of your Power BI file have an issue.
2) Separate Reports From the Model – I’ve heard many Microsoft MVPs and employees talk about this, but I’m repeating it. Following this advice is quite helpful because:
Many reports can look at one version of the data which eliminates the overhead of fixing the same measure several times; you just make the change once.
You can stub out your model by manually entering in data in Power Query to create tables, hardcoding your DAX measures, and then publishing a model to DEV. Your other team members won’t have to wait for you to finish working on the model and they can be productive building the visualizations and delivering products faster. In fact, you can take it a step further and have your team members work different parts of the report in parallel on separate Power BI files and then combine them through Power BI’s copy and paste feature. I’ve done this on numerous projects now and it has really sped up delivery time.
By separating the reports from the model you can work separately on Power BI reports and combine work using copy and paste.
3) Institute Workspace Governance – Do not intermingle your development Power BI files with the Power BI files that your clients/customers use. You should at a minimum have two workspaces, one for development (DEV) and one for production (PROD). For larger projects, you should have a workspace for clients/customers to test (TEST) before moving to production. If you are unfamiliar with the concept please read this wiki article.
4) Workspace Permissions – For larger teams limit those who have Member or Admin permissions to your TEST and PROD environments. Those with Member or Admin permissions should only publish Power BI files if it is in source control. If you have Premium, try Deployment Pipelines. And if you wait a few more weeks, I will show you how to automate the testing and deployment of Power BI files with Azure DevOps and Power Shell (yep, shameless teaser).