Guest Post by Eugene Lebedev
![Automate Data Extraction in PowerBI and Excel](https://static.wixstatic.com/media/98bd55_fcebfd6baa454a7db4f74ded1e5a878b~mv2.jpg/v1/fill/w_980,h_578,al_c,q_85,usm_0.66_1.00_0.01,enc_auto/98bd55_fcebfd6baa454a7db4f74ded1e5a878b~mv2.jpg)
Refreshing data manually in your analytics reports in Power BI or Excel is one of the most boring and time-consuming tasks you can do as a data analyst. This is the problem I faced in my first job, when at least 15 hours of my work week were dedicated to refreshing Excel reports with new data. Over time I learned how to automate this process and have since automated it for 600+ clients of my Power BI consulting firm.
There are five main ways to automate data extraction into your dashboards. In this article, I will discuss all of them, give real-world examples that I worked on, and explain the advantages and disadvantages of each one. By the end, you will be able to decide whether any of these methods would work for you.
This article aims to highlight the possible automation methods so that you know what is possible. I will not give detailed technical training on them here because each of those methods deserves its own article. However, I will link more articles here to give you additional instructions.
The ways to automate data extraction to Power BI reports (in the order of technical complexity) are the following:
Using data extraction software
Email scraping using Power Automate
Building a custom Power BI connector
Creating a data warehouse
Web scraping
Automating Data Extraction Using Data Connector Software
There are quite a few companies that create data extraction software. This software is sometimes called Power BI/Excel data connectors. Having worked on 1000+ projects at Vidi, I have tried more than 10 of them, including Supermetrics, Windsor, Funnel, and CData, and we have even built our own connectors.
Several factors influence the choice of data connector software:
Technical efficiency – the connectors you choose should be able to load your data without crashing or long processing times. There are some data sources like Shopify for which technical efficiency is difficult to achieve because of large data volumes and complex data models. You would likely have to pick a more expensive data connector for those. However, pretty much any data connector would work for sources like Facebook Ads or Google Ads.
Simplicity to set up – it is important that the setup process is simple and requires no technical skills. This would help to ensure user adoption within your company.
Cost-effectiveness – the cost of data connector software ranges from $20 to $200 per month per connector. However, beyond licensing fees, you should consider whether the choice of a connector could save you working hours through improved technical efficiency and a convenient data format. After all, a data analyst's working hours are often far more expensive than the licensing fees.
The advantages of using data connector software are:
Reduced complexity since there is no need to write code for data extraction
Reduced time investment into dashboard-building projects
Disadvantages of using the data connectors software are:
A limited number of data sources covered
Need to pay a subscription
Email Scraping with Power Automate
Some software allows users to schedule emails with Excel data exports. If this is the case, every day or week you will receive an email with an Excel file containing your latest data. These emails will usually arrive from the same email address and have the same subject line. This consistency allows you to build conditional logic that automates the data extraction.
I recommend using Power Automate for this kind of automation. Power Automate is a Microsoft technology that allows you to create integrations between different software in a no-code/low-code way. Power Automate has a Microsoft Outlook connector that allows you to connect to your mailbox.
With Power Automate, you can create triggers that launch your automation (for example, when an email from a certain address is received) and actions that it performs (for example, moving a file to SharePoint). As a result, we get the following logic: when an email is received from a certain address, take the attached file and save it to SharePoint. We can then connect Power BI to that SharePoint folder.
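If it helps to see that logic spelled out in code, here is a minimal Python sketch of the same idea using the standard imaplib library. The sender address, mailbox credentials, and target folder are all placeholders, and in practice Power Automate builds this flow visually without any code.

```python
# Hypothetical sketch of the "email arrives -> save the attachment" flow.
# Power Automate builds this visually; the code only illustrates the logic.
import email
import imaplib
from pathlib import Path

SENDER = "reports@example.com"                    # placeholder: the scheduled-report sender
TARGET_DIR = Path(r"C:\SharePoint\report-drops")  # placeholder: a synced SharePoint folder

# Connect to the mailbox (server and credentials are placeholders)
mail = imaplib.IMAP4_SSL("outlook.office365.com")
mail.login("me@example.com", "app-password")
mail.select("INBOX")

# Trigger condition: unread emails from the known sender
_, ids = mail.search(None, f'(UNSEEN FROM "{SENDER}")')

for msg_id in ids[0].split():
    _, data = mail.fetch(msg_id, "(RFC822)")
    message = email.message_from_bytes(data[0][1])

    # Action: save every CSV/Excel attachment to the folder Power BI reads from
    for part in message.walk():
        filename = part.get_filename()
        if filename and filename.lower().endswith((".csv", ".xlsx")):
            (TARGET_DIR / filename).write_bytes(part.get_payload(decode=True))

mail.logout()
```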
We recently worked with a mining company that used software called CAS Web to gather their data. This software allowed them to schedule automatic emails with CSV exports.
Before we worked together, they used to manually copy data from those files and save it to an Excel document. We changed the process so that when they receive an email with a new file, it is automatically moved to SharePoint. As a result, this automation saves them 3 hours per week and improves the consistency of data in their Power BI reports.
Advantages:
Quick and easy to set up
Disadvantages:
Not many software products allow scheduling emails with CSV data exports
Building a Custom Power BI Connector
As you probably know, you can usually get data into Excel or Power BI through the “Get Data” menu inside of Power Query. This menu contains the list of data sources that you can extract your data from. The way it works is that Microsoft writes code to extract data from each data source; a separate block of code is needed for every source because every source is unique. You can also write this code yourself, and by doing so you create what is called “a custom Power BI connector”.
A custom Power BI connector is a .mez file that is stored on your computer, in the Documents\Power BI Desktop\Custom Connectors folder. When you save a .mez file to this folder, Power BI will recognize it, and the next time you open Power BI you will see the new data source in your Get Data menu.
The .mez files are essentially zip archives that contain multiple files. The .m file inside is the one that contains the code extracting data from your data source. Now, let me explain the logic and the process of creating those connectors.
Firstly, you need to evaluate your data source and check whether it has an API. A simple Google search like “My_data_source API documentation” will reveal whether an API is available. If you find official documentation, you can usually build a custom Power BI connector on top of that API.
The second step is to create formulas in a coding language called M that send API requests as described in the documentation. Here is a more detailed post with step-by-step instructions for creating a custom Power BI connector. Once your M code is ready, you place it inside a .mez archive and save this archive in the Custom Connectors folder mentioned above.
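The M code itself is beyond the scope of this article, but the underlying logic is simply an authenticated HTTP request followed by JSON parsing. Before writing any M, it can be useful to sanity-check the API with a quick prototype; the sketch below uses Python, and the endpoint, token, and field names are made-up placeholders that you would replace with whatever your data source's documentation specifies.

```python
# Hypothetical prototype of the API call you would later express in M.
# The endpoint, token, and field names are placeholders, not a real service.
import requests

BASE_URL = "https://api.my-data-source.com/v1/orders"   # placeholder endpoint
API_TOKEN = "your-api-token-here"                        # placeholder credential

response = requests.get(
    BASE_URL,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    params={"created_after": "2024-01-01", "page_size": 100},
    timeout=30,
)
response.raise_for_status()

# Most APIs return JSON; this is the structure your M code will need to flatten into a table
for record in response.json().get("results", []):
    print(record.get("id"), record.get("total"))
```

Once a prototype like this returns the data you expect, translating the same request and the JSON flattening into M is a much smaller step.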
Advantages:
No need to pay a subscription
You can have full control over what data you extract automatically from your API
Disadvantages:
You need advanced technical skills to develop a custom connector from scratch
You will need to maintain your custom Power BI connector once it is developed
Creating a Data Warehouse
Creating a data warehouse can greatly help to automate data extraction and transformation. The process of creating a data warehouse is quite similar to creating a custom data connector; you simply use different technologies to perform the same technical steps.
The following technologies are popular choices:
Python or Azure Data Factory for extracting the data from APIs
Azure Functions or Airflow for scheduling your data extraction procedures to run
SQL Server database for hosting your data
SQL queries for transforming the data into the format that you need
This video visually shows a typical architecture for a data warehouse.
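To make the extraction step more concrete, here is a minimal Python sketch of a typical extract-and-load job. The API endpoint, token, connection string, and table name are all placeholders; a real pipeline would also add logging, incremental loads, and error handling.

```python
# Minimal extract-and-load sketch: pull data from an API and land it in SQL Server.
# The API endpoint, token, connection string, and table name are all placeholders.
import pandas as pd
import requests
from sqlalchemy import create_engine

# Extract: call the source API
response = requests.get(
    "https://api.my-data-source.com/v1/sales",
    headers={"Authorization": "Bearer your-api-token-here"},
    timeout=60,
)
response.raise_for_status()
df = pd.DataFrame(response.json()["results"])

# Load: write the raw data into a staging table in SQL Server
# (requires the pyodbc driver; server, database, and credentials are placeholders)
engine = create_engine(
    "mssql+pyodbc://user:password@my-azure-server.database.windows.net/warehouse"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)
df.to_sql("stg_sales", engine, schema="dbo", if_exists="replace", index=False)

# Transform: downstream SQL (views or stored procedures) reshapes stg_sales
# into the reporting tables that Power BI or Excel will query.
```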
Creating a data warehouse has the same advantages as creating custom connectors and a few additional ones:
Handling large amounts of data – extracting large data volumes directly into Excel or Power BI is often a bad idea, especially if you are extracting data from multiple data sources. It often leads to long data loading times and, eventually, “timeout” errors. The chance of facing these issues with a data warehouse is much lower.
Data availability – you can extract the data from your data warehouse into Excel, Power BI, Tableau, or any other data consumption tool of your choice. This way you don’t have to code separate data extraction processes for every data consumption tool that your team uses.
The disadvantages of this approach are:
Requires specialized knowledge and a more segmented team – you would likely need to hire a data engineer on a full-time or consulting basis.
Hosting your data warehouse in the cloud is not free – for example, if you host your SQL Server database on Azure, the cost is $20+ per month.
Extracting Data through Web Scraping
If you want to extract data from a source that does not have an API, web scraping is your next best option. Essentially, web scraping involves writing a bot that performs the same actions you would perform to extract the data manually. The bot simply performs them much faster, saving you time and ensuring consistency in the data extraction.
The technical steps for data extraction through web scraping are:
1. Analyze the website structure - plan for categories and subcategories of website pages
2. Automatically go through each page of the website one by one
3. Extract data from the HTML classes and tags
4. Schedule your scripts to run on a regular basis to refresh your dataset (this could be an Excel file or a SQL Server database)
If you want to scrape data without writing any code, relying instead on a visual interface, I would recommend web scraping with Power Automate. If you have some Python coding skills, you would typically use the Selenium package to iterate through website pages and the BeautifulSoup package to extract the data from the HTML.
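Here is a minimal sketch of that Python approach. The URL, the number of pages, and the CSS classes are hypothetical, so you would adapt them to the structure of the site you are scraping.

```python
# Hypothetical scraping sketch: Selenium loads each page, BeautifulSoup parses the HTML.
# The URL, page count, and CSS classes are made up; adapt them to the real site structure.
import csv
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()   # assumes chromedriver is installed and on the PATH
rows = []

# Step 2: go through the website pages one by one
for page in range(1, 6):
    driver.get(f"https://example.com/products?page={page}")
    soup = BeautifulSoup(driver.page_source, "html.parser")

    # Step 3: extract data from the HTML classes and tags
    for item in soup.select("div.product-card"):
        rows.append({
            "name": item.select_one("h2.product-name").get_text(strip=True),
            "price": item.select_one("span.price").get_text(strip=True),
        })

driver.quit()

# Step 4: save the results to the dataset your report reads from
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)
```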
The advantages of this approach are:
Automated data extraction even from data sources that have no API
Consistency – all of your data is collected using the same approach, which eliminates manual errors
The disadvantages of this approach are:
Scraping puts a lot of stress on the website servers which means that many websites have anti-scraping defense measures. Working around those measures takes a lot of skill.
Whenever the website interface changes, the scraping automation will break and some adjustments will be needed.
Choosing the Reporting Automation Method
Now that you know the five ways data extraction into your reports can be automated, you will need to choose one or more of them. Your choice depends on several factors:
Technical skills – if you don’t have the technical know-how in your team, it is often easier to choose a data connector software. It is much easier to pay a software company than to pay a maintenance retainer to a consultant. If you have technical knowledge, you can consider scraping, writing custom connectors, or building data warehouses.
Business Goals – if you want to use the extracted data in many different parts of the business, creating a data warehouse is usually the best way to go.
Level of customization – if you want to extract the data in a very custom format, you will need to create a connector, build a data warehouse, or scrape your data. A specialized consultant can help you with these custom data extraction methods, or you can train your internal resources.
The last thing I will say is that you don’t have to choose only one of these methods. You can always use a combination of them, applying different methods to different data sources. Once the data flow into your reports is automated, you will achieve more real-time reporting, improve the reliability of your reports, and free up valuable working hours that can be invested elsewhere.
I hope you find this blog useful! Happy automation!