When it comes to reporting and analytics, we all know what organizations want: easy access to timely, trusted data to help them make more informed business decisions. What the business users are looking for are pretty dashboards with lots of visualizations in which they can slice and dice and dynamically interact with the data. In order to make those dashboards a reality and sustain them over time with ongoing, updated information, it is important to have a data integration strategy. More specifically, a
data integration framework that will ingest, cleanse, validate, summarize, and model the data so it can be presented by reporting tools such as Power BI or Tableau. This back-end data integration effort is typically neither seen nor understood by end users, but it is still the most critical and difficult part of designing and delivering a modern data warehouse solution.
Of course, we need our data integration tools to run on cloud platforms, scale on demand, and deliver high performance and high availability. We also want a robust solution with the flexibility to quickly and consistently manage the flow of data from any source and prepare it for easy consumption and analysis.
But today, I wanted to take a deeper look at the inner workings of the data integration framework to see what is actually happening. Here are the key components that should be included as part of your data integration framework.
Staging Area - The first step in the data integration process is to create a staging area to land the source data. Typically, the source data is copied and moved over to the staging or landing area "as-is". This is done to capture the source data into the data warehouse environment as quickly as possible and then disconnect from the source, limiting the load placed on its system resources.
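To make that concrete, here is a minimal Python sketch of an "as-is" staging load. The connection strings, the "stg" schema, and the table name are illustrative assumptions rather than a prescribed implementation:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection strings; swap in your own source and warehouse details.
source = create_engine("postgresql+psycopg2://user:pass@source-host/source_db")
warehouse = create_engine("postgresql+psycopg2://user:pass@dw-host/warehouse_db")

def stage_table(table_name: str, chunk_size: int = 50_000) -> None:
    """Copy a source table into the staging schema exactly as-is, in chunks."""
    for chunk in pd.read_sql_table(table_name, source, chunksize=chunk_size):
        # No cleansing or transformation here; the goal is to land the data fast
        # and release the source system as quickly as possible.
        chunk.to_sql(table_name, warehouse, schema="stg", if_exists="append", index=False)

stage_table("customers")
```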
Data Validation - Once the data has been moved over from the source system, you will want to ensure that your data is accurate and matches the source system. Organizations place a lot of value on their source systems, which are typically regarded as the "system of record". If the data warehouse does not match the source system, users will not trust the data, and the warehouse will quickly lose adoption and value.
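A row-count reconciliation is often the simplest first check. The sketch below reuses the source and warehouse engines from the staging example above; the queries and schema names are assumptions:

```python
from sqlalchemy import text

def validate_row_count(table_name: str) -> bool:
    """Compare record counts between the source system and the staged copy."""
    with source.connect() as src, warehouse.connect() as dw:
        src_count = src.execute(text(f"SELECT COUNT(*) FROM {table_name}")).scalar()
        stg_count = dw.execute(text(f"SELECT COUNT(*) FROM stg.{table_name}")).scalar()
    if src_count != stg_count:
        # A mismatch should be flagged before any downstream processing runs.
        print(f"{table_name}: source={src_count}, staging={stg_count} -- counts do not match")
        return False
    return True
```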
Data Cleansing - All organizations have some level of data quality issues that they need to address on an ongoing basis. During the ETL/ELT process, there is an opportunity to check for data issues such as missing data or invalid values (for example, a text field that should contain a numeric value). These types of errors can be addressed during processing, and all of them should be flagged and captured, with corresponding email/text notifications sent to IT or support to raise awareness.
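As a hedged example of that kind of check, the sketch below flags rows where a text column that should be numeric cannot be parsed. The column name and the notification step are illustrative:

```python
import pandas as pd

def flag_invalid_amounts(df: pd.DataFrame) -> pd.DataFrame:
    """Flag rows whose 'amount' text column cannot be parsed as a number."""
    parsed = pd.to_numeric(df["amount"], errors="coerce")
    df["amount_clean"] = parsed
    df["has_error"] = parsed.isna() & df["amount"].notna()
    error_count = int(df["has_error"].sum())
    if error_count:
        # In a full framework this is where an email/text notification to IT or
        # support would be triggered instead of a simple print.
        print(f"{error_count} rows flagged with non-numeric 'amount' values")
    return df
```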
Data Transformation - In addition to handling data issues, your data warehouse can also apply business logic to enhance or enrich the data. For example, you can perform a calculation on two or more values and store the result in a new field, or assign a record to a new category used only for reporting purposes based on the value of an existing field, such as categorizing customer age, store location or vendor territory. This standardizes business rules that need to be applied across the organization and reduces the effort of applying those rules in each individual report, as illustrated in the sketch below.
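Here is a sketch of that kind of enrichment, with a derived measure and an age band that downstream reports can reuse. The field names and band cut-offs are purely illustrative:

```python
import pandas as pd

def enrich_customers(df: pd.DataFrame) -> pd.DataFrame:
    """Add a derived measure and a reporting-only category to customer records."""
    # Calculation on two existing values, stored once so every report agrees.
    df["lifetime_margin"] = df["lifetime_revenue"] - df["lifetime_cost"]
    # Reporting-only category derived from the customer's age.
    df["age_band"] = pd.cut(
        df["age"],
        bins=[0, 25, 40, 60, 120],
        labels=["25 and under", "26-40", "41-60", "Over 60"],
    )
    return df
```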
Data Orchestration - Finally, a data integration framework should have a flexible way to control and orchestrate all the data flows and processing. A set of control tables can store metadata to help facilitate the logic and flow of data. For example, you can store which data sources and tables need to be processed, which procedures need to be run, whether you want a full or incremental data load, or when a job last ran. These control tables create tremendous efficiencies by reducing code, consolidating logic, and making it easier for developers to manage the overall data integration process.
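A minimal sketch of metadata-driven orchestration might look like the following, assuming a hypothetical etl.load_control table and the staging helper from the earlier sketch (stage_table_incremental is a hypothetical helper for incremental loads):

```python
import pandas as pd

# Read the control table that drives the whole load; schema and columns are illustrative.
control = pd.read_sql_table("load_control", warehouse, schema="etl")
# Expected columns: source_table, load_type ('full' or 'incremental'),
# last_run_at, is_active

for row in control[control["is_active"]].itertuples():
    if row.load_type == "full":
        stage_table(row.source_table)  # full reload via the staging sketch above
    else:
        # Hypothetical helper that only pulls rows changed since the last run.
        stage_table_incremental(row.source_table, since=row.last_run_at)
```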
The output detail of every processed job can also be captured as part of the framework and then surfaced in a dashboard to track status and performance.
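For example, a simple job log could be written like this; the etl.job_log table and its columns are illustrative assumptions:

```python
from datetime import datetime
import pandas as pd

def log_job(job_name: str, status: str, rows_processed: int, started_at: datetime) -> None:
    """Append one row of run detail to a job log table for dashboarding."""
    record = pd.DataFrame([{
        "job_name": job_name,
        "status": status,               # e.g. 'succeeded' or 'failed'
        "rows_processed": rows_processed,
        "started_at": started_at,
        "finished_at": datetime.now(),
    }])
    record.to_sql("job_log", warehouse, schema="etl", if_exists="append", index=False)
```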
At Westberke Consulting, we have developed an integration framework that incorporates all of these components to speed up data warehouse development, reduce your TCO, and give you confidence that your data warehouse is providing a trusted source of data you can act on.