Analytic layers will usually store large amounts of data which will be aggregated in different layers or applications, transnational level will not always be available to all and is more difficult to track.
Users can notice significant gaps, but not in all cases and not in real time.
Multi-Dimensional databases such as Microsoft OLAP and SAP BW cubes, provide users a pre-defined aggregation of KPIs and dimensions, transnational data is lost. Adding a lower level of granularity to OLAP systems causes data explosion and has a significant cost in performance.
When users view data in dashboards and reports which is based on OLAP systems, they cannot always verify the results easily. As a result, it is more difficult to gain the user’s trust the quality of the data.
We’ll try to define action principles in order to gather an approach of Data Validation at your organization. Data validation is relevant as part of a successful BI project and as a production on-going process.
The Target: Gain users trust and assure data is reliable.
1. Validate your data across different areas and at as many granularity levels
Random samples are not sufficient, all it takes is a few missing records to effect reports and ruin trust.
- Choose which source application areas you would like to validate
- Choose the periods you would like to validate
- Choose levels of granularity you would like to compare to your raw data
- Decide on the validation frequency
2. Start validation at a high granularity level
There is no need to validate low level granularity unless the high level validation indicates discrepancies. Only then, it is required to drill in and perform more detailed validations.
One of the data validation main purposes is to narrow down the area for investigation when a problem was found.
3. Validate business KPIs known to users
KPIs used in reports are calculated according to fact measures. Technically we can assume that it is enough to validate the basic measures. However, we can miss calculation logic implemented in the OLAP layer. The primary goal is to gain users trust, so validating the exact key factors familiar to users can accomplish that.
4. Automate your validation processes
Make your validation a continuous process and integrate it in your BI system. Integrate the validation in the loading processes if possible, or schedule it after the loading.
5. Know what is the sources of your data
In order to perform proper validation, you must be familiar with the data models and the source of attributes and KPIs. It is important to know if the field is manually fed in the source system, calculated by logic in the DWH, etc…This means you need to know the business process in your data creation. This will only assure a successful data quality process and ability to motivate others to correct the data.
6. Multiple multidimensional models may require a different approach
Take into consideration that if you have multiple cubes in your organization. You may need to validate them in a different manner, based on the needs and business scenarios relevant to the operational areas it covers.