What is Data Profiling?
Data profiling is the analysis of data sources to clarify structure, content and data quality along the eight data quality dimensions:
Accuracy – related to exactness of the data
Completeness – related to availability of all required attributes (e.g. contact record without email address)
Coverage – related to availability (e.g. unavailable language translation)
Conformity – related to alignment with required standards (e.g. telephone number format for autodial)
Consistency – related to data consistency (e.g. ZIP code does not match city)
Duplication – related to redundancy (e.g. many records of the same contact)
Integrity – related to data relationships (e.g. contact record does not match client details)
Actuality – related to up-to-date information (e.g. out-of-date customer details)
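To make some of these dimensions concrete, here is a minimal Python sketch that checks a handful of contact records for completeness, conformity, consistency and duplication. The field names, the E.164-style phone pattern, the ZIP-to-city lookup table and the function name find_issues are all assumptions made for illustration; real projects would take these rules from the business requirements.

```python
import re
from collections import Counter

# Assumed standard for autodial: "+" followed by 7 to 15 digits (E.164 style).
PHONE_PATTERN = re.compile(r"^\+[1-9]\d{6,14}$")
# Hypothetical reference data for the consistency check.
ZIP_TO_CITY = {"10115": "Berlin", "80331": "Munich"}


def find_issues(contacts):
    """Return a list of (record index, dimension, message) tuples."""
    issues = []
    # Count names case-insensitively to spot likely duplicates.
    name_counts = Counter(c["name"].strip().lower() for c in contacts)
    for i, c in enumerate(contacts):
        if not c.get("email"):
            issues.append((i, "completeness", "email is missing"))
        if not PHONE_PATTERN.match(c.get("phone") or ""):
            issues.append((i, "conformity", "phone is not in +<digits> format"))
        expected_city = ZIP_TO_CITY.get(c.get("zip"))
        if expected_city is not None and expected_city != c.get("city"):
            issues.append((i, "consistency", "ZIP code does not match city"))
        if name_counts[c["name"].strip().lower()] > 1:
            issues.append((i, "duplication", "name occurs more than once"))
    return issues


# Illustrative sample records (invented for this sketch).
contacts = [
    {"name": "Ada Lovelace", "email": "ada@example.com",
     "phone": "+493012345678", "zip": "10115", "city": "Berlin"},
    {"name": "ada lovelace", "email": "",
     "phone": "030 1234 5678", "zip": "10115", "city": "Munich"},
    {"name": "Grace Hopper", "email": "grace@example.com",
     "phone": "+498912345678", "zip": "80331", "city": "Munich"},
]

for index, dimension, message in find_issues(contacts):
    print(f"record {index}: {dimension}: {message}")
```

Accuracy, coverage, integrity and actuality usually need reference data or business knowledge beyond a single table, so they are left out of this sketch.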
The benefit is a better understanding of the data and clear data cleansing requirements. It helps clarify at an early project stage whether the right data is available at the right level of detail. It reduces project risk by quickly identifying and addressing potential data issues.
Although data profiling can be performed using spreadsheet formulas or SQL scripts, it is much faster and more reliable to use professional data profiling tools together with the technical and business expertise of the data profiler.
The picture on the left shows a sample of SAP Material Master Data. This is the start view of Experian's X88 profiling software immediately after loading the data source.
The “completeness” column shows the percentage of all data records in which the corresponding field is filled (e.g. column “F” is filled in only 93.68% of records).
The “uniqueness” column shows the percentage of all data records for which the corresponding field holds a unique value (e.g. column “B” is only 26.73% unique: out of a total of 407,315 records, only 108,862 are unique).
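The two measures can be approximated in a few lines of Python. This is a sketch under stated assumptions, not a description of how the X88 software computes them: it treats None and blank strings as "not filled" and defines uniqueness as the number of distinct filled values divided by the total number of records. The function name profile_column and the sample values are hypothetical.

```python
def profile_column(values):
    """Return (completeness %, uniqueness %) for one column of values."""
    total = len(values)
    if total == 0:
        return 0.0, 0.0
    # Assumption: None and blank strings count as "not filled".
    filled = [v for v in values if v is not None and str(v).strip() != ""]
    completeness = 100.0 * len(filled) / total
    # Assumption: uniqueness = distinct filled values / total records.
    uniqueness = 100.0 * len(set(filled)) / total
    return round(completeness, 2), round(uniqueness, 2)


# Invented sample column: 8 records, 6 filled, 3 distinct values.
materials = ["M-100", "M-100", "M-200", "", None, "M-300", "M-300", "M-300"]
completeness, uniqueness = profile_column(materials)
print(f"completeness: {completeness}%  uniqueness: {uniqueness}%")

# The figures quoted for column "B": 108862 unique out of 407315 records.
print(round(100.0 * 108862 / 407315, 2))
```

Under this definition the quoted figures for column “B” are consistent with each other: 108862 divided by 407315 rounds to 26.73%.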
Holistic Analytics uses several professional tools, such as Talend Open Studio, X88 Profiler from Experian and Altova MapForce, choosing the one that fits the purpose.
Our vast business experience also allows us to understand the business impact of your data.
We therefore provide a clear picture of your data quality and the amount of work required to achieve the desired quality level. Data profiling is a prerequisite for data cleansing, data enrichment and data migration.