
Data Quality Management Series: The Origin of Data Quality Issues

Undeniably, issues surrounding data quality are the ‘Dracula’ of Financial Services: they are costly and drain a significant amount of resources.

Why not just give up? Because increasing the quality of your data can give you a decisive edge for the future. It can accelerate the delivery of artificial intelligence projects by a factor of five or, as Goldman Sachs has demonstrated, create millions in additional revenue.


That raises the question: how can you minimise the impact of Data Quality Issues (DQIs)? We will explore this through a three-part Data Quality Management Series, looking first at how DQIs emerge and their consequences, then explaining some existing methods to address DQIs, and finally showcasing three innovative solutions based on Sia Partners’ experience.

Part 1: Do data quality issues live in coffins?

In this first article of our three-part data quality series, we look at the origin of data quality issues.

Data Quality Issues come in various shapes and sizes. From our experience of data management projects, we have identified four main causes of DQIs:

  1. Input error from users: A value can be missing, wrong, a dummy, or deliberately incorrect. Mistakes come in a multitude of shapes (typos, abbreviations, wrong formats, inaccurate information, etc.), which makes it complex to verify that your data is correct.

E.g. a bank employee inputs an incorrect or dummy phone number to bypass a control instead of the client’s correct phone number.

  2. Error in the transmission of data between systems: When data is sent from one system to another, the exchange protocol and configuration of each system can differ. This may lead to a character being misread or an entire row going missing. It has a significant impact on legacy data that is migrated between systems.

E.g. dates or special symbols are often misread and replaced when copied: “é” to “%20£%”.

  3. Error in data calculation: Some data may be calculated and transformed internally using other values as input. The calculation can be wrong because of faulty inputs, technical problems, updates, or delays. This hides the root cause of the issue, making it difficult and costly to resolve.

E.g. a third-party rating is calculated using a wrong industry code, leading to an inaccurate risk exposure.

  4. Data obsolescence: Data changes and evolves, which can make a database obsolete over time. Banks need to rely on a client coming forward with an update, or on their employees’ diligence, to track changes.

E.g. a customer’s home address or phone number may change multiple times.
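Several of the causes above can be caught early with simple automated checks. The sketch below is a minimal, hypothetical illustration (the record, field names, and heuristics are our own assumptions, not a production rule set): it flags a dummy phone number (cause 1), text that shows the telltale signs of an encoding mix-up during transmission (cause 2), and a record that has not been re-verified recently (cause 4).

```python
import re
from datetime import date

# Hypothetical client record, for illustration only
record = {
    "phone": "0000000000",              # cause 1: dummy value entered to bypass a control
    "name": "Ã©lise",                   # cause 2: "élise" after a UTF-8/Latin-1 mix-up
    "last_verified": date(2018, 3, 1),  # cause 4: possibly obsolete
}

def looks_like_dummy_phone(phone: str) -> bool:
    """Flag obvious placeholder numbers (all-identical or sequential digits)."""
    digits = re.sub(r"\D", "", phone)
    return len(set(digits)) <= 1 or digits in ("1234567890", "0123456789")

def looks_like_mojibake(text: str) -> bool:
    """Heuristic: UTF-8 bytes mis-decoded as Latin-1 leave telltale 'Ã' / 'â€' pairs."""
    return any(marker in text for marker in ("Ã", "â€", "%20"))

def is_stale(last_verified: date, max_age_days: int = 365) -> bool:
    """Assumed policy: data not re-verified within a year is treated as obsolete."""
    return (date.today() - last_verified).days > max_age_days

issues = []
if looks_like_dummy_phone(record["phone"]):
    issues.append("dummy phone number")
if looks_like_mojibake(record["name"]):
    issues.append("possible encoding corruption")
if is_stale(record["last_verified"]):
    issues.append("record not verified recently")

print(issues)  # all three checks fire on this sample record
```

Checks like these are heuristics, not proof of correctness: they catch the obvious cases cheaply at the point of entry, which, as the negative cycle below shows, is far less costly than discovering the same issues downstream.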

So what are the consequences of bad data quality management?

As we saw, DQIs are like vampires: they lie hidden in a coffin, at every step of a process, until you need the data and they bite you unexpectedly. The consequences of poor data quality management are numerous: non-compliant systems; inaccurate reporting; incorrect insights into company risk, revenue, or operations, leading to poor decisions and missed opportunities; fines; resources wasted on issue resolution; long delays in accessing information; and more.

Most importantly, it can reduce trust in a firm’s data. Reduced trust is the start of a vicious circle of poor data management, as illustrated in the diagram below. It makes managers doubt the information they receive, which can have the trickle-down effect of creating additional work for everyone. The diagram highlights the ongoing consequence of not having a Data Quality Framework: a perpetual negative cycle and disregard for Data Quality.


The negative cycle of poor data quality management

  1. A lack of attention and tools around Data Quality creates human input errors, the root of Data Quality Issues (DQIs)
  2. Data Quality Issues are not identified at the beginning of the process, which increases the cost of correcting them
  3. DQIs are propagated to other systems and create calculation errors
  4. No controls are in place to ensure the data is correct before it is used by other processes
  5. Data analysis is conducted on data of variable quality. Either the data analyst identifies and cleans the data, leading to a short-term fix and reduced productivity, or the DQIs remain unseen, creating bias in the results
  6. The results of the data analysis are shared with management, leading to wrong decisions being taken and, in turn, to a lack of trust in the use of data

Conclusion

Addressing common Data Quality Issues is a complex and critical process. DQIs may emerge randomly, at various steps of a process, and for a number of different reasons. Ignoring these problems will trap your firm in a vicious circle which may well hold back the efficiency of future projects, in addition to hampering your capacity to innovate.

In our next article we will explore the different methods to address DQIs and try to identify which one is the most efficient.

Sources:

  1. Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says, Forbes, Gil Press
  2. How Goldman Sachs Made More Than $1 Billion With Your Credit Score, The Wall Street Journal, Liz Hoffman and Anna Maria Andriotis