Managing Data Privacy Risk in Advanced Analytics

June 11, 2024


“How can we protect the privacy of our customers’ personal data while leveraging that data via AI and analytics?” This question reflects a growing internal dilemma as companies pursue advanced analytics and artificial intelligence.

The troves of data that customers’ ever-more-digitalized lives produce can be a rich source of insight for organizations using advanced analytics tools. At the same time, this data is a deep source of concern to IT staffs committed to meeting both regulatory agencies’ and consumers’ expectations around data privacy. Both are important objectives — but meeting them simultaneously requires confronting an inherent conflict. Increasing data privacy in the context of analytics and AI involves using techniques that can reduce the utility of the data, depending on the task and the privacy preservation technique chosen.

An increasing number of organizations will face this issue as analytics and AI continue to evolve quickly, making an array of tools and techniques (including turnkey and cloud-based services) widely available and enabling organizations to put data to work more easily than ever. Meanwhile, customers increasingly expect companies to take all necessary precautions to protect the privacy of their personal data, especially in light of large-scale data breaches reported by mainstream media outlets. Those expectations are backed by regulations on personal data and AI across the globe, which make it critical for companies to keep their personal data protection practices in compliance.

Fundamentally, data privacy is about assessing the probability that one or more attributes, or pieces of information, about an individual whose data has been anonymized and included with others in a data set can be used to re-identify that specific individual. Some of these attributes are obvious: Direct identifiers that enable almost immediate identification include name and Social Security number. Quasi-identifiers do not generally enable the identification of a single individual on their own, but their uniqueness or their combination with other attributes may do so. For example, the combination of a person’s age and their address may enable their re-identification. Or consider a data set held by a bank’s fraud alert team on customers’ card transactions. That data set contains both direct identifiers (such as the customer’s name) and quasi-identifiers (such as credit card transaction information).
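
To make the idea of re-identification probability concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a method described by the authors. The records, the field names (name, age, postal_code, amount), and the helper functions are all hypothetical, and the postal code stands in for the address mentioned above. The sketch strips the direct identifier, groups the remaining records by their quasi-identifier values, and uses one divided by the group size as a simple per-record risk estimate.

```python
from collections import Counter

# Hypothetical toy records, invented purely for illustration.
records = [
    {"name": "Customer 1", "age": 34, "postal_code": "H3T", "amount": 120.50},
    {"name": "Customer 2", "age": 34, "postal_code": "H3T", "amount": 18.25},
    {"name": "Customer 3", "age": 34, "postal_code": "H3T", "amount": 64.00},
    {"name": "Customer 4", "age": 71, "postal_code": "H2X", "amount": 950.00},
    {"name": "Customer 5", "age": 52, "postal_code": "G1R", "amount": 42.10},
]

# Assumed classification of attributes for this toy data set.
DIRECT_IDENTIFIERS = {"name"}
QUASI_IDENTIFIERS = ("age", "postal_code")


def drop_direct_identifiers(rows):
    """Return copies of the rows without any direct identifier fields."""
    return [
        {key: value for key, value in row.items() if key not in DIRECT_IDENTIFIERS}
        for row in rows
    ]


def reidentification_risk(rows, quasi_identifiers):
    """Estimate per-record risk as 1 / (number of rows sharing the same
    quasi-identifier values). A risk of 1.0 means the combination is unique."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    group_sizes = Counter(keys)
    return [1.0 / group_sizes[key] for key in keys]


def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing quasi-identifier values."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    return min(Counter(keys).values())


if __name__ == "__main__":
    anonymized = drop_direct_identifiers(records)
    risks = reidentification_risk(anonymized, QUASI_IDENTIFIERS)
    for row, risk in zip(anonymized, risks):
        print(row, f"risk={risk:.2f}")
    print("smallest group size:", smallest_group_size(anonymized, QUASI_IDENTIFIERS))
```

In this toy data, removing the name is not enough: the records with a unique combination of age and postal code still receive a risk of 1.0, while the three records that share the same combination are harder to tell apart. Real assessments would also need to account for outside data an attacker could link to, which this sketch does not model.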

About the Authors

Gregory Vial is an associate professor in the Department of Information Technologies at HEC Montréal. Julien Crowe is senior director of artificial intelligence at the National Bank of Canada. Patrick Mesana is a doctoral candidate in the Department of Decision Sciences at HEC Montréal.

