Privacy regulations such as GDPR, CCPA, and LGPD require organizations to obtain consent before using their customers’ data for any purpose beyond the narrow one for which it was originally collected, unless that data has been anonymized.
How do organizations know if their data has been properly anonymized, and how do they prove it?
These two questions place a heavy burden on enterprises, and answering them properly means making significant changes to the way they have been doing business. They can no longer process data internally for new purposes, or release it for third-party use, without explicit consent. This is a major and potentially paralysing change.
The first step organizations need to take is to analyze their data to assess the risk of re-identification. They should be able to quantify, with confidence, the probability that their data could be used to re-identify individuals and expose personally identifiable information. Once they have this knowledge, they can take appropriate actions to reduce the risk. The second step is to ensure that de-identified data retains its analytical value, so that organizations can still generate the insights they rely on for data science and data analytics.
For many organizations, however, this process could take a long time and cost them significant revenue and competitive advantage. The ability to automatically assess the risk of re-identification, apply privacy actions, and retain analytical value will allow organizations to keep growing and innovating while remaining compliant.
How AI-driven attribute tagging enables powerful risk assessment
To carry out a proper risk assessment, your data needs to be correctly tagged. The attributes that must be tagged are direct identifiers, such as names or email addresses, and indirect identifiers, also called quasi-identifiers, both sensitive and insensitive. But tagging data by hand is slow and labour-intensive. Automatic tagging greatly reduces costs, improves compliance, and allows organizations to stay ahead.
Artificial intelligence can help here. A neural net, for example, can be trained to recognize direct and indirect identifiers. Once trained, the model can tag your data automatically. Better still, its understanding can evolve over time as your data changes.
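To make the tag categories concrete, here is a minimal Python sketch of what a tagging step produces. It is deliberately not a neural net: keyword matching on column names plus one value check stands in for a trained model's predictions, and the names Tag, tag_column, tag_table, and the keyword lists are hypothetical. The four tags follow the split described in the previous paragraph: direct identifiers, quasi-identifiers, sensitive attributes, and insensitive attributes.

```python
import re
from enum import Enum


class Tag(Enum):
    DIRECT = "direct identifier"
    QUASI = "quasi-identifier"
    SENSITIVE = "sensitive attribute"
    INSENSITIVE = "insensitive attribute"


# Hypothetical keyword lists. This rule-based tagger is only a stand-in for a
# trained model, which would learn such cues from labelled columns instead.
DIRECT_KEYWORDS = {"name", "email", "phone", "ssn", "passport"}
QUASI_KEYWORDS = {"age", "birth", "zip", "postcode", "gender", "city"}
SENSITIVE_KEYWORDS = {"diagnosis", "salary", "religion", "ethnicity"}

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def tag_column(name: str, sample_values: list[str]) -> Tag:
    """Assign one privacy tag to a column from its name and a sample of its values."""
    # Split e.g. "date_of_birth" into {"date", "of", "birth"}.
    tokens = set(re.split(r"[^a-z]+", name.lower()))
    # Value check: if every sampled value looks like an email address, the column
    # is a direct identifier whatever it is called.
    if sample_values and all(EMAIL_RE.match(v) for v in sample_values):
        return Tag.DIRECT
    if tokens & DIRECT_KEYWORDS:
        return Tag.DIRECT
    if tokens & QUASI_KEYWORDS:
        return Tag.QUASI
    if tokens & SENSITIVE_KEYWORDS:
        return Tag.SENSITIVE
    return Tag.INSENSITIVE


def tag_table(columns: dict[str, list[str]]) -> dict[str, Tag]:
    """Tag every column of a table given as {column name: sample values}."""
    return {name: tag_column(name, values) for name, values in columns.items()}
```

Replacing the keyword checks with a model's predictions would leave the shape of tag_table's output unchanged, and that column-to-tag mapping is what a risk assessment needs in order to know which columns to treat as quasi-identifiers.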
Once the data is properly tagged, a risk assessment can take these attributes into account. That assessment then provides a metric the organization can use to decide on the appropriate privacy actions.
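The article does not say which metric it has in mind. One widely used choice, shown here as a minimal Python sketch, groups records into equivalence classes that share the same quasi-identifier values (the basis of k-anonymity) and scores each record's risk as one over its class size, which corresponds to a "prosecutor" attacker who knows the target is in the dataset. The function name and the returned keys are hypothetical.

```python
from collections import Counter
from typing import Mapping, Sequence


def reidentification_risk(
    records: Sequence[Mapping[str, str]],
    quasi_identifiers: Sequence[str],
) -> dict[str, float]:
    """Equivalence-class risk metrics over the tagged quasi-identifier columns."""
    if not records:
        raise ValueError("need at least one record")
    # Count how many records share each combination of quasi-identifier values.
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    smallest = min(classes.values())
    return {
        # k: the dataset is k-anonymous for this value of k.
        "k": smallest,
        # Worst case: a record alone in its class can be singled out (risk 1.0).
        "max_risk": 1 / smallest,
        # Mean of 1/|class| over all records, which simplifies to classes / records.
        "avg_risk": len(classes) / len(records),
    }


records = [
    {"age": "30-39", "postcode": "SW1", "diagnosis": "flu"},
    {"age": "30-39", "postcode": "SW1", "diagnosis": "asthma"},
    {"age": "40-49", "postcode": "SW1", "diagnosis": "flu"},
    {"age": "30-39", "postcode": "E1", "diagnosis": "flu"},
]
print(reidentification_risk(records, ["age", "postcode"]))
# {'k': 1, 'max_risk': 1.0, 'avg_risk': 0.75}
```

An organization would compare numbers like these with a threshold it sets itself, for example a minimum acceptable k, to decide which privacy actions are needed; choosing that threshold is a policy decision, not something this sketch determines.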
These privacy actions reduce the risk of re-identification, but they also cause information loss. They must therefore be chosen with the intended use of the data in mind, so that the attributes that matter for that use keep the necessary fidelity while overall risk still falls. At this point, the organization can automate the process by recording the steps taken and applying the same steps automatically to each additional dataset. The actions can also differ between use cases while the process remains fully automatic.
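The record-and-replay idea can be sketched in Python as a "recipe": an ordered list of per-column actions that is stored once and then applied to each new batch. Everything here is hypothetical and simplified: the Step, apply_recipe, audit_log, and RECIPES names, the two use cases, and the three actions (suppression, age banding, postcode truncation). A small k_anonymity helper measures the effect so that the trade-off between risk and fidelity is visible.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Sequence

Record = dict[str, str]


@dataclass(frozen=True)
class Step:
    """One recorded privacy action on a single column."""
    column: str
    action: str  # human-readable description, kept for the audit trail
    transform: Callable[[str], str]


def age_band(width: int) -> Callable[[str], str]:
    """Generalize an integer age into a band such as '30-39' for width 10."""
    def generalize(value: str) -> str:
        low = int(value) // width * width
        return f"{low}-{low + width - 1}"
    return generalize


def truncate(n: int) -> Callable[[str], str]:
    """Keep only the first n characters (e.g. the postcode area)."""
    return lambda value: value[:n]


def suppress(value: str) -> str:
    """Replace the value entirely."""
    return "*"


def apply_recipe(records: Sequence[Record], recipe: Sequence[Step]) -> list[Record]:
    """Replay the recorded steps in order on a new batch, leaving the input unchanged."""
    result = []
    for record in records:
        masked = dict(record)
        for step in recipe:
            masked[step.column] = step.transform(masked[step.column])
        result.append(masked)
    return result


def audit_log(recipe: Sequence[Step]) -> list[str]:
    """Describe a recipe so the applied actions can be shown to an auditor."""
    return [f"{step.column}: {step.action}" for step in recipe]


def k_anonymity(records: Sequence[Record], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of records sharing all quasi-identifier values."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())


# Hypothetical recipes for two use cases: marketing keeps a coarse location but
# only wide age bands; a clinical study keeps finer age bands but drops location.
RECIPES = {
    "marketing": [
        Step("name", "suppress", suppress),
        Step("age", "generalize to 20-year bands", age_band(20)),
        Step("postcode", "truncate to first 2 characters", truncate(2)),
    ],
    "clinical": [
        Step("name", "suppress", suppress),
        Step("age", "generalize to 10-year bands", age_band(10)),
        Step("postcode", "suppress", suppress),
    ],
}

batch = [
    {"name": "Ada", "age": "34", "postcode": "SW1A 1AA"},
    {"name": "Bob", "age": "37", "postcode": "SW1A 2BB"},
    {"name": "Cy", "age": "45", "postcode": "SW3 4CC"},
    {"name": "Di", "age": "48", "postcode": "SW9 5DD"},
]
quasi = ["age", "postcode"]
print(k_anonymity(batch, quasi))  # 1: every record is unique on age and postcode
for use_case, recipe in RECIPES.items():
    print(use_case, k_anonymity(apply_recipe(batch, recipe), quasi))
```

Because each recipe is data rather than ad hoc code, the same steps can be replayed on every new dataset for a given use case, and audit_log gives a record of exactly what was done. On the sample batch both recipes raise k from 1 to 2, but each keeps a different attribute at higher fidelity.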
With these automated systems, an enterprise can implement “Privacy by Design.” Privacy regulations expect this framework to be built into business processes; GDPR, for example, requires data protection by design and by default in Article 25. Adopting this approach will help ensure that your organization is ready for the future.
Article by Cryptonumerics, exhibitors on stand number 445 at Big Data LDN 2019