Although data-driven innovation brings huge opportunities and enormous benefits for individuals and businesses, greater access to and use of data creates new challenges in many domains. The main implications concern privacy and security, as well as democracy and participation. To ensure a fair balance between data protection and data-driven innovation, legislators are pushing research and business actors to consider and address the challenges emerging from the development and use of big data and analytics. To draw the attention of the big data community to these challenges, and to discuss how big data technologies may help overcome them and prevent loss of privacy, E-SIDES, in collaboration with the Big Data Value Ecosystem project, hosted the workshop “Towards Privacy-Preserving Big Data” on 22 November, from 16:30 to 18:00, at the European Big Data Value Forum (EBDVF) in Versailles, France.
To stimulate the discussion, the following contributors took part in the session: Prof. Ernesto Damiani, Director of the Information Security Research Center and SESAR Research Lab, Khalifa University and Università degli Studi di Milano, as keynote speaker; Dr Anna Zsófia from the SODA project; Dr Sabrina Kirrane from the SPECIAL project; Dr Mike Priddy from the K-PLEX project; Gianluca Ripa from the TransformingTransport (TT) project; Prof. Edwin Morley-Fletcher from the My Health My Data project; and Philip Carnelley from IDC.
The debate centered on three main questions:
1) Resolving the contradiction between big data innovation and privacy: how do you choose your approach, and how do you communicate it transparently to your users? The discussion focused on the primary role of accountability and transparency in data use, as well as on the need to obtain consent from users. It was stressed that algorithms can anonymize data and prevent its re-identification, but at a higher cost. Legislation can provide considerable support in solving these issues, but in many cases it leaves gaps and room for interpretation. From an industry perspective, companies have to build trust around their use of data and big data technologies in order to be more competitive.
2) Allowing privacy-preserving analytics in big data contexts: how much personal data ends up in aggregated results or prediction models, and how may the GDPR apply? One of the emerging issues, which goes beyond privacy itself, was companies' general lack of understanding of how to comply with the GDPR; it was reported that more than 10% of companies still think the GDPR will not apply to them. Computer and data scientists also play a key role, and should better understand and take into account the ethical implications of their work. A comment from the audience addressed how combining different technologies could make future technology more privacy-friendly, and to what extent future legislation will require such combinations, since they may enhance transparency and result in more privacy.
3) Meeting context-specific needs with the right big data architecture: how may regulation or ethics affect it? The examples reported came from different domains. In healthcare, general difficulties are emerging, such as compliance with the GDPR and informed consent; in the transportation sector, the need to prevent privacy abuses can limit the amount of data that can be extracted from traffic cameras. One comment from the audience concerned the follow-on and societal effects of privacy, that is, the fact that privacy is not only an issue for individuals. For instance, DNA data used by law enforcement also contains data about relatives, and group pictures posted on social media may include people who have not consented to this.