Enabling Privacy-Preserving Analyses on Healthcare Data

Description

Access to more data within the healthcare sector can help improve survival rates. Linking diverse datasets allows an algorithm to learn which factors affect the impact of treatment and the survival chances of cancer patients. In practice, this data is often held by different parties in a variety of sources. To link and analyse it in a way that is secure and respects privacy, the algorithms must be trained without the parties having to share the underlying data with one another. In particular, we focus on the vertically partitioned data setting, in which different organizations hold different variables for the same group of patients. Within this project, new proof-of-concept solutions are being developed, implemented and tested using synthetic data. The SELECTED project works closely with LANCELOT, a TKI HTSM project carried out with IKNL and Janssen (2021-2022).

For this, innovative privacy-enhancing technologies such as Multi-Party Computation (MPC) and Federated Learning (FL) are applied. MPC (https://www.tno.nl/mpc) is a collection of cryptographic techniques that allow multiple parties to carry out joint analyses of their data without having to share that data with each other or with a third party. This allows both simple analyses and AI algorithms to be applied without infringing privacy rules. FL (https://www.tno.nl/en/focus-areas/information-communication-technology/roadmaps/data-sharing/federated-learning/) addresses the privacy problem by taking the analyses to the data instead of bringing the data to the analyses. The analyses are broken down into small partial calculations that are carried out locally by the various parties. After the local computations have been executed, only their results or intermediate outputs are shared with one or more external parties. As a result, the sensitive data items are not shared with anyone and remain with the source.
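As an illustration of the MPC idea described above, the following minimal Python sketch uses additive secret sharing, one common MPC building block, to compute a joint sum. It is not the protocol used in LANCELOT or SELECTED. The modulus, the function names and the patient counts are assumptions made up for this example, and all parties are simulated in a single process.

```python
import secrets

# Illustrative modulus; all arithmetic on shares is done modulo this prime.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n additive shares that sum to `value` mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values):
    """Simulate a joint sum in which no party sees another party's input."""
    n = len(private_values)
    # Party i splits its private value and sends share j to party j.
    distributed = [share(v, n) for v in private_values]
    # Party j adds the shares it received; on its own this number looks random.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Only the partial sums are published and combined into the result.
    return sum(partial_sums) % PRIME


# Hypothetical private inputs, e.g. patient counts held by three organizations.
private_counts = [120, 75, 310]
total = secure_sum(private_counts)
print(total)
```

Each individual share is uniformly random, so a single share reveals nothing about the value it came from; only the combination of all partial sums yields the total.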

Several MPC and FL solutions for vertically partitioned data were developed in the two projects in 2021. The MPC solutions (developed in LANCELOT) are slower, but offer a higher privacy guarantee and require less information sharing (e.g. information about overlapping patients can remain hidden). The FL solution (developed in SELECTED) is faster and applicable to a larger class of Generalized Linear Models, but requires more information sharing.
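To make the information sharing in the vertically partitioned setting concrete, the sketch below trains a logistic regression (one example of a Generalized Linear Model) across two simulated parties that hold different variables for the same patients. It is a generic textbook-style illustration, not the SELECTED solution. The data, the party roles, the presence of a coordinator that holds the outcome labels, and all function names are assumptions.

```python
import math

# Hypothetical synthetic data: the same four patients, different variables.
features_a = [[0.5, 1.0], [1.5, 0.2], [-1.0, 0.3], [-0.7, -1.2]]  # party A
features_b = [[1.0], [0.8], [-0.9], [-1.1]]                       # party B
labels = [1, 1, 0, 0]  # outcome, assumed to be held by a coordinator


def partial_scores(features, weights):
    """Computed locally: each party's contribution to the linear predictor."""
    return [sum(x * w for x, w in zip(row, weights)) for row in features]


def local_gradient(features, residuals):
    """Computed locally from the residuals returned by the coordinator."""
    n = len(features)
    return [
        sum(row[j] * r for row, r in zip(features, residuals)) / n
        for j in range(len(features[0]))
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rounds=200, learning_rate=0.5):
    w_a, w_b = [0.0, 0.0], [0.0]
    losses = []
    for _ in range(rounds):
        # Each party shares only its partial scores, never its raw variables.
        s_a = partial_scores(features_a, w_a)
        s_b = partial_scores(features_b, w_b)
        # The coordinator combines them and compares with the outcome.
        preds = [sigmoid(a + b) for a, b in zip(s_a, s_b)]
        losses.append(
            -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                 for y, p in zip(labels, preds)) / len(labels)
        )
        residuals = [p - y for p, y in zip(preds, labels)]
        # The residuals go back to the parties, which update their own weights.
        g_a = local_gradient(features_a, residuals)
        g_b = local_gradient(features_b, residuals)
        w_a = [w - learning_rate * g for w, g in zip(w_a, g_a)]
        w_b = [w - learning_rate * g for w, g in zip(w_b, g_b)]
    return w_a, w_b, losses


w_a, w_b, losses = train()
final_scores = [
    a + b
    for a, b in zip(partial_scores(features_a, w_a),
                    partial_scores(features_b, w_b))
]
predicted = [1 if s > 0 else 0 for s in final_scores]
print(losses[0], losses[-1], predicted)
```

Note that partial scores and residuals are exchanged in every round. This is the kind of repeated intermediate output that makes the privacy of such a scheme harder to reason about than that of an MPC protocol.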

In general, it is very challenging to prove mathematically that FL implementations reveal no sensitive information, in contrast to MPC, which offers certain cryptographic guarantees. Attacks have already been developed on specific FL solutions, showing that in certain cases most of the sensitive information can be revealed. In the more challenging vertically partitioned setting, even more information needs to be shared, and over many iterations rather than just one. Research into attacks on FL models is evolving rapidly. While it is difficult to quantify how much sensitive information such attacks could reveal, we expect that the privacy guarantees can be strengthened by combining MPC with FL so that less information needs to be shared. In SELECTED 2022, more research is being done on the combination of MPC with the FL solution, alongside dissemination of the results from both LANCELOT and SELECTED.
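One generic way to reduce what is shared in FL is to mask each party's intermediate output so that only the aggregate becomes visible. The sketch below illustrates this with a random mask that cancels in the sum. It is an assumption-laden illustration of the general idea, not the combination studied in SELECTED. The fixed-point scale, the modulus, the function names and the example scores are hypothetical, and the mask is assumed to be known to parties A and B but not to the aggregating party.

```python
import secrets

MODULUS = 2**64   # illustrative ring size for the masked values
SCALE = 10**6     # illustrative fixed-point precision


def encode(x):
    """Map a real number to a fixed-point integer modulo MODULUS."""
    return round(x * SCALE) % MODULUS


def decode(v):
    """Map a ring element back to a signed real number."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def mask_scores(scores_a, scores_b):
    """Parties A and B hide their partial scores behind a shared mask.

    In practice the mask would come from a secret agreed between A and B;
    here it is simply generated in one place for the simulation.
    """
    masks = [secrets.randbelow(MODULUS) for _ in scores_a]
    masked_a = [(encode(s) + m) % MODULUS for s, m in zip(scores_a, masks)]
    masked_b = [(encode(s) - m) % MODULUS for s, m in zip(scores_b, masks)]
    return masked_a, masked_b


def aggregate(masked_a, masked_b):
    """The aggregating party adds the masked values; the masks cancel."""
    return [decode((a + b) % MODULUS) for a, b in zip(masked_a, masked_b)]


# Hypothetical partial scores for two patients.
scores_a = [0.731, -1.25]
scores_b = [0.3, 0.5]
masked_a, masked_b = mask_scores(scores_a, scores_b)
totals = aggregate(masked_a, masked_b)
print(totals)
```

The aggregating party sees only uniformly random-looking values from each party and learns the combined scores, not the individual contributions. Whether this is sufficient depends on what the aggregate itself reveals over many iterations.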

More information

This project is part of TNO’s Appl.AI programme and is partly financed from the kickstart fund that the NL AIC received from the government for research and development of AI applications. SELECTED stands for ‘Secure Learning for oncology on vertically partitioned data’. It is being developed by TNO in collaboration with the parallel project LANCELOT, which is partly funded by Holland High Tech and carried out with the Netherlands Comprehensive Cancer Organisation (IKNL) and Janssen.

Contact

  • Daniël Worm, Sr consultant, TNO, e-mail: daniel.worm@tno.nl