Description

Identifying heart failure patients at high risk using Multi-Party Computation.

Problem Context

Machine learning algorithms are widely used to improve health care, for example to identify risk factors for a disease, and the results can inform the development of new treatments. Training these algorithms requires large amounts of data, which are often spread over different data sources. In practice, however, combining these data sources is both legally and technically challenging. When researchers want to use identifiable personal medical data for machine learning, they must comply with the General Data Protection Regulation (GDPR). Local data can often be used for scientific research, but combining data from different sources is harder, for two reasons. First, the GDPR requires using as little data as possible, while machine learning thrives on large datasets. Second, patient consent is often needed; obtaining it is time-consuming and causes practical problems, for example because the hospital is no longer in contact with the patients. Clearly, an alternative method is required to combine different data sources.

Solution

This is where Secure Multi-Party Computation (MPC) solutions come in. Although we do not use real patients’ data, the set-up of our MPC solution is inspired by the following real-life situation. In Rotterdam, there is a group of patients who are insured by insurance company Zilveren Kruis and who also took part in a program at hospital Erasmus MC. On one side, Erasmus MC has data on the lifestyle of these patients, for example their exercise behaviour. On the other side, Zilveren Kruis has data on other attributes, such as hospitalization days and health care usage outside the hospital. Once combined, these datasets could be used to train a prediction model that identifies high-risk heart failure patients. However, concerns about privacy and consent, among others, mean that these parties cannot simply share their data to allow for a straightforward analysis. That is why, in 2018, TNO, together with Erasmus MC and Zilveren Kruis, started a pilot within the H2020 project BigMedilytics to develop a secure algorithm that predicts the number of hospitalization days for heart failure patients (using simulated data).
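To give a feel for how two parties can compute on combined data without exchanging it, the sketch below uses additive secret sharing, a common MPC building block. It is an illustration only: this description does not state which MPC protocol the pilot uses, and the function names, the modulus, the integer weights and the toy patient values are all assumptions made for the example. Both parties are simulated in a single process, whereas a real deployment would run them on separate machines.

```python
import secrets

# Public modulus for all share arithmetic (an assumption of this sketch).
MODULUS = 2**61 - 1


def share(value):
    """Split a non-negative integer into two additive shares modulo MODULUS."""
    first = secrets.randbelow(MODULUS)   # uniformly random, reveals nothing alone
    second = (value - first) % MODULUS   # chosen so that first + second == value
    return first, second


def reconstruct(first, second):
    """Recombine two additive shares into the original value."""
    return (first + second) % MODULUS


def secure_weighted_total(hospital_values, insurer_values, w_hospital, w_insurer):
    """Total of w_hospital * x_i + w_insurer * y_i over all patients i.

    Hypothetical helper: each party shares its own column, keeps one share
    and sends the other share to the other party. Weights are public integers.
    """
    hospital_shares = [share(v) for v in hospital_values]
    insurer_shares = [share(v) for v in insurer_values]

    partial_totals = []
    for party in (0, 1):  # party 0 = hospital, party 1 = insurer
        total = 0
        for h, z in zip(hospital_shares, insurer_shares):
            # Linear operations can be done locally on shares.
            total += w_hospital * h[party] + w_insurer * z[party]
        partial_totals.append(total % MODULUS)

    # Only the two partial totals are exchanged and opened.
    return reconstruct(partial_totals[0], partial_totals[1])


if __name__ == "__main__":
    exercise_hours = [3, 0, 5]      # toy values held by the hospital
    hospital_days = [2, 9, 1]       # toy values held by the insurer
    print(secure_weighted_total(exercise_hours, hospital_days, 2, 3))  # prints 52
```

Each individual share is a uniformly random number, so neither party learns the other's per-patient values; only the aggregate is opened. Training a regression model securely requires more than this (for example secure multiplication and fixed-point arithmetic), which this sketch does not cover.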

Results

Altogether, the first results of the pilot look very promising. Using synthetic data, we showed that a regression on 10,000 patients and 10 features can be run in half an hour. Our MPC solution therefore has the potential to replace the current complicated process of data coupling, and it can provide an answer to the GDPR discussion on data minimization when combining data. The mathematical privacy guarantees make it possible to train accurate prediction models without sacrificing patients’ privacy. The application of MPC can therefore make more data available for analysis, and hence lead to more trustworthy results. As a next step, it is essential to investigate which hurdles, such as legal and compliance aspects, an organisation needs to overcome before it can start using MPC in this way. We are confident that such steps will result in an MPC pilot with real medical data in the near future.

Contact

  • Alex Sangers, Project Manager, TNO, e-mail: alex.sangers@tno.nl