Secure learning is an umbrella term for methods that focus on creating secure algorithms that can extract information from data while preserving its confidentiality.
Secure learning means that an AI system can handle (distributed) sensitive data by identifying and assessing potential information leaks and proposing secure-by-design alternatives. A variety of methods exist for achieving secure learning.
Synthetic data generation
Synthetic data generation (SDG) has emerged as a solution to learn from sensitive information and accommodate analysis by third parties. SDG methods use AI techniques to analyse personal data and produce synthetic data on the basis thereof. The synthetic data resembles the original data without containing real sensitive, personal information. As a result, it forms an attractive substitute for use in analyses and model development.
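The analyse-then-sample idea behind SDG can be illustrated with a deliberately simple sketch: fit a multivariate Gaussian to the real records and draw new records from it. Real SDG methods use far richer generative models (for example GANs or Bayesian networks); the function names and data here are illustrative assumptions, not an existing SDG library.

```python
import numpy as np

def generate_synthetic(data: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Fit a multivariate Gaussian to the real data and sample from it.

    This is a minimal stand-in for a proper SDG method: it preserves the
    means and (co)variances of the columns, but nothing more.
    """
    rng = np.random.default_rng(seed)
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_samples)

# Toy "sensitive" dataset: two correlated columns (e.g. age and income).
real = np.random.default_rng(1).multivariate_normal(
    [40, 30000], [[25, 4000], [4000, 1e6]], size=500
)
synthetic = generate_synthetic(real, n_samples=500)
# The synthetic rows mimic the aggregate statistics of the real rows
# without reproducing any individual record.
```

A downstream analyst could now compute statistics or train a model on `synthetic` instead of `real`; in practice, stronger privacy guarantees (such as differential privacy) are needed before releasing such data.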
Standard machine learning paradigms like supervised and unsupervised learning assume the data to be available as a single tabular dataset. In practical applications the data can be spread over various distributed databases. For computational or confidentiality reasons it can be undesirable to copy the federated data to a central server and database. Computational reasons relate to the cost of keeping the central database up-to-date when communication is expensive and the federated data volatile. The most important reason is, however, the confidentiality of the data, whether because of company confidentiality or privacy/personal data protection (GDPR).
With such limitations, the goal is to learn a global machine learning model without moving the data to a central database. Additionally, there may be explicit confidentiality requirements with respect to information leakage between the database parties involved. The learning process typically follows these steps: 1) a client initiates the learning process with an initial global model; 2) each database party updates that model with its own data (a few iterations); 3) the client aggregates the local updates into a new global model; 4) the whole process repeats until convergence or for a fixed number of iterations.
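The steps above can be sketched as a federated averaging loop. This is a minimal illustration, assuming linear regression as the local model and simple weight averaging as the aggregation rule; the function names and data are invented for the example.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Step 2: one party runs a few gradient steps on its own data only."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_averaging(parties, n_features, rounds=20):
    """The client coordinates: raw data never leaves a party, only weights do."""
    global_w = np.zeros(n_features)               # step 1: initial global model
    for _ in range(rounds):                       # step 4: repeat
        local_ws = [local_update(global_w, X, y) for X, y in parties]
        global_w = np.mean(local_ws, axis=0)      # step 3: aggregate local updates
    return global_w

# Three parties, each holding a private slice of data from the same process.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
parties = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.01, size=100)
    parties.append((X, y))

w = federated_averaging(parties, n_features=2)
# w converges towards true_w without any party sharing its raw (X, y).
```

Note that plain weight sharing can still leak information about the local data; production systems combine this loop with techniques such as secure aggregation or differential privacy.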
Federated variants of various algorithms have been proposed that achieve performance comparable to their centralised counterparts.
A related concept is distributed learning, where the goal is usually to gain efficiency by employing several distributed computers, or a cluster. In that case the central database is distributed in order to enable parallel processing.
Secure Multi-Party Computation (MPC) is a set of cryptographic techniques that allow parties to jointly compute functionality over federated databases without the need to copy the data to a central database. Conceptually the functionality works as if the data were in a central database, while the cryptographic measures ensure data confidentiality. In the context of AI research, MPC is often employed to enable machine learning on federated databases with high levels of confidentiality.
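One of the simplest MPC building blocks is additive secret sharing, which already conveys the core idea: each value is split into random shares, the parties compute on shares locally, and only the combined result is reconstructed. The sketch below computes a joint sum this way; it is a toy protocol (all parties simulated in one process, no network or malicious-security measures), with invented function names.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(secret: int, n_parties: int) -> list[int]:
    """Split a secret into n additive shares; any n-1 shares look uniformly random."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def secure_sum(secrets: list[int], n_parties: int = 3) -> int:
    """Each input owner distributes shares; parties add shares locally, and only
    the total is reconstructed, never an individual input."""
    all_shares = [share(s, n_parties) for s in secrets]
    # Party j holds one share of every secret and sums them locally.
    partial_sums = [sum(sh[j] for sh in all_shares) % PRIME for j in range(n_parties)]
    return sum(partial_sums) % PRIME

salaries = [52000, 61000, 47000]
total = secure_sum(salaries)
# total equals sum(salaries), yet no simulated party ever saw a plain salary.
```

Addition of shares is enough for secure sums and averages; secure multiplication and comparisons require additional protocol machinery, which is what full MPC frameworks provide.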
What does TNO offer on secure learning?
For studying MPC, TNO has set up the TNO MPC Lab. The TNO MPC Lab is a cross-project initiative to improve the overall quality, generality, and reusability of the secure Multi-Party Computation solutions developed in TNO's numerous (past, ongoing, and future) projects that involve MPC. It consists of generic software components, procedures, and functionalities that are developed and maintained on a regular basis to facilitate and aid the development of MPC solutions. The lab strives to boost the development of new protocols and solutions, and to decrease time-to-market.