Detecting financial crime using synthetic data.

Problem Context

A synthetic transaction graph can be used to train algorithms that detect financial crime, such as money laundering. Because transaction data is sensitive and confidential, no information may leak from it. We can therefore only apply, build on and develop techniques with solid privacy guarantees.


A proof of concept of a federated synthetic transaction data generator has been developed and open sourced. By combining the strengths of secure multi-party computation, federated learning and differential privacy, a synthetic data generator can be trained without sharing any data. This generator is then given to the participants of the protocol. The federated solution is based on the GraphBin solution developed in 2022, so the privacy and utility metrics used for GraphBin are also applicable to this solution. To gain more insight into these metrics, a pilot and a privacy workshop were organized within this project. The pilot aimed to learn more about the applicability and utility of synthetic data in banking applications. The workshop, attended by privacy and data specialists from the financial world, was organized to present and discuss the relevant privacy and utility metrics of (federated) synthetic data in operation.


The technology developed in this project allows the synthesis of transaction graph data, offering other institutions the possibility to train new algorithms for the detection of financial crime without having access to real data. The entire financial sector could benefit from such algorithms. In this project, knowledge and experience have been gained on combining privacy-enhancing technologies (PETs). In particular, multi-party computation via secret sharing, differential privacy and federated learning have been combined to construct a secure, private, fast and scalable solution.
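
To make the combination of secret sharing and differential privacy more concrete, the following is a minimal sketch and not the project's actual protocol. It assumes a toy setting in which each bank holds a local histogram of transaction counts, the banks compute the joint histogram with additive secret sharing, and Laplace noise is added to the result. The names `share`, `secure_sum` and `private_histogram`, the modulus `PRIME` and the sensitivity of 1 are all hypothetical choices made for illustration.

```python
import random
import secrets

# Assumed modulus for additive sharing; it only needs to exceed any true total.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split an integer into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the value.
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % PRIME


def secure_sum(local_values):
    """Sum the parties' inputs without revealing any individual input.

    Party i sends its j-th share to party j; each party adds up the
    shares it received, and only these partial sums are combined.
    """
    n = len(local_values)
    all_shares = [share(v, n) for v in local_values]
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return reconstruct(partial_sums)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_histogram(local_histograms, epsilon, rng=None):
    """Aggregate per-bank histograms and add Laplace noise to each bin.

    Simplification: the noise is added after reconstruction. A real
    protocol would generate it inside the secure computation.
    Assumes one transaction changes one bin by at most 1 (sensitivity 1).
    """
    rng = rng or random.Random()
    n_bins = len(local_histograms[0])
    totals = [
        secure_sum([hist[b] for hist in local_histograms])
        for b in range(n_bins)
    ]
    scale = 1.0 / epsilon
    return [total + laplace_noise(scale, rng) for total in totals]
```

For example, `private_histogram([[10, 2], [7, 5], [3, 1]], epsilon=1.0)` returns a noisy version of the joint counts `[20, 8]` while no bank sees another bank's individual counts. A smaller `epsilon` gives stronger privacy at the cost of noisier counts, which is the privacy and utility trade-off that the metrics discussed above are meant to quantify.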


  • Jins de Jong, Scientist, e-mail: