Understanding the privacy-sensitive aspects of speech data

Problem Context

The introduction of deep learning models has accelerated the application of language and speech technologies, for instance to automate customer service or to transcribe meetings. However, speech technology solutions for the Dutch language do not yet perform well enough to have a positive impact on operations in many Dutch organizations (at least at NPO and CZ). Because speech technology is based on deep learning models, there is a need for larger speech datasets, and this poses a challenge: due to both privacy concerns and copyright issues, these datasets cannot be shared across silos or between organizations.


Currently, no (market) solution exists specifically for the safe sharing of speech data that takes into account the requirements of Dutch organizations: keeping data on-premise, not sharing it via a third party that is not Dutch or European, and respecting values such as transparency and explainability. Current privacy-enhancing technologies can offer solutions for safe data sharing, but it is not yet clear whether these techniques can be applied directly to speech data. Therefore, TNO is exploring technological solutions and developing proofs of concept for the safe, privacy-preserving sharing of speech data.



  • Lizette Maljaars, Senior Project Manager, e-mail: