Applying privacy-enhancing technologies to speech data so more powerful models for automatic speech recognition can be developed
Since the introduction of deep learning models, applications of language and speech technology have skyrocketed. More and more organizations see opportunities in further automating customer service or automatically transcribing conversations, meetings, programs and presentations. All of this is made possible by training AI on large and diverse datasets. Such data exists, but it is spread across different organizations and cannot be aggregated or shared, for several reasons. Speech is the acoustic counterpart of a fingerprint: it carries all kinds of indicators of sensitive information about the speaker, such as gender, age and medical conditions. In addition, the voice data itself may contain classified information. Moreover, some (speech) datasets may only be used in-house because of copyright or other property rights, so they cannot be shared across organizations.
The many developments in the field of privacy-enhancing technologies (PETs) make it possible for organizations to use data jointly, efficiently and securely, without being able to view each other's data. This project investigates how PETs can be applied to speech data from different organizations, with the eventual goal of developing a Dutch speech model that handles not only Standard Dutch well, but also jargon, accents and dialects.
We collaborate with NPO, CZ and the Dutch Speech Coalition.
- Saskia Lensink, consultant language and speech technologies, e-mail: email@example.com