Google Discusses Safeguarding Speech Data After Leak



Tim Verheyden, a journalist with Belgian public broadcaster VRT, gained access to more than 1,000 audio files from a Google contractor. The contractor was part of a workforce paid to review some audio captured by Google Assistant, smart speakers, phones, and security cameras.

While most of those recordings were intended (for example, people asking for weather data), others were not. In about 150 of the recordings, Google Assistant appeared to have activated incorrectly after mishearing its wake word. The audio captured includes private conversations.

Today, Google posted information on The Keyword blog about its processes to safeguard speech data. In the post, Google acknowledges the leaked Dutch audio data:

“We just learned that one of these language reviewers has violated our data security policies by leaking confidential Dutch audio data. Our Security and Privacy Response teams have been activated on this issue, are investigating, and we will take action. We are conducting a full review of our safeguards in this space to prevent misconduct like this from happening again.”

Google admitted that language experts review and transcribe a small set of queries to help Google better understand those languages. Part of the blog post involves Google explaining how to activate Google Assistant and insisting that devices that have Google Assistant built in “rarely” experience a “false accept”.

To me, it feels like Google is trying to direct people’s attention to the language reviewer who leaked some of the “rarely” recorded speech after a “false accept”. Google is trying to blame the messenger (the language reviewer and/or the VRT broadcaster).

In doing so, Google is trying to deflect attention away from its lack of responsibility with the voice data Google Assistant records. The leak of the unintentionally recorded speech data makes it clear that Google is recording, and keeping, a whole lot of audio that people never intended Google Assistant to grab. That’s not ok.

That said, Google explains that it will provide users with tools to manage and control the data stored in their account. You can turn off storing audio data completely, or choose to auto-delete data every 3 months or 18 months. But how will we know, for certain, that Google isn’t keeping a copy for itself?

