Google’s Messages and Phone apps collect and send user data to the company’s servers without user consent, potentially violating privacy laws like Europe’s GDPR.
The claim comes from Douglas Leith, a computer science professor at Trinity College Dublin, who detailed his findings in a paper titled “What Data Do The Google Dialer and Messages Apps On Android Send to Google?”
The apps collect information about users’ communications, including a SHA256 hash of each message and its timestamp, phone numbers, incoming and outgoing call logs, and call duration. (Hashing is a one-way process that scrambles information so that it is not supposed to be recoverable in its original form.)
The information is sent to Google using Google Play Services’ Clearcut logger service and through Firebase Analytics. Moreover, the data lets Google link the sender and receiver of a message, or the two participants in a call.
Although Google receives only 128 bits of the message hash, Leith says it could be possible to reverse the hash and reveal the contents of short messages.
“I’m told by colleagues that yes, in principle this is likely to be possible,” Leith told The Register in an email.
“The hash includes a hourly timestamp, so it would involve generating hashes for all combinations of timestamps and target messages and comparing these against the observed hash for a match – feasible I think for short messages given modern compute power.”
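The kind of search Leith describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say how the timestamp and message text are combined before hashing, so the `truncated_hash` construction below (SHA-256 over an hour-rounded timestamp, a `|` separator and the message, cut to 128 bits) is hypothetical, as are the function names and the sample guesses.

```python
import hashlib
from typing import Iterable, Optional, Tuple

HOUR = 3600  # seconds


def truncated_hash(hour_ts: int, message: str) -> bytes:
    """Hypothetical construction (an assumption, not Google's actual format):
    SHA-256 over 'hour|message', truncated to the first 128 bits."""
    digest = hashlib.sha256(f"{hour_ts}|{message}".encode("utf-8")).digest()
    return digest[:16]


def recover(
    observed: bytes,
    candidate_hours: Iterable[int],
    candidate_messages: Iterable[str],
) -> Optional[Tuple[int, str]]:
    """Try every (hour, message) pair and return the first one whose
    truncated hash matches the observed value, or None if nothing matches."""
    messages = list(candidate_messages)
    for hour in candidate_hours:
        for msg in messages:
            if truncated_hash(hour, msg) == observed:
                return hour, msg
    return None


# Demo: the "observer" sees only the hash, then searches a day of hourly
# timestamps combined with a small list of guessed short messages.
base = 1_636_000_000 // HOUR * HOUR
observed = truncated_hash(base + 5 * HOUR, "yes")
hours = [base + i * HOUR for i in range(24)]
guesses = ["ok", "no", "yes", "call me"]
result = recover(observed, hours, guesses)
```

The cost of the search is the number of candidate timestamps multiplied by the number of candidate messages, which is why Leith considers it feasible only for short messages.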
Leith’s paper also outlines that Google’s Phone and Messages apps don’t feature privacy policies to explain what data they collect, despite Google requiring third-party apps on the Play Store to include privacy policies. Moreover, users who download their data from Google Takeout won’t receive the Messages and Phone information collected by Google.
Considering the Phone and Messages apps are installed by default on millions of Android devices, this is a massive oversight and a significant invasion of privacy by Google.
Leith reported his findings to Google in November 2021 and proposed nine steps the company should take to rectify the problem. Google has already made (or plans to make) some changes. Leith’s recommendations are listed first, followed by Google’s fixes:
- The specific data collected by Dialer and Messages apps, and the specific purposes for which it is collected, should be clearly stated in the app privacy policies.
- Data on user interactions with an app, e.g., app screens viewed, buttons/links clicked, actions such as sending/receiving/viewing messages and phone calls, is different in kind from app telemetry such as battery usage, memory usage, slow operation of the UI. Users should be able to opt out of collection of their interaction data.
- User interaction data collected by Google should be made available to users on Google’s https://takeout.google.com/ portal (where other data associated with a user’s Google account can already be downloaded).
- When collecting app telemetry such as battery usage, memory usage etc., the data should only be tagged with short-lived session identifiers, not long-lived persistent device/user identifiers such as the Android ID.
- When collecting data, only coarse time stamps should be used, e.g., rounded to the nearest hour. The current approach of using timestamps with millisecond accuracy risks being too revealing. Better still, use histogram data rather than timestamped event data, e.g., a histogram of the network connection time when initiating a phone call seems sufficient to detect network issues.
- Halt the collection of the sender phone number via the CARRIER_SERVICES log source when a message is received, and halt collection of the SIM ICCID by Google Messages when a SIM is inserted. Halt collection of a hash of sent/received message text.
- The current spam detection/protection service transmits incoming phone numbers to Google servers. This should be replaced by a more privacy-preserving approach, e.g., one similar to that used by Google’s Safe Browsing antiphishing service, which only uploads partial hashes to Google servers.
- A user’s choice to opt out of “Usage and diagnostics” data collection should be fully respected, i.e., result in a halt to all collection of app usage and telemetry data.
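Three of the recommendations above (coarse timestamps, histograms instead of timestamped events, and uploading only partial hashes) can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function names, the 500 ms bucket width and the 4-byte prefix length are hypothetical choices, and none of this reflects how Google’s apps are actually implemented.

```python
import hashlib
from collections import Counter
from typing import Iterable

HOUR_MS = 3_600_000  # milliseconds in one hour


def round_to_hour(ts_ms: int) -> int:
    """Round a millisecond timestamp to the nearest hour."""
    return (ts_ms + HOUR_MS // 2) // HOUR_MS * HOUR_MS


def connection_time_histogram(times_ms: Iterable[int], bucket_ms: int = 500) -> Counter:
    """Count connection times per bucket; no per-event timestamps are kept."""
    return Counter((t // bucket_ms) * bucket_ms for t in times_ms)


def hash_prefix(phone_number: str, prefix_bytes: int = 4) -> bytes:
    """Return only the first few bytes of the SHA-256 hash of a number.
    Many different numbers share the same prefix, so the prefix alone
    does not pin down a single number. The 4-byte length is an assumption."""
    return hashlib.sha256(phone_number.encode("utf-8")).digest()[:prefix_bytes]


# Example values (made up for the demo).
coarse = round_to_hour(1_636_000_123_456)
histogram = connection_time_histogram([120, 480, 510, 1400])
prefix = hash_prefix("+353000000000")
```

In a partial-hash scheme of this kind, the client would send only the prefix, receive the full hashes of known spam numbers that share it, and do the final comparison on the device.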
Google’s (planned) fixes
- Halting the collection of the sender phone number by the CARRIER_SERVICES log source, of the SIM ICCID, and of a hash of sent/received message text by Google Messages.
- Halting the logging of call-related events in Firebase Analytics from both Google Dialer and Messages.
- Shifting more telemetry data collection to use the least long-lived identifier available where possible, rather than linking it to a user’s persistent Android ID.
- Making it clear when caller ID and spam protection is turned on and how it can be disabled, while also looking at ways to use less information or fuzzed information for safety functions.
It’s also worth noting that Google confirmed to The Register that Leith’s paper was accurate and provided explanations for some of the data collection practices. The company said it collects message hashes to detect sequencing bugs, while phone number collection is intended to help improve the automatic recognition of one-time password (OTP) codes sent over SMS. Meanwhile, Firebase Analytics logging is used to measure whether people use the apps after downloading them.