Privacy in a Pandemic - Contact Tracing Edition
Updated: May 13, 2020
(DISCLOSURE: I am working with a team developing a public health risk exposure tracking system.)
A number of projects have emerged to help people and governments track COVID exposure risk by repurposing common consumer electronics as geolocational and proximity tracking devices.
Many of these approaches involve using mobile phones' Global Positioning System (GPS) to track location/travel and Bluetooth connectivity to track proximity to other people—or, at least, their phones.
However, many of the groups, agencies, and organisations developing these solutions have not adequately considered or addressed the long-term implications of the applications they're developing and the data that they're collecting. Though some have responded to the concerns of privacy advocates by giving users more control over the data that’s collected or how it’s stored, many have bolted on privacy-focused features after the fact, which amounts to deleting sensitive information rather than designing systems to maximise privacy in the first instance.
In order for the applications to function, they need to strike a balance between absolute privacy and anonymity on the one hand and, on the other, the ability to identify individual users based on their location and/or the devices they have been near. With complete anonymity, it would be very difficult to alert individual users when they have encountered an exposure risk.
Furthermore, depending on the design architecture, you can tune the system to increase privacy at the cost of computational efficiency. Imagine that you are using Bluetooth to log potential exposure. Three people each walk 1 km from their home to the store and back. Every time a person’s phone comes within a certain distance of another person using the application, the two devices exchange a token as a sort of handshake. Suppose each of the three walks closely past 20 other people using the application. Each of them now has 20 different tokens on their phone, and each of the people they passed now has a token on their phone representing the encounter with Person A, B, or C. How should we best implement this solution?
Let’s limit our options to one of the following. (Let's just assume that Persons A, B, and C are indistinguishable from the one mobile device that each owns, and that each person’s device never loses connectivity or runs out of battery.)
• Person A is given one unique token when Person A signs up for the application/service.
The issue here is that the service will be able to collect every single proximal interaction that Person A has had for as long as he has been using the application. While this would make it much simpler to notify people of potential exposure, it creates serious privacy concerns for Person A, since the application or service will have a very thorough record of the people with whom Person A has come into contact.
• Person B is given one unique token per set time interval (such as one week, one day, or one hour).
While the shorter intervals here appear to offer stronger privacy protection than a single token assigned at signup, they actually just place a higher computational burden on the system while storing all the same proximity information about a user's encounters, which remains susceptible to the same concerns as the first option. Person B still exchanges tokens when she passes closely by another person using the application. The server administrators can still track all the geolocational and contact data for the user, and, in the event of a breach, a malicious actor can still match all the relevant data to a user, albeit in shorter intervals.
• Person C generates a unique token every time she comes into proximity with another person using the application or service.
The unique token generation presents greater security than the other two options but creates a great computational burden to reconcile and store which tokens were associated with which user at which encounter. Further, you would need to factor in which encryption strategy would be the most appropriate.
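To make the trade-off concrete, here is a minimal Python sketch of how the three token strategies above might be issued. Everything in it is an assumption for illustration: the function names, the one-hour interval, and especially the choice to derive the interval token from a per-device secret with HMAC, which is only one of several ways such a scheme could work and is not taken from any particular application.

```python
import hashlib
import hmac
import secrets

INTERVAL_SECONDS = 3600  # assumed rotation interval for the second option (one hour)


def signup_token() -> str:
    """First option: one random token issued at signup and reused indefinitely."""
    return secrets.token_hex(16)


def interval_token(device_secret: bytes, timestamp: int,
                   interval: int = INTERVAL_SECONDS) -> str:
    """Second option: one token per time interval.

    Hypothetical derivation: HMAC of the interval index under a device secret.
    Anyone holding the secret can regenerate, and therefore link, every token.
    """
    interval_index = timestamp // interval
    mac = hmac.new(device_secret, str(interval_index).encode(), hashlib.sha256)
    return mac.hexdigest()[:32]


def encounter_token() -> str:
    """Third option: a fresh random token for every encounter."""
    return secrets.token_hex(16)


device_secret = secrets.token_bytes(32)
t0 = 1_589_328_000  # arbitrary example timestamp in seconds, on an hour boundary

# Two encounters a minute apart share a token; an hour later the token differs.
same_hour = interval_token(device_secret, t0) == interval_token(device_secret, t0 + 60)
next_hour = interval_token(device_secret, t0) == interval_token(device_secret, t0 + INTERVAL_SECONDS)

# A walk past 20 people under the third option yields 20 tokens to store and reconcile.
walk_tokens = [encounter_token() for _ in range(20)]
```

The sketch also shows where the cost goes: the first option stores one value per user, the second stores one value per user per interval (or a single secret from which all of them can be rebuilt), and the third stores one value per encounter, each of which must later be matched back to a user.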
No encryption should be assumed uncrackable forever. As opposed to steganography, which refers to the process of hiding or concealing information to prevent unintended parties from seeing it, encryption presumes a likelihood or risk of interception or theft. The purpose then becomes muddling the data so that it is unintelligible to unintended recipients for as long as the data remains sensitive. For instance, the details of a military operation no longer put it at risk of failure through exposure 25 years after the war’s conclusion.
Generally speaking, there are a number of companies that already can collect and/or access our geolocational data: maps and navigation, rideshare, restaurant finders, fitness loggers, and even some flashlight apps! However, these “contact tracing” applications strike many as somewhat more invasive than mere geotracking applications because they can also record with whom we come into contact.
Critics point out that many common applications and devices already track all sorts of information about us, so, they argue, we should not express concern over apps designed to help monitor or track COVID-19. However, during the current crisis many jurisdictions have passed legislation allowing, and in some instances requiring, that this data be shared with government agencies and instrumentalities. Many fear the potential for abuse by government and for granular surveillance capabilities.
In addition to these complications, issues emerge with how this information is to be exchanged and processed. For instance, would the handshake tokens be exchanged only between devices, or would they be synchronised with a cloud-based server and later used to reconcile token matching in the event that a user reports exposure or symptoms? Each application would also only interface with copies of itself on other devices, so some people may download a handful of applications to ensure coverage, which could lead to battery drain, interference, and other complications.
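The difference between the two exchange models can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any deployed system: the Device class, the handshake function, and both matching functions are hypothetical names, and real Bluetooth exchange, encryption, and networking are left out entirely.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Device:
    sent: Set[str] = field(default_factory=set)      # tokens this device handed out
    received: Set[str] = field(default_factory=set)  # tokens collected from others


def handshake(a: Device, b: Device) -> None:
    """Simulate two phones swapping one fresh token each when they pass."""
    token_a, token_b = secrets.token_hex(16), secrets.token_hex(16)
    a.sent.add(token_a)
    b.received.add(token_a)
    b.sent.add(token_b)
    a.received.add(token_b)


def exposed_on_device(device: Device, reported: Set[str]) -> bool:
    """Device-only model: the phone downloads reported tokens and checks locally."""
    return not device.received.isdisjoint(reported)


def exposed_on_server(uploads: Dict[str, Set[str]], reported: Set[str]) -> List[str]:
    """Server model: every user's received tokens are uploaded and matched centrally."""
    return sorted(user for user, tokens in uploads.items()
                  if not tokens.isdisjoint(reported))


a, b, c = Device(), Device(), Device()
handshake(a, b)        # A and B pass each other; C meets no one
reported = set(a.sent)  # A reports symptoms and publishes the tokens A handed out

local_result = (exposed_on_device(b, reported), exposed_on_device(c, reported))
server_result = exposed_on_server({"B": b.received, "C": c.received}, reported)
```

Both models reach the same answer here, namely that B was exposed and C was not. What differs is who learns it: in the device-only model the server sees only the tokens of the person who reported, whereas in the server model it holds every user's encounter history.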
While I don’t have all the answers, I raise these issues to illustrate that as many different teams endeavour to build tech solutions to help people better manage the ongoing public health crisis caused by COVID-19, I still have significant concerns over the long-term impact on people’s privacy.
As a user, if you’re given a choice as to which application you may download, you should consider a number of factors, such as how widely the application is used in your area, how secure the application is, how long the data is stored, what sort of encryption is used, who else has access, and what your legal data privacy rights are—among other things. While most of us lack the access, knowledge, or expertise to evaluate these things on our own, we can rely on reports from think tanks and public policy organisations made up of experts who evaluate and report on these issues.
I tend to rely on a handful of indicators when comparing these kinds of projects. For instance, I try to determine whether the application’s publisher is a nonprofit or academic institution, whether they claim to store data in an encrypted fashion, and whether they have decided to release the software as open source. You may have your own criteria, but I think these are a good place to start.