Expert Speak Digital Frontiers
Published on Nov 29, 2017
Time for Data Collectors to Say “No, Thanks!”

Executive Summary

The easy abundance of personal data online has created a dependence on data and an insatiable appetite for more. Data can provide useful insights, and governments and businesses have legitimate uses for personal information. However, insufficient limits and controls on the amount of data collected and the period for which it is stored raise several concerns. For decades, privacy advocates have warned of the growth in government surveillance leading to a loss of freedom. The aggregation of personal data by multiple businesses adds to privacy concerns because of the volume collected and data breaches that expose this data to criminals. However, not enough attention is paid to the threat that personal data poses to individual financial safety and national security. Personal data can be mined to lure individuals into fraud and to discover secrets that can be used to blackmail or coerce people in high-security positions. Such risks call for a new attitude of data abstention from governments and businesses, wherein these entities selectively collect data and delete it as soon as it has served its purpose. Individuals or groups acting with malicious intent will not be able to use data that does not exist; this improves the security of the public.

In 2012, the director of the Central Intelligence Agency (CIA), David Petraeus, resigned after he was caught having an extramarital affair with a married woman.<1> The affair was discovered because the woman, who had a top-secret security clearance, was cyber stalking and harassing another woman whom she suspected of also having an affair with the CIA director. Petraeus was a retired general who had previously been the commander of US military forces in the Middle East. The cyber-stalking victim, who was also married, appeared to have a close relationship with another high-ranking general, who was a friend of the CIA director and was at that time slated to become NATO’s supreme military commander. The victim called in an FBI friend, who used internet logs and metadata to find the stalker, and accidentally discovered the affair. The FBI agent also sent shirtless photos of himself to the victim.

Setting aside the salacious, airport-thriller novel aspects, this convoluted mess worried national security officials because of the potential it posed for people to be blackmailed or hacked, causing significant damage if some adversarial group discovered it first.<2> Security experts flagged multiple failures in precautions that had led to the discovery of the affair. It helped that there was so much stored personal communication data for the FBI to analyse. However, the same data had made the director—and others involved—vulnerable, and had ultimately jeopardised national security. In today's data-rich and data-hungry world, the best protection against such threats is to limit the amount of data that service providers and governments collect.

The New Oil

When it comes to data, conventional wisdom says that “more is better”. A recent web search by this author for the phrase “data is the new oil” returned over 200 new posts created in the span of one week. The implication is that data can be processed to extract value to power the economy. However, it differs from oil in one respect: data is not a limited resource. Each day, more data is created, and it is as easy to access as crude oil was when it first became useful, just bubbling up to the surface and ready for the taking.

Much of the new data produced, or collected, each day is personal information that pertains to people's daily lives, such as what they do, where they go, what they buy and with whom they communicate. People generate massive amounts of data in the online world. When a person visits a website, the internet service provider (ISP) may log the time and the site’s server address, and in some places is legally obligated to do so. If the website does not enforce HTTPS, then the ISP can also capture all the data that is exchanged. The server hosting the site, too, often logs the IP address of visitors, the time, the pages viewed and the actions taken. If it is a webmail site, the email service provider can also log when a user signed in and from what IP address, and who sent or received the email; it also stores the contents. Many news and information sites employ tools to log how long a person spends on a page, how far they scroll and what else they click on. If the site uses a third-party analytics tool like Google Analytics, then that is one more entity that captures the web users’ information. Many sites use more than one analytics provider, which increases the number of entities collecting data. If the site makes its revenue from advertising, then it probably hosts a service that serves those ads and also records who views them. Social networks integrated into the site for sharing links may also collect data at a personal level. All in all, anywhere from a handful to a couple of dozen entities can collect and store information about a person’s visit to just one website. One writer counted over a hundred trackers in one and a half days of Web use.<3>

When a person browses the web, there could also be one or more external parties snooping on this activity at any point from the personal device to the remote server. It could be law enforcement or intelligence agencies, foreign governments or criminals and hackers. This further increases the number of parties that collect information.

Even offline activities create a data trail, because nowadays, at some point those activities are digitised. Visit a doctor or a hospital and a bill is generated, which creates a record in the office database. Pay by anything other than cash, like a credit card or even a cheque, and a record is created with the bank. The bill incurred may in turn create a record in an insurance company's database. Drive past a license plate scanner on the way home and a record is created with the motor bureau. The mobile network operator may be collecting location data of your mobile phone. Turning on location services or using a navigation app produces another set of personal data.

This personal data can provide a detailed and intimate profile of individuals, which can lead to abuse of privacy and be exploited to harm them socially, financially or physically. Governments, businesses and civil societies are, therefore, grappling with the issue of data protection. Their objective is to find a balance between the legitimate interests of data collectors, and the privacy and security interests of individuals. 

Drill, Baby, Drill

There are three groups of entities that are interested in collecting personal information. The first consists of governments that have lawful jurisdiction over the person, such as city, county/district, state and national governments. The second group consists of entities that a person transacts with, such as banks, e-commerce sites, libraries and other government services. The third group consists of those who want information to defraud, defame or harm either the person, people connected to them, or their employers or governments.

Governments have significantly ramped up data collection for several reasons that they believe are legitimate interests.<4> These reasons include better delivery of services, responding to demand, preventing terrorist acts, thwarting criminals and catching tax evaders. Most of the increase in government data collection has been in response to terrorism. The Snowden leaks confirmed what many had suspected about the vast breadth and scope of communications data that the US National Security Agency (NSA) collects. Other governments also collect communications data on their own citizens, including the UK,<5> Australia<6> and Germany. India has had a plan in the works for several years to connect multiple government intelligence, law enforcement and citizen-centric databases.<7>

Since communications and internet services store the information, governments can access or request metadata about communications, i.e., the information on who is communicating with whom, and the times and locations of the communications.<8> If the contents of the communications, such as emails or instant messages, are also stored, governments can access those as well. Many governments also want to access encrypted communications, and to do that, they are requesting universal access keys or “backdoors” to encrypted communications.<9>

In the second group, businesses claim a legitimate interest in collecting personal data to better serve existing customers and to sell to new customers. Netflix is known for mining the viewing habits of its subscribers to provide video recommendations. Amazon does the same for Audible subscribers, in addition to offering product recommendations based on searches conducted on its e-commerce platform. Businesses have been dazzled by the allure of using data and the internet to specifically target potential new customers. Their hope is that by acquiring enough information on individuals, they can identify those most likely to consume their products and craft a message directly for them. This technique, known as microtargeting, has been used in US presidential elections.<10> The advertising industry has also hyped up the potential of repeatedly targeting individuals with product messages based on their internet activities. It goes something like this: A person reads an article about an idyllic vacation spot on a news or travel website. Later, they see an ad on a different site for a vacation package to that spot, and then they see the ad again on another site. This “retargeting” is supposed to prompt the person to buy the package and off they go.

However, business data collection goes beyond individual companies looking at their own customers. Companies called “data brokers” aggregate personal information from different sources to create comprehensive profiles of individuals. They then sell this information to interested parties. This type of business data collection has reached a scale that is difficult to grasp. A report by Cracked Labs says that large data brokers such as Acxiom and Oracle collect thousands of data points on millions of users globally, and can provide information such as age, gender, income, religion, sexual orientation, vehicles and property owned, number and age of children, hobbies, political affiliation, whether seeking an abortion, travel history, social media usage and much more.<11>

In 2009, the legal director for the Electronic Frontier Foundation (EFF), Cindy Cohn, called the commerce of personal information the “surveillance business model.”<12> This brilliantly apt term is a reminder that governments, too, can purchase private information from data brokers. This includes foreign governments, which may not have the means to set up their own data-collection system in the host countries.

The third group that seeks access to personal information includes scam artists, cyber criminals, cyber-espionage groups and adversarial governments. They each have different motivations, and depending on their goal, they may use personal data differently.

Financial scammers can use personal information to dupe people into giving their money to supposedly trusted entities. In 2016, Indian authorities broke up several call-centre operations that duped US citizens out of millions of dollars. The callers from India pretended to be Internal Revenue Service (IRS) agents and threatened the victims with huge fines and arrest if they did not immediately make a payment, supposedly to the IRS but using unusual methods, such as iTunes gift cards. In one news report, a treasury department official said that the callers “have information that only the Internal Revenue Service would know.”<13>

The Equifax data breach that came to light in 2017 exposed financial and personal details, such as bank accounts and employment and address history, of millions of residents of the US and the UK.<14>

Government databases are also prime targets for cyber-espionage groups and other governments. The US government disclosed that a Chinese group had collected personal information of nearly 20 million people who had applied for a government security clearance, and almost 2 million relatives of such applicants, by breaking into a system at the Office of Personnel Management (OPM).<15> The hackers could either be non-state actors or a government-linked group.

If an entity combined the OPM and Equifax data, they could learn which security clearance holders had financial difficulties. Such information would be immensely valuable to foreign governments. Since one of the main justifications that governments use for mass data collection is protecting the citizens, it makes sense to also consider the potential for harm that this data creates.

The Harms of Data

The concern with government mass data collection and backdoors is that while one entity may want to use the data for beneficial reasons, many others want the data for malicious purposes. Suppose a government succeeded in convincing a messaging app to give it the key to unlock all encrypted communications. Even if this government manages to keep that key secret against all external parties, the app creator could come under tremendous pressure from other governments seeking the same access. If the company wants to do business in those countries, it will have to give those governments access as well, if so asked. Now, rather than just one government protecting one key, there may be dozens of governments protecting that one key, or if they each have their own key, protecting dozens of keys. In either case, as the number of people with access increases, so does the probability of the key leaking to the wrong people. Creating a system that permits such access may also make it vulnerable to cryptographic attacks.
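
The claim that the chance of a leak grows with the number of key holders can be illustrated with a little arithmetic. The sketch below is not drawn from any real system: it assumes, purely for illustration, that each holder leaks the key independently with the same probability, and the function name `leak_probability` and the 1% figure are hypothetical.

```python
def leak_probability(per_holder_risk: float, holders: int) -> float:
    """Chance that at least one of `holders` independent key holders leaks the key."""
    if not 0.0 <= per_holder_risk <= 1.0:
        raise ValueError("per_holder_risk must be between 0 and 1")
    if holders < 0:
        raise ValueError("holders must be non-negative")
    # The key stays secret only if every single holder keeps it secret.
    return 1.0 - (1.0 - per_holder_risk) ** holders


# Hypothetical figure: each holder has a 1% chance of leaking the key.
for holders in (1, 12, 36):
    print(holders, round(leak_probability(0.01, holders), 3))
```

Under these assumptions the overall risk rises with every additional holder, even though no individual holder has become any less careful.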

Democratically elected representative governments conduct mass data collection because not enough citizens object to it. Many people say they are not bothered by the invasion of their privacy because they have nothing to hide. This attitude misses the point. They have nothing to hide at the moment, because they either have not engaged in an activity they want to hide or the authorities have not made their habitual activities into an offense. Moreover, while some people may not have anything to hide, others might, and their secrets make them vulnerable to blackmail or coercion. If the person being blackmailed has a sensitive role in protecting national security, critical infrastructure, financial systems or other valuable targets, then the entire population, including those who have no secrets to hide, is at risk of harm.

Data collection leads to data processing. When the government has millions or billions of records, making sense of the records requires additional processing. Experts have pointed out that such systems generate thousands of false leads. Artificial intelligence may reduce the number of false leads, but the “reasoning” behind AI results is a black box, so agents may end up on a suspect's doorstep with no idea why the person is a suspect. Mass data collection can thus erode freedom and undermine people's trust in democratic governments. Even hardened cynics need to consider the long-term effects and not dismiss these concerns lightly.
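
Why a mass-screening system produces so many false leads can be seen from a simple base-rate calculation. The sketch below uses entirely hypothetical numbers (population size, number of real targets, detection rate and false-alarm rate are all assumptions chosen for illustration, not figures from any agency), and the function name `screening_outcome` is made up for this example.

```python
def screening_outcome(population, true_targets, detection_rate, false_alarm_rate):
    """Return (true_hits, false_leads, precision) for a mass-screening system."""
    innocents = population - true_targets
    true_hits = true_targets * detection_rate      # real targets that get flagged
    false_leads = innocents * false_alarm_rate     # harmless people that get flagged
    flagged = true_hits + false_leads
    precision = true_hits / flagged if flagged else 0.0
    return true_hits, false_leads, precision


# Hypothetical inputs, chosen only for illustration.
hits, false_leads, precision = screening_outcome(
    population=100_000_000,
    true_targets=1_000,
    detection_rate=0.99,
    false_alarm_rate=0.001,
)
print(f"true hits: {hits:,.0f}")
print(f"false leads: {false_leads:,.0f}")
print(f"share of flagged people who are real targets: {precision:.2%}")
```

With these assumed inputs, a system that is wrong about only one harmless person in a thousand still flags roughly a hundred harmless people for every real target, because harmless people vastly outnumber targets.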

Given the potential for harm, it is worth exploring how much benefit mass data collection provides. Despite the billions of records that the US government has collected, it has failed to show that personal data helps prevent acts of terrorism, according to the EFF. By the FBI's own narrative, one wannabe terrorist who was arrested in 2016 came to its attention because of a concerned community member who was an FBI source.<16> The FBI then built its case through conventional police work, not through data mining.

Similarly, government requests for “backdoors” to messaging systems in the wake of terror acts are not based on evidence that such access can prevent those acts. Backdoor access will, therefore, only weaken a secure system that hundreds of millions of people use for legitimate, lawful purposes. This is one reason that over 200 technology industry leaders and privacy researchers have written a joint letter stating that backdoors will make everyone less safe.<17>

The collection of personal information by businesses poses many of the same risks as government surveillance does. To begin with, while every business probably collects some data, there are a few businesses that collect a huge amount of personal data, and that's excluding the data brokers. First, there is Amazon, which lists at least 30 different businesses that it operates on the web. Then there are Facebook and Google, who dominate online advertising.<18> These companies build detailed user profiles that they then use to provide tailored content and ads.

These profiles can be used to target individuals and lure them into fraudulent situations. In 2013, a Facebook user named Adam decided to explore the extent of individual targeting possible.<19> He set up an ad campaign on the site with the goal of getting a friend of his to click on the ad, using the friend's demographic information, such as his university, past employer and favourite video game. Within half a day of launching the campaign, the friend clicked on the ad. The experiment cost Adam nothing since the ad only received two clicks in total. Armed with a database of user profiles, malicious actors could use this technique to lure high-value targets to defraud them or to install malware in their systems. Since such ad campaigns can be created using algorithmic means, they can target thousands of people easily and at a very low cost. During the 2016 US election campaign, several scammers targeted Republicans and supporters of Donald Trump with ads to get them to donate to fake political groups.<20>

Big data in the service of commerce can also be used to subvert individual privacy and rights in potentially harmful ways. In March 2017, journalist Ashley Feinberg discovered ex-FBI director James Comey's undisclosed and pseudonymous Instagram account by searching for his son's not-so-anonymous account.<21> Once she was following the son’s account, Instagram suggested other accounts for her to follow, one of which turned out to be Comey's. Such suggestions are part of most social networking sites, but as the algorithms meant to make the suggestions more useful get more complicated, nobody can say how they work.

Although these examples had fairly benign outcomes, in other cases, they could cause grave harm. Victims of domestic violence or stalking could inadvertently be exposed. So could political refugees and dissidents. In countries that outlaw homosexuality, data could be used to find and persecute closeted individuals.

While some people may find it appealing to receive individually targeted promotional messages based on their actions, others may find it intrusive. The bigger question is whether it works. There does not appear to be much hard data on this. One paper published by MIT researchers in 2011 found that retargeting had mixed results.<22> It works less well than general advertising when people do not have intent to purchase, and works better when they are actively shopping.

The third group that wants to get personal information can claim no legitimate interest in it. Therefore, to get data, hackers and criminals break into systems or gather data that is not secured. This article has touched upon how personal information in the wrong hands can be harmful, but the risks are worth examining in more detail. They include financial fraud against individuals, financial theft from centralised systems, physical threats against individuals and national security threats against governments.

In the US, when a person calls their bank or credit card company, it is highly likely that to establish their identity, the financial institution will ask them some questions based on their personal history. One bank makes it a point to inform the customer that this information is coming from public sources. The questions range over areas such as previous employers, past addresses or phone numbers, mortgage history and relatives who may have shared those addresses, phone numbers or accounts. Though this will deter a spur-of-the-moment criminal, a more determined effort could unearth most of those details from public records. Many people make their career history available on LinkedIn and family connections on Facebook. Mortgage records are public filings, which are increasingly searchable online. Thus, these questions provide no real security, especially not in a post-Equifax breach world, where these details are out in the wild.

In the wrong hands, the personal information that brands, social networks and internet data brokers collect can be used against individuals by targeting them in the same way Adam targeted his friend. The goal may perhaps not be to harm the individual, but the individual's employer. It may have taken only one employee for HBO's systems to be breached; an NSA contractor unwittingly exposed the agency's hacking tools to the Russian government. The other risk arises from one of the side effects of large data collection, which is that no one can predict how that data will be used or what results the data analysis may produce. This can lead to people being exposed to unpredictable risks.

Members of the UK government think that backdoors will help them against criminals and terrorists, but this is a short-sighted view, since terrorists can use backdoors as well, putting even more people at risk. If terrorists can intercept communications of government officials, high-profile businesspersons and their families, they can know when and where their targets are most vulnerable.

Turn Data to Wind and Solar

If data is the new oil, then just as the world learned to curb its consumption of oil for the sake of self-preservation, it now needs to curb its collection of data. This means that while data may still be collected and used, the attitude towards it needs to change. Instead of oil, data should be treated like solar or wind energy. It should be used immediately, or stored for a short time, but then it should be gone.

At present, organisations seem to collect data for the sake of collecting it, even when it is unnecessary. For example, a local library in the UK requires online applicants to provide their gender, which has no obvious utility in letting people borrow books. The field is there because it costs nothing, and people provide the information without question.

Figure 1: Online Library Membership Application Requires the User to Provide Their Gender

Privacy advocates and rights groups have been raising the alarm over the unbounded collection of personal information for decades. In 1971, Arthur R. Miller, a law professor at the University of Michigan, testified before a Senate subcommittee:

“Few people seem to appreciate the fact that modern technology is capable of monitoring, centralising, and evaluating these electronic entries—no matter how numerous they may be—thereby making credible the fear that many Americans have of a womb-to-tomb dossier on each of us.”<23>

Yet, many privacy advocates take the need to collect data as a given, and mostly recommend stronger protections in terms of limits on how governments and businesses may use the information.

Personal data has no natural scarcity, leading to unnecessary collection and abuse. To counter this, governments must impose scarcity by limiting what data may be collected, controlling how long it may be stored, and requiring data to be deleted. Going further, people need to adopt an attitude of abstention from collecting data, just as the world learned to become less dependent on oil.

The German term “datensparsamkeit”<24> captures the idea of collecting as little data as necessary. It translates to English roughly as “data frugality” or “data abstention.”<25> The EFF has advocated such an approach for many years. However, the amount of data collected has continued to grow. As AI and machine learning are increasingly put to data-mining use, and as people talk of putting personal data into blockchains, which are meant to retain data permanently, the need to limit collection becomes more pressing.

A library card system designed with data abstention in mind will not ask for gender data because it is not necessary. Law enforcement officials with data abstention in mind will not collect biometric information from everyone they arrest.<26> Data-frugal governments will prohibit ISPs from collecting user access logs or require them to purge the logs after a brief period, say a week or one month. Counterterrorism officials with a data-frugality mindset may approach the problem by building community relations and developing informants rather than enormous systems that collect and fruitlessly analyse billions of records on millions of harmless citizens.

Businesses with a data-abstention mindset will purge their website access logs after a few days. E-commerce companies may delete purchase histories after some meaningful amount of time, such as the return period or after the warranty expires. Combining the concept of data abstention with user consent, data brokers should only be allowed to collect information from people who agree to it, and only the information they allow.
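
A retention rule of this kind is simple to express in software. The following is a minimal sketch, assuming log records are held as dictionaries with a `timestamp` field; the function name `purge_expired`, the record layout and the seven-day window are all hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(records, now, retention):
    """Keep only records whose 'timestamp' falls within `retention` of `now`."""
    cutoff = now - retention
    return [r for r in records if r["timestamp"] >= cutoff]


# Illustrative data: three access-log entries of different ages.
now = datetime(2017, 11, 29, tzinfo=timezone.utc)
access_log = [
    {"path": "/home", "timestamp": now - timedelta(days=1)},
    {"path": "/cart", "timestamp": now - timedelta(days=6)},
    {"path": "/login", "timestamp": now - timedelta(days=30)},
]

# Assumed policy: keep access logs for seven days, then discard them.
recent = purge_expired(access_log, now, retention=timedelta(days=7))
print([r["path"] for r in recent])
```

Run on a schedule, a routine like this means that a breach can expose only the most recent window of activity rather than years of history. A purchase history could be handled the same way, with the retention period set to the return or warranty period.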

Of course, what constitutes necessary amounts and duration is open to interpretation. The onus should fall on the collector to periodically evaluate the information that is collected, and demonstrate to themselves and external review bodies that the data is essential and serves some useful purpose. If they must store personal data, data collectors should also find ways to first remove as much personal information as possible so that the data alone cannot be used to identify individuals.
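
One way to strip personal information before storage is to keep only an explicit allow-list of fields and to coarsen identifiers such as IP addresses. The sketch below is an illustration under stated assumptions, not a complete anonymisation scheme: the field names, the allow-list and the choice of /24 and /48 prefixes are hypothetical, and coarsened data can still sometimes be re-identified when combined with other sources.

```python
import ipaddress

# Hypothetical allow-list; a real system would define its own.
ALLOWED_FIELDS = {"timestamp", "path", "ip"}


def coarsen_ip(ip: str) -> str:
    """Zero the host part of an address: keep /24 for IPv4 and /48 for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def minimise(record: dict) -> dict:
    """Drop fields outside the allow-list and coarsen the IP address."""
    kept = {k: v for k, v in record.items() if k in ALLOWED_FIELDS}
    if "ip" in kept:
        kept["ip"] = coarsen_ip(kept["ip"])
    return kept


# Illustrative record using documentation-reserved addresses and made-up values.
raw = {
    "timestamp": "2017-11-29T10:15:00Z",
    "path": "/catalogue/search",
    "ip": "203.0.113.77",
    "name": "Jane Doe",
    "email": "jane@example.org",
    "gender": "F",
}
stored = minimise(raw)
print(stored)
```

The allow-list approach puts the burden where the article argues it belongs: a field is stored only if someone has decided it is needed, rather than kept by default.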

Current approaches of security hardening may protect systems for some time, but as the steady rate of breaches shows, systems and processes are not failsafe. That does not mean systems should not be secured. On the contrary, more efforts, investment and vigilance are needed. Fraud and intrusion detection will need to occur in real time.

Like offline criminals breaking into a building, cyber criminals are often working against time. They want to get into a system, get the data and get out quickly. Eliminating data troves while also increasing security measures reduces the incentive for criminals to break into systems. To collect data, they would need to break into a system and then establish a persistent connection. Such connections increase their chances of being discovered.

A data-frugal attitude does not mean turning one's back on technology or even data use. Rather, it calls for a more rational and purpose-driven use of data. Imagine standing in front of an all-you-can-eat buffet every day. One may either consume large quantities of everything, or be selective, choose some items and leave others, count calories and eat sensibly. It may be fun to binge eat, but it is unhealthy and ultimately detrimental.


Endnotes

<1> “Spyfall,” Time, 15 November 2012, http://swampland.time.com/2012/11/15/spyfall/2/.

<2> Joe Sterling, “Is Petraeus pillow talk a security threat?” CNN, 14 November 2012, http://edition.cnn.com/2012/11/13/us/petraeus-security-threat/index.html.

<3> Alexis C. Madrigal, “I'm Being Followed: How Google—and 104 Other Companies—Are Tracking Me on the Web,” The Atlantic, 29 February 2012, https://www.theatlantic.com/technology/archive/2012/02/im-being-followed-how-google-151-and-104-other-companies-151-are-tracking-me-on-the-web/253758/.

<4> Mike Masnick, “The US Government Today Has More Data On The Average American Than The Stasi Did On East Germans,” TechDirt, 3 October 2012, https://www.techdirt.com/articles/20121003/10091120581/us-government-today-has-more-data-average-american-than-stasi-did-east-germans.shtml.

<5> Simon Phipps, “Should the government be allowed to collect data on UK citizens to prevent terrorism and criminal activity?” British Library, 28 January 2015, https://www.bl.uk/my-digital-rights/articles/mass-data-collection.

<6> Murray Hunter, “Australia's Domestic Spying Revealed,” Geopolitical Monitor, 11 November 2013, https://www.geopoliticalmonitor.com/australian-domestic-spying-revealed-4882/.

<7> Afshan Yasmeen, “Natgrid will deter terror: Shinde,” The Hindu, 15 December 2013, http://www.thehindu.com/news/national/karnataka/natgrid-will-deter-terror-shinde/article5460766.ece.

<8> “How Much Meta Data Does Your Country Collect?” TorGuard, 8 March 2017, https://torguard.net/blog/how-much-meta-data-does-your-country-collect/.

<9> Emma Woollacott, “UK Home Secretary Demands WhatsApp Backdoor From People Who 'Understand The Necessary Hashtags',” Forbes, 26 March 2017, https://www.forbes.com/sites/emmawoollacott/2017/03/26/uk-home-secretary-demands-whatsapp-backdoor-from-people-who-understand-the-necessary-hashtags/#78adefca2df4.

<10> Tanzina Vega, “Online Data Helping Campaigns Customize Ads,” New York Times, 20 February 2012, http://www.nytimes.com/2012/02/21/us/politics/campaigns-use-microtargeting-to-attract-supporters.html.

<11> Wolfie Christl, “Corporate Surveillance in Everyday Life,” Cracked Labs, June 2017, http://crackedlabs.org/en/corporate-surveillance.

<12> Noam Cohen, “As Data Collecting Grows, Privacy Erodes,” New York Times, 15 February 2009, http://www.nytimes.com/2009/02/16/technology/16link.html.

<13> Charles Riley and Omar Khan, “India busts bogus call centres for posing as the IRS,” CNN Money, 6 October 2016, http://money.cnn.com/2016/10/06/news/india-irs-scam-arrests/index.html.

<14> Karen Turner, “The Equifax hacks are a case study in why we need better data breach laws,” Vox, 14 September 2017, https://www.vox.com/policy-and-politics/2017/9/13/16292014/equifax-credit-breach-hack-report-security.

<15> Mike Levine and Jack Date, “22 Million Affected by OPM Hack, Officials Say,” ABC News, 9 July 2015, http://abcnews.go.com/US/exclusive-25-million-affected-opm-hack-sources/story?id=32332731.

<16> “Inside the Mufid Elfgeeh Investigation,” FBI, 16 May 2016, https://www.fbi.gov/news/stories/inside-the-mufid-elfgeeh-investigation.

<17> Andrea Peterson, “The debate over government ‘backdoors’ into encryption isn’t just happening in the U.S.,” Washington Post, 11 January 2016, https://www.washingtonpost.com/news/the-switch/wp/2016/01/11/the-debate-over-government-backdoors-into-encryption-isnt-just-happening-in-the-u-s/.

<18> Sierra Webb, “Google & Facebook Dominate US Ad Share Market,” StrataBlue, 3 November 2017, https://stratablue.com/google-facebook-dominate-share-market/.

<19> Adam Smith-Kipnis, “Does Targeted Advertising Work? My campaign to earn one click,” AdamSonic, 12 August 2013, http://www.adamsonic.com/blog/?p=503.

<20> Maegan Vazquez, “Scam PACs Target Conservatives with a ‘Dinner With Trump’ Message on Facebook,” Independent Journal Review, August 2016, http://ijr.com/2016/08/682420-scam-pacs-target-conservatives-with-a-dinner-with-trump-message-on-facebook-read-the-fine-print/.

<21> Ashley Feinberg, “This Is Almost Certainly James Comey’s Twitter Account,” Gizmodo, 30 March 2017, https://gizmodo.com/this-is-almost-certainly-james-comey-s-twitter-account-1793843641.

<22> Anja Lambrecht and Catherine Tucker, “When Does Retargeting Work? Information Specificity in Online Advertising,” MIT, 2 December 2011, http://ebusiness.mit.edu/research/papers/2011.12_Lambrecht_Tucker_When%20Does%20Retargeting%20Work_311.pdf.

<23> John Bellamy Foster and Robert W. McChesney, “Surveillance Capitalism,” Monthly Review 66, no. 3 (July–August 2014), https://monthlyreview.org/2014/07/01/surveillance-capitalism/.

<24> Translated from German by Dr. Axel Harneit-Sievers, the Heinrich Böll Foundation.

<25> Martin Fowler, “Datensparsamkeit,” martinfowler.com, 12 December 2013, https://martinfowler.com/bliki/Datensparsamkeit.html.

<26> Valerie Ross, “Forget Fingerprints: Law Enforcement DNA Databases Poised To Expand,” PBS, 2 January 2014, http://www.pbs.org/wgbh/nova/next/body/dna-databases/.

The views expressed above belong to the author(s).