Privacy by Design


“The core of Horizon,” says Derek McAuley, “is trying to understand how to bring the principles of privacy-by-design into play to get value out of a lot more personal data becoming available through the Internet of Things and everything else. How can we provide value to people without having to give all this data to somebody else?” McAuley, Horizon’s director, is professor of digital economy in the department of computer science. As a corollary, he says: “We have to create real services that give real value in a real way.”

McAuley’s thesis has three component parts: first, the basic assumption that the public has an interest in protecting individual privacy; second, that the future of digital services cannot realistically be funded solely by advertising; third, that both of these can be served by rethinking how we design system architecture.

The last couple of years – revelations from Edward Snowden’s cache of documents about NSA spying, plus greater public understanding of data-driven companies such as Google and Facebook – have made plain the extent to which the Internet has become, as security expert Bruce Schneier puts it, “a giant surveillance machine”. The Internet of Things, which is a catch-all term for network-connected devices and sensors, threatens to bring the same approach to the physical world. Already today most cities are densely populated with CCTV cameras; now imagine chips in every lamp post on the street and every appliance and control in your home, all collecting data on every interaction and uploading it to the cloud. The push in this direction has already begun; early in 2014, for example, Google bought Nest, the maker of networked electronic thermostats.

The shift to the cloud began with email and continued with social networks. McAuley calls the companies providing these “free” services “multi-sided markets”: services with two or more types of customer who are offered different types of relationships. In the case of companies like Google, Facebook, Yahoo! and the rest of the advertising-supported digital economy, there are two such classes, advertisers (who pay) and consumers (whose data form the product sold to advertisers). Credit card companies and retailers like Amazon and eBay also are multi-sided markets, but they provide a way to match buyers and sellers, both of whom pay for a part of the service.

“The key problem at the moment is that a lot of the digital economy is based on multi-sided markets where the service is free to one customer and the price of that is personal information as a commodity.”

For McAuley, the question we began with, how to create services that create real value for all sides, is the important one: “We have to figure it out because we cannot continue to rely on the multi-sided market that Google, Facebook, and others run. There is only a small amount of planetary economic value that can be exercised that way.”

Twenty years ago, the dominant business model for the computer industry was outright sales: you bought software (or at least, the licence to use it) and the copy of the software and any data you generated remained on your own system. Today, the dominant model is the cloud, with software provided as a free or subscription service. The result is that even a wearable device to track your physical activity or a digital bathroom scale automatically uploads the data it collects to a third-party site outside of your control that presents the data back to you in some form you may find useful.

Yet, as McAuley points out, these and other digital economy services typically ask for far more data than they actually need to do their job. One of his examples is the difference between the incoming smart meters and the market comparison sites like those available to help choose insurance policies or energy suppliers. Smart meters, which will tie half-hourly readings to customer names and addresses, are far more invasive, whereas today’s market comparison sites work on an email address (which may be a throwaway), plus a bit of regional demographics and your estimate of your current energy consumption.

“You don’t need smart meter data or the GPS coordinates of your vehicle every moment,” he says. “Many companies saying they have to have all your details are, I think, being duplicitous – they’re not saying they’re serving a different market.”

For that reason, “When we set up Horizon, it was our mission to set up services without oversharing information before there was some event where society decided it was all terrible and there was a big backlash. Societies in general – and the UK may be worse than others – turn on particular pivotal events. In the UK, it’s typically the death of a teenaged girl.” Snowden’s revelations are – or ought to be – one such event. “One of the things that will happen is that EU regulation will come to bear.”

The difference between these two types of architecture, McAuley says, is the difference between privacy and confidentiality. In the first case, confidentiality, you upload the data into the cloud, and “put encryption and armed guards around it”. But that gives you no transparency into who’s viewing the data or what’s being done with it. Worse, the architecture requires multiple third-party service providers, none of whose terms and conditions are comprehensible to the average household. A related paper, by Ewa Luger, Stuart Moran, and Tom Rodden, studies this problem, concluding that what’s needed is an app that, rather than cloaking complexity in apparently accessible language, will reveal the true complexity of what users are being forced to accept.

The second case provides privacy. “It’s a completely different trust model. For sensors in the home you need an applications platform in the home because the reason to deploy this technology is that it does a hundred different things. The same PIR sensing and platform which provides an application for looking after you when you are elderly, can feed into an application today that asks why don’t you change the heating schedule on the boiler because you only use that room at weekends.”
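The idea of one in-home platform feeding several applications from the same sensors can be sketched as a simple publish-and-subscribe hub. This is only an illustration: the names HomeHub, LastSeen and RoomUsage, and the shape of the event records, are assumptions made for the example and do not describe Horizon's actual software.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical event shape: {"room": str, "weekend": bool}
Event = Dict[str, object]


class HomeHub:
    """Illustrative in-home platform: one sensor feed, many local apps."""

    def __init__(self) -> None:
        self._apps: List[Callable[[Event], None]] = []

    def register(self, app: Callable[[Event], None]) -> None:
        self._apps.append(app)

    def publish(self, event: Event) -> None:
        # Every registered app sees the event; nothing is sent off-site.
        for app in self._apps:
            app(event)


class LastSeen:
    """Wellbeing-style app: remembers where motion was last detected."""

    def __init__(self) -> None:
        self.room = None

    def __call__(self, event: Event) -> None:
        self.room = event["room"]


class RoomUsage:
    """Heating-style app: counts weekday and weekend use of each room."""

    def __init__(self) -> None:
        self.weekday: Counter = Counter()
        self.weekend: Counter = Counter()

    def __call__(self, event: Event) -> None:
        bucket = self.weekend if event["weekend"] else self.weekday
        bucket[event["room"]] += 1

    def weekend_only_rooms(self) -> List[str]:
        return sorted(r for r in self.weekend if self.weekday[r] == 0)


hub = HomeHub()
last_seen, usage = LastSeen(), RoomUsage()
hub.register(last_seen)
hub.register(usage)

for room, weekend in [("study", True), ("kitchen", False),
                      ("kitchen", True), ("study", True)]:
    hub.publish({"room": room, "weekend": weekend})

weekend_only = usage.weekend_only_rooms()
```

Both applications draw on the same motion events, yet each keeps only what it needs, and the raw feed stays inside the home.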

McAuley admits that designing this way – compartmentalisation, data minimisation – is “a bit harder” than shipping everything over to the cloud. But he sees it as essential going forward.

McAuley is also sceptical about some of the recent development of cloud-based personal data stores. The goal of these – to grant the individual more control over their personal data – is similar to McAuley’s, but he believes they raise security issues that are difficult to solve for the diverse data types envisaged in the Internet of Things. Just as importantly, it will be difficult to convince privacy advocates those problems are solved.

“Currently, they require an unbelievable amount of trust in the entity holding the data.” By contrast, “Today you would have to break into myriad places to obtain the same data – maybe we should be considering a more distributed solution.”

McAuley stresses that there are multiple options for system architecture. “There is no one solution to the world’s problem in this space,” he says. The solutions do, however, make a relatively small set, each of which needs to be deployed in the right context. He hopes to be able to create design templates so that it’s clear which ones are suitable for which sorts of services.

“And then people can be comfortable and security experts can say yes, and they can reassure the public that it’s goodness rather than evilness,” he says. Such an optimist.

Privacy by design

Science fiction – and marketing fantasies – tends to assume that the future arrives all at once, so that one day everyone has a disconnected home filled with old wood furniture and the next everyone has a highly connected one filled with smart electronics. The Home Hub of All Things project begins with the presumption that reality doesn’t arrive like that; instead, it comes piece by piece and step by step, because people simply do not go out and replace everything they own all at once. As William Gibson has remarked, “The future is already here. It just isn’t very evenly distributed.”

Home Hub accordingly aims to get away from current prototype “smart homes” that assume that people will buy entire new systems and be willing to share all their Internet of Things data with unknown, unspecified third parties. Instead, the project will build prototype applications that run within the home network, minimising the amount of data that leaks out, in conjunction with partners such as Dyson, GlaxoSmithKline, and the digital payment service Droplet.

The project’s prototype hub is built upon an extended domestic router that can collect a diverse range of information. To explain how it works, McAuley cites a common example: the desire to monitor elderly people in their homes to ensure they’re safe and well. The currently standard idea for this is to install passive infrared sensors that stream a constant flow of data to a cloud service somewhere that a third party can monitor. In McAuley’s plan, “It can run in the home, and when there’s anomalous behaviour it can communicate with the rest of the world. It does not need to constantly stream.”
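A minimal sketch of that pattern, assuming a hypothetical HomeMonitor class and a six-hour quiet threshold (neither comes from the project itself): motion events are processed inside the home, and the only thing that ever leaves is a short alert when behaviour looks anomalous.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class HomeMonitor:
    """Hypothetical in-home monitor; raw sensor events stay local."""

    max_quiet_seconds: float
    send_alert: Callable[[str], None]  # the only path out of the home
    last_motion: Optional[float] = None
    alerted: bool = False

    def on_motion(self, timestamp: float) -> None:
        # Called for each PIR event; nothing is uploaded here.
        self.last_motion = timestamp
        self.alerted = False

    def check(self, now: float) -> None:
        # Run periodically; alert once per unusually long quiet period.
        if self.last_motion is None or self.alerted:
            return
        quiet = now - self.last_motion
        if quiet > self.max_quiet_seconds:
            self.send_alert(f"No motion detected for {quiet / 3600:.1f} hours")
            self.alerted = True


outbox: List[str] = []  # stands in for a message to a carer or relative
monitor = HomeMonitor(max_quiet_seconds=6 * 3600, send_alert=outbox.append)

for t in (0, 600, 1200):             # three motion events, in seconds
    monitor.on_motion(t)

monitor.check(now=3600)              # still within the threshold
monitor.check(now=1200 + 7 * 3600)   # seven quiet hours: one alert
monitor.check(now=1200 + 8 * 3600)   # already alerted, nothing more sent
```

Three motion events and three checks produce a single outgoing message. A streaming design would have sent every event to a third party.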


The Dataware project is a framework for exploring the psychology behind the ownership and use of personal data. As McAuley explains it, “Rather than moving the data, we would send queries or code fragments to where the data is and what is shared is the result of that query or fragment.” For example, instead of asking for – and sending – all of a person’s bank statements, a typical query and response might reveal just the monthly spend on electronics.
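The bank-statement example might look something like the following sketch. The PersonalDataStore class, the Transaction record and the sample figures are all invented for illustration and are not Dataware's real interface. A real system would also need to vet incoming queries, since an unrestricted query could simply return everything.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Transaction:
    month: str       # e.g. "2014-01"
    category: str
    amount: float


class PersonalDataStore:
    """Hypothetical store: the data stays put and queries come to it."""

    def __init__(self, transactions: List[Transaction]) -> None:
        self._transactions = list(transactions)

    def run(self, query: Callable[[List[Transaction]], object]) -> object:
        # The query sees a copy of the records; only its result is shared.
        return query(list(self._transactions))


def monthly_electronics_spend(transactions: List[Transaction]) -> Dict[str, float]:
    """The 'code fragment' a service sends instead of requesting statements."""
    totals: Dict[str, float] = defaultdict(float)
    for tx in transactions:
        if tx.category == "electronics":
            totals[tx.month] += tx.amount
    return dict(totals)


store = PersonalDataStore([
    Transaction("2014-01", "electronics", 120.0),
    Transaction("2014-01", "groceries", 85.5),
    Transaction("2014-02", "electronics", 40.0),
    Transaction("2014-02", "electronics", 10.0),
    Transaction("2014-02", "rent", 700.0),
])

shared = store.run(monthly_electronics_spend)
```

The requesting service receives two monthly totals and learns nothing about groceries or rent.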

“People have forgotten about the idea of sending the code to the data instead of the data to the code,” he says.