Infrastructure for the Masses

“Infrastructure for the masses,” says Richard Mortier, by way of explaining what it is he’s trying to create. Mortier, a Horizon Transitional Fellow in Computer Science, is a member of the core team, based in Nottingham and Cambridge, behind Mirage OS, a general platform technology intended to change – and make vastly more efficient – the way cloud computing works. For both users and providers of “big iron” services such as Amazon and Rackspace, the appeal of Mirage OS lies in increased efficiency and lower costs, the result of a streamlined workflow that allows largely automated cloud software deployments. For everyone else, Mortier hopes Mirage OS will allow us to manage our online lives more effectively and safely.

Currently, everyone synchronises data across multiple devices via services that belong to large companies, partly because it’s convenient, and partly because there are few other easy choices. But it wasn’t always this way. In the early 1990s, for example, synchronising data across multiple devices was done directly across internal networks, a system that was more resistant to pervasive monitoring. So why shouldn’t we transfer this to the cloud and be able to run our own email servers and own our own data?

“From a technology point of view,” says Mortier, “I believe that for people to manage their online lives effectively they will need strong identity online and to be able to control how their devices connect to each other and to other people’s. We need a structure that allows people to meaningfully take part rather than just giving everything up to Apple/Google/Facebook. We need to look at ways to allow individuals to run their own infrastructure.”

But, as Mortier is the first to admit, for people to manage their own infrastructure they need tools they can use and understand. A number of such tools are in progress to run on top of Mirage OS, all to be found at the Nymote Web site (http://nymote.org). Among these tools, Signpost (http://nymote.org/software/signpost) seeks to make it possible for people to connect all their own devices to each other in a private cloud by providing a manageable naming service. Irmin (http://nymote.org/software/irmin) is a library database that uses the principles behind version management software tools (like Git) and applies them to storing and syncing data, making it possible not only to push changes from one location to another but also to revert to previous versions when necessary.
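
The idea behind Irmin, storing data as a history of versions that can be pushed elsewhere or rolled back, can be illustrated with a toy example. The following is a minimal Python sketch only, not Irmin's actual API (Irmin itself is an OCaml library); the class VersionedStore, its methods, and the sample data are hypothetical names invented for illustration.

```python
import copy


class VersionedStore:
    """Toy key-value store that keeps every committed snapshot.

    Hypothetical illustration only; this is not Irmin's API.
    """

    def __init__(self):
        self.history = [{}]  # history[0] is the empty initial snapshot

    @property
    def head(self):
        """The most recent snapshot."""
        return self.history[-1]

    def commit(self, changes):
        """Record a new snapshot with `changes` applied; return its version number."""
        snapshot = dict(self.head)
        snapshot.update(changes)
        self.history.append(snapshot)
        return len(self.history) - 1

    def revert(self, version):
        """Make an earlier snapshot current again by committing a copy of it."""
        self.history.append(dict(self.history[version]))
        return len(self.history) - 1

    def push(self, other):
        """Send our snapshots to another store that holds a prefix of our history."""
        if other.history != self.history[: len(other.history)]:
            raise ValueError("histories have diverged; a merge would be needed")
        other.history = copy.deepcopy(self.history)


# Two devices: edit on the laptop, push to the phone, then undo on the phone.
laptop, phone = VersionedStore(), VersionedStore()
v1 = laptop.commit({"contacts": "alice"})
laptop.commit({"contacts": "alice, bob"})
laptop.push(phone)
phone.revert(v1)
```

Reverting adds a new snapshot rather than deleting later ones, so nothing is lost from the history. A real system would also need to merge histories that have diverged, which this sketch simply refuses to do.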

The short explanation: Mirage OS, a Collaborative Project incubated by the Xen Project (http://xenproject.org), is a way to restructure modern virtual machines, such as those running servers in the cloud, into flexible, secure, and reusable modular components. Using this system should enable each physical machine to host many more virtual machines, lowering energy and hardware costs with obvious savings for both providers and users.

To understand what that means, it’s necessary to understand something about current infrastructure designs. These begin with servers, the computers that store data that other, “client”, computers access. When you visit a Web page, what’s really happening is that your browser (the client) asks the remote Web server to send you a bunch of coded data which it then displays to you as a page. Behind the scenes, the Web server may assemble the page from many sources and process it in many ways, sending it to your browser bit by bit. Similarly, you may query a database server (a search engine, or a business database of customers or products) to get back a particular set of results. Flickr is a pile of servers that run software to let you store, display, organise, and edit photographs; Facebook is a (much bigger) pile of servers that run applications so you can display and manipulate data of all kinds – including supporting the apps that run in your browser and enable user-to-user interaction.

Current deployment practice with Mirage OS unikernels uses cloud services like the code repository GitHub (http://github.com) and the continuous integration service Travis CI (http://travis-ci.org) so that committing changes to software source code (via GitHub) kicks off automated testing and compilation (by Travis CI). Updates to the Mirage OS Web site, for example, become a “pull request”, which notifies Travis CI to rebuild the site. If the rebuild is successful, Travis CI commits the results of the rebuild to a deployment repository. From there, the site administrators check it and make it live.
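
The workflow in the previous paragraph can be sketched as a sequence of stages. This is an illustrative Python model only: it does not call GitHub or Travis CI, and the Pipeline class, the build_site function, and the approval check are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Toy model of: pull request -> CI build -> deployment repository -> manual go-live."""

    deployment_repo: list = field(default_factory=list)  # build results awaiting review
    live: list = field(default_factory=list)  # what administrators have published

    def on_pull_request(self, source, build):
        """CI step: only a successful build reaches the deployment repository."""
        try:
            artifact = build(source)
        except Exception:
            return False  # failed build: nothing is committed
        self.deployment_repo.append(artifact)
        return True

    def go_live(self, approve):
        """Manual step: administrators check the newest artifact before publishing it."""
        if self.deployment_repo and approve(self.deployment_repo[-1]):
            self.live.append(self.deployment_repo[-1])
            return True
        return False


def build_site(source):
    """Stand-in for testing and compilation: reject empty source, else 'compile' it."""
    if not source.strip():
        raise ValueError("nothing to build")
    return f"unikernel({source})"


pipeline = Pipeline()
ok = pipeline.on_pull_request("new blog post", build_site)
broken = pipeline.on_pull_request("   ", build_site)
published = pipeline.go_live(lambda artifact: artifact.startswith("unikernel"))
```

A failed build never reaches the deployment repository, and nothing goes live without the final human check.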

Before today’s cloud infrastructure, servers were commonly individual computers running on internal business networks. Even now, a business may have one or more computers sitting in a corner that no one uses directly but that store files or manage email and that everyone accesses from their own machines throughout the business. Increasingly, however, companies began to balk at the expense of buying, maintaining, and managing those servers. Instead, they began renting space at off-site server farms, where they’d have increased bandwidth, better physical security, expert management, and simpler backups. To the server farms, another inefficiency soon became obvious: the energy and hardware overheads of running a separate machine for every business or application, each one idle most of the time.

Enter the virtual machine: an emulation of an entire computing environment that runs in a portion of a physical machine, rather than on a machine of its own, by means of a layer of software called a virtual machine monitor, or hypervisor. Under this structure, a single physical machine can host myriad virtual machines, each one sandboxed from and independent of the others. Amazon’s EC2 Web service, therefore, is a giant pile of machines, each of which is parcelled out into pieces to rent to Amazon’s millions of customers, who in turn use them to run their own applications, whatever those might be – a web or email server, a database, or even a commercial service. Netflix, for example, built its business on EC2, in part because doing so allowed it to scale up very quickly if it brought a new region online or had a sudden influx of new subscribers. For cloud computing customers, the chief benefits are lower costs thanks to economies of scale, easier management, and the ability to scale up or down very quickly when needed.

The big downside with this system is its complexity: each virtual machine is effectively pretending to be a full Windows or Linux box, supporting all sorts of legacy drivers and devices. When servers were physical, rather than virtual, machines, this made sense because they were expensive enough that they typically ran multiple, carefully segregated, functions, simultaneously acting perhaps as a mailserver, fileserver, printserver, and database server. The earliest virtual machines copied this design. Now, however, computing power and storage space have become so cheap that it’s more common for each virtual machine to manage just one function. Which leads to the question: why does a mailserver need an operating system that’s designed to support standard office tasks such as printing? The result of this approach is an increasingly complex stack of software layers, many of them unnecessary and none designed with modern hypervisors in mind, with the costs measured not only in energy and resource use but also in time spent on debugging, maintaining, securing, and auditing.

Mirage OS replaces this structure with single-purpose, all-in-one virtual machines called unikernels (http://queue.acm.org/detail.cfm?id=2566628) that are optimised for the cloud and that include only the necessary components. The results can be seen on Mirage OS’s own Web site (http://openmirage.org), which is run in exactly this way.

At the same time, Mortier believes Mirage OS can be used to build infrastructure that the general public can run. “There are a bunch of benefits you get from that – much better manageability, much smoother deployment. Because of the technology we’re building it on, I believe it’s also more secure.”

Mortier’s reasoning is that removing the unnecessary parts of the software stack makes what remains faster and also shrinks opportunities for attackers. In addition, the hope is that a clean implementation will allow the team to create APIs that make it inherently harder to make common security errors such as buffer overflows. That begins with the language Mirage OS is written in – OCaml (http://ocaml.org) – which, Mortier says, is “more rigorous about telling you about mistakes” than C or C++, the languages that Windows, Linux, and many other important pieces of infrastructure software are written in.

Mortier says, “The more pieces you take out, the better defined you can make those pieces, which is the thing that often goes wrong in software – you get too much bleedover between different components.” OCaml helps with this issue, too, he says, by providing strong definitions for the interfaces between components, which can be used by Mirage OS to make it easy to swap implementations out without worrying about disrupting dependencies.
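
The point about interfaces can be illustrated in any language that lets you state an interface explicitly. The sketch below uses Python's typing.Protocol as a loose analogy for the kind of interface definition OCaml provides; it is not Mirage OS code, and Storage, MemoryStorage, LoggingStorage, and save_page are hypothetical names. Note one difference: Python checks such interfaces only when an external type checker is run, whereas the article's point is that OCaml is stricter about flagging mistakes.

```python
from typing import Protocol


class Storage(Protocol):
    """The interface: anything with these two methods can serve as storage."""

    def write(self, key: str, value: str) -> None: ...

    def read(self, key: str) -> str: ...


class MemoryStorage:
    """One implementation: keeps everything in a dictionary."""

    def __init__(self) -> None:
        self._data = {}

    def write(self, key: str, value: str) -> None:
        self._data[key] = value

    def read(self, key: str) -> str:
        return self._data[key]


class LoggingStorage:
    """A second implementation: wraps another storage and records each write."""

    def __init__(self, inner: Storage) -> None:
        self._inner = inner
        self.log = []

    def write(self, key: str, value: str) -> None:
        self.log.append(key)
        self._inner.write(key, value)

    def read(self, key: str) -> str:
        return self._inner.read(key)


def save_page(store: Storage, name: str, body: str) -> str:
    """Application code depends only on the Storage interface, not on an implementation."""
    store.write(name, body)
    return store.read(name)


# Swapping implementations requires no change to save_page.
plain = save_page(MemoryStorage(), "index", "hello")
logged_store = LoggingStorage(MemoryStorage())
logged = save_page(logged_store, "index", "hello")
```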

This helps when looking at some of the more complex technological issues. For example: what happens if you’re running your own infrastructure – maybe your Web server is a Raspberry Pi intended to handle 20 requests a day – and your Web page gets Slashdotted and visitors flood your server? One of the ideas Mortier’s group is exploring is how a site can self-scale: “It should be able to detect that its load is increasing and should be able to automatically respawn itself on Amazon’s EC2 as however many virtual machines it needs to handle the load and then when the load diminishes it can scale itself back in again and go back to running on your Raspberry Pi.” There are obvious practical issues to solve there – you’d have to have an account set up and some way of limiting how much you spend – perhaps financed as a form of insurance – but the idea is intriguing. The most recent patches to the Mirage OS code from the EU FP7 User Centric Networking project are beginning to fulfil one of the next technical goals by adding support for ARM-based platforms like the Raspberry Pi – all part of the general mission to enable people to run their own personal clouds.
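
At its core, the self-scaling idea is a small decision rule: stay on the home device while it can cope, otherwise rent as many virtual machines as the load needs, up to a spending limit. The following Python sketch shows one way such a rule might look. The function name, its parameters, and all the numbers are assumptions made up for illustration; no real cloud API is called, and costs are given in whole pence to avoid rounding problems.

```python
import math


def instances_needed(load, home_capacity, instance_capacity, cost_pence, budget_pence):
    """Return how many rented virtual machines to run for the current load.

    load               requests per minute currently arriving
    home_capacity      requests per minute the home device can serve alone
    instance_capacity  requests per minute one rented virtual machine can serve
    cost_pence         hourly cost of one virtual machine, in pence
    budget_pence       hourly spending limit, in pence

    All values are illustrative assumptions, not Mirage OS measurements.
    """
    if load <= home_capacity:
        return 0  # load is low: scale back in and serve from the home device
    wanted = math.ceil(load / instance_capacity)
    affordable = budget_pence // cost_pence  # the spending limit caps the scale-out
    return min(wanted, affordable)


quiet_day = instances_needed(5, 10, 1000, 10, 100)
busy_day = instances_needed(2500, 10, 1000, 10, 100)
overwhelmed = instances_needed(50000, 10, 1000, 10, 100)
```

When demand outstrips the budget, as in the last call, the rule rents only what the budget allows, which is where the spending limit mentioned above comes in.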

The biggest difficulty, however, lies in trying to reduce complexity and remove the need for expertise. “One of the things I would like to see [is] Mirage OS be used to make it simpler to run your own infrastructure,” he says. But, he adds, “Simplicity is hard.”
