Sunday, September 29, 2013

Why Facebook is betting its future on open hardware, and why it matters to you


How Facebook has significantly cut the cost of delivering computing capability. Aivars Lode Avantce

By Nick Heath | September 24, 2013 -- 13:17 GMT (06:17 PDT)
For years, servers have been built to a set menu — with most customers offered boxes including the same hardware and software, whether they needed it or not.
Web giants such as Google and Facebook have chosen more of an a la carte approach for their datacentres, designing servers tailored to the computing workloads they will be carrying out and nothing more.
Under the Open Compute Project (OCP), Facebook and its partners have committed to developing novel designs for compute, storage and general datacentre infrastructure — not just the servers themselves, but the chassis and racks they sit in and their associated power and cooling, and then to sharing those designs so they can be refined and built upon.
For Facebook, the shift to DIY datacentres is paying off, in both the efficiency and the reliability of its datacentres. Facebook's datacentre in Lulea, Sweden, is kitted out with 100 percent OCP-designed equipment, and has a failure rate one third that of Facebook's datacentres running a mix of OCP and non-OCP hardware.
Similarly, Facebook credits OCP equipment for allowing it to run some of the most efficient datacentres in the world. Facebook's server farms have a power usage effectiveness (PUE) rating of 1.07, when the "gold standard" for the industry is 1.5 (PUE reflects how much of the total power delivered to a datacentre gets to a server).
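To put those two ratings side by side, here is a minimal sketch of the arithmetic. It assumes the standard definition of PUE (total facility power divided by the power delivered to IT equipment), which the article only paraphrases. The function names are illustrative and the only inputs are the two figures quoted above.

```python
def it_power_share(pue: float) -> float:
    """Fraction of total facility power that reaches IT equipment.

    Assumes PUE = total facility power / IT equipment power,
    so the share reaching IT equipment is simply 1 / PUE.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 1.0 / pue


def overhead_per_it_watt(pue: float) -> float:
    """Watts spent on cooling, power conversion and so on per watt of IT load."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return pue - 1.0


# The two PUE figures quoted in the article.
for label, pue in [("Facebook", 1.07), ("industry 'gold standard'", 1.5)]:
    print(
        f"{label}: PUE {pue} -> {it_power_share(pue):.1%} of power reaches "
        f"IT equipment, {overhead_per_it_watt(pue):.2f} W overhead per IT watt"
    )
```

On these figures, roughly 93 percent of the power entering a Facebook datacentre reaches the servers, against about 67 percent at a PUE of 1.5.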

Ultimately, companies will be better served by kitting out their datacentres with infrastructure that has been stripped down to the core components needed to carry out specific computing workloads, said Frank Frankovsky, Facebook's VP of hardware design and supply chain operations, an approach that the OCP refers to as 'vanity-free design'.
"We [Facebook] remove anything that's not necessary, if it doesn't contribute to computing your News Feed or storing your photos then it's not in the design so there's fewer things that can break," he said, explaining why OCP servers have proven more reliable.
"You strip out as much as you can to make it as efficient as possible, and also there are not as many ancillary things that can break."
Frankovsky said another boost to quality comes from testing that is specific to the workloads being run. In contrast, the testing OEMs carry out on traditional servers is limited because they have to do "very wide and shallow testing" for a large number of use cases.
Modularity is another core OCP design principle: a central idea behind the project is to break components of the datacentre, rack and server into parts that can be swapped out as computing needs change. At the project's Santa Clara summit earlier this year, the 'Group Hug' slot architecture was revealed, a design that would allow server motherboards to accept ARM systems on a chip as well as AMD or Intel chips.
While only a handful of large companies have publicly acknowledged using Open Compute designs in their datacentres, the other big firm being the US-based hosting company Rackspace, it seems server OEMs such as IBM, HP and Dell will also have to move into a la carte servers if they want to win business from large web customers such as Facebook.
All hardware in Facebook's datacentres will be built to OCP designs, said Frankovsky, whether that's provided by ODMs (Original Design Manufacturers), companies such as Hyve Solutions or Quanta that manufacture products for sale under another brand, or by the OEMs, as both Dell and HP are members of the OCP.
The openness of the OCP means that contributors share the designs for the datacentre equipment they produce. Large companies like Rackspace can then take these designs and modify them to suit their specific needs and then get ODMs to manufacture this equipment.
Smaller companies, without the ability or desire to modify OCP designs, can buy equipment based on vanilla OCP specifications. This is the approach followed by video games publisher Riot Games, responsible for the popular online game League of Legends.
There are a number of OCP-approved equipment makers that manufacture servers and datacentre infrastructure based on OCP designs, such as Penguin Computing, which makes servers based on the Open Compute V2 and V3 systems.
"Those are two examples of how the community are engaging from an adoption perspective," Frankovsky, who is also chairman of the OCP, said.
Why open compute will become mainstream
Even if firms don't adopt OCP equipment directly, Frankovsky predicts the number of computing workloads carried out on OCP equipment will continue to grow as more work shifts to cloud computing providers from smaller datacentres.
"Cisco UCS or PowerEdge systems are excellent for SMEs who want a fully integrated stack of technology. Those customers don't want transparency, they want stuff to work," Frankovsky said.
"Over time those customers will procure less and less but will outtask to cloud computing providers, which is why I'm so bullish about the impact that OCP will have on industry over the next three to five years. It's those large providers who are the main adopters of open compute."
Facebook's Frank Frankovsky. Photo: Jack Clark
Not all businesses have definable computing workloads that can be so easily matched to the underlying hardware, and some are bound by regulatory or other constraints on how they carry out computing, which Frankovsky accepts could limit their adoption of OCP hardware.
Learning how OCP could work for these companies was part of the reason that OCP asked Don Duet, head of global technology for financial firm Goldman Sachs, to sit on the OCP board, he said.
"Goldman Sachs is an example of a company that is heavily burdened by regulatory requirements. Learning more about what are the challenges of achieving this level of efficiency based on the reality of the world they live is teaching us a lot," Frankovsky said. 
"One of the things we've come up with though is not everything is bound by those restrictions. Financial services have very large scale-out farms of computers that look a lot like ours, typically running Monte Carlo simulations. That's a pretty large part of their IT infrastructure that can immediately benefit from Open Compute.
"There are large and growing portions of everyone's computing infrastructure that look more like web scale architectures," he said, citing health and pharmaceutical research compute clusters and oil and gas server farms for processing seismic data.
"From a physical infrastructure perspective, they look a heck of a lot like a Google, Microsoft, Facebook or an Amazon."
One emerging server architecture that complements OCP's vanity-free and modular approach is the microserver: an energy-sipping server, typically with a TDP below 15W, that can be used to carry out computationally simple tasks at scale.
"We're really excited about the potential for microservers [at Facebook]," said Frankovsky, adding that Facebook is yet to adopt them in any number because it is waiting for a suitable 64-bit microserver system on a chip. Intel recently released its second generation of Atom-based microserver SoCs — the Avoton chip — and the first 64-bit ARM microservers look unlikely to ship in any number until 2014.
"The first areas that we would utilise a technology like that is in the storage area," Frankovsky said.
He said Facebook is currently testing two OCP Open Vault storage arrays, one Avoton-based and the other a Calxeda ARM-based system, in which the Serial Attached SCSI (SAS) controllers have been swapped for microservers.
"We've turned what's considered just a bunch of discs into a storage server. That enables you to eliminate the entire separate server chassis that used to sit above the disc enclosure, now the server is a microserver and sits in a card slot that is part of the storage device," he said.
The design for the storage array came from the OCP community building on the freely available designs for OCP hardware, and for Frankovsky it demonstrates why sharing OCP designs is worthwhile.
"That's an example of a community contribution, and one of those 'Aha' moments, it's like 'Gosh, why didn't we think about that'. We have a lot of great engineers but we don't have all the greatest engineers under one roof, nobody does. That's an example of the power of open source," Frankovsky said.
