Over time, the tech industry has developed and deployed variations on the fat-tree architecture. But the design has room for improvement. It’s generally reliable, but it’s also rigid and inefficient, and it requires complex cabling. As in, actual physical cables.

If you’ve ever been in a data center or an office building’s server room, you’ve likely seen nests of colorful cables spilling out of metal racks. Cabling is one of the greatest costs in networking, Rehder says, and Amazon’s global data centers are currently connected with 20 million kilometers of fiber-optic cable. That’s roughly the distance of 25 round trips from Earth to the moon.

In 2012, as the demand for cloud computing services was exploding, a group of researchers at the University of Illinois Urbana-Champaign, including Godfrey, introduced a concept known as Jellyfish. Fixed network designs in use at the time were struggling to meet growing demand, so the researchers proposed a “high-capacity network interconnect which, by adopting a random graph topology, yields itself naturally to incremental expansion.” They believed this random approach could be more efficient and scalable than networks built using the fat-tree architecture.

“We gave it the name Jellyfish because it’s fluid,” Godfrey says. “You can connect the routers and switches randomly and it becomes this flexible pool of network capacity, which is very efficient.”
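To make the idea of random wiring concrete, here is a minimal Python sketch. It is an illustration only, not the researchers' code: the function name `random_switch_graph` and its parameters are hypothetical, and the sketch simplifies things by leaving any leftover ports unused at the end.

```python
import random


def random_switch_graph(num_switches, ports_per_switch, seed=None):
    """Randomly wire switches together until no more links can be added.

    Hypothetical helper for illustration. Returns (links, free), where
    links is a set of frozenset({a, b}) pairs and free maps each switch
    to its number of unused ports.
    """
    rng = random.Random(seed)
    links = set()
    free = {s: ports_per_switch for s in range(num_switches)}

    def can_link(a, b):
        # Both switches need a free port, and the pair must not be linked yet.
        return free[a] > 0 and free[b] > 0 and frozenset((a, b)) not in links

    while True:
        # Every pair of distinct switches that could still be connected.
        candidates = [
            (a, b)
            for a in range(num_switches)
            for b in range(a + 1, num_switches)
            if can_link(a, b)
        ]
        if not candidates:
            break
        # Pick one pair at random and run a cable between them.
        a, b = rng.choice(candidates)
        links.add(frozenset((a, b)))
        free[a] -= 1
        free[b] -= 1

    return links, free


if __name__ == "__main__":
    links, free = random_switch_graph(num_switches=8, ports_per_switch=3, seed=1)
    for link in sorted(tuple(sorted(l)) for l in links):
        print(link)
```

Adding a switch later means wiring its ports to random partners rather than redesigning a fixed hierarchy, which is the kind of incremental expansion the researchers had in mind. It also hints at the cabling problem: every cable's endpoints are wherever the random draw happened to land.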

However, Jellyfish also introduced new challenges in layout, data routing, and cabling. Routing in random graphs is trickier, Godfrey says, because there are many more paths, and more varied ones, that data can take from its source to its destination. Cabling is harder because the endpoints of the cables are chosen randomly.

A couple of years later, Google began toying with another solution: It started integrating optical circuit switching, or OCS, into its network designs. This approach uses tiny mirrors to reflect light from an input port to an output port, which lets Google reconfigure optical cabling in real time. But, again: This adds a certain amount of engineering complexity, as well as cost.

Courtesy of Amazon

So Random

Amazon, meanwhile, was searching for the “holy grail,” says Giacomo Bernardi, who is one of the lead authors on the new paper, along with Amazon Scholars Ratul Mahajan and Seshadhri Comandur. In an ideal world, a data network would be flat and efficient, resilient to hardware failures, random enough to maximize performance, and scalable enough to grow without becoming unwieldy. It would also rely on simpler, streamlined cabling rather than increasingly complex fiber-optic systems.

When he and his colleagues began trying to build such a network, Bernardi says he had already become obsessed with Penrose tiling, a kind of aperiodic tiling named after the British physicist Roger Penrose. (Other researchers have been so inspired by Penrose tilings that they’ve tried to translate the patterns into error-correcting code in quantum computers.) Bernardi wondered if Amazon could use a similar construction and create a flat “mesh” by following a repeating pattern. He and his team tried building a simulation of what that might look like.
