Optical circuit switching in disaggregated cloud and HPC infrastructures
In this blog summary of his white paper, Dr. Michael Enrico, Network Solutions
Architect at HUBER+SUHNER Polatis, explains how network disaggregation,
supported by an optically switched interconnect fabric, can make a critical
contribution to new network designs being developed to support AI and Machine
Learning.
Hyperscalers in cloud computing and other providers of high-performance
computing (HPC) services have to architect and scale their computing platforms to
meet client demand for AI applications while controlling CAPEX and reducing power
requirements. In particular, the processing power these applications require has
increased by orders of magnitude.
Rather than bundling the building blocks of these platforms tightly and inflexibly
into a fairly monolithic unit such as a standard server chassis, "disaggregating" the
requisite component parts or sub-systems avoids the inefficiency, the underutilisation
of key underlying resources and, importantly, the excessive power consumption that
are inevitable when more servers are simply "racked and stacked".
In a disaggregated architecture, these resources (CPU, memory, storage,
acceleration hardware in its various guises) are flexibly combined by interconnecting
them using integrated high-speed digital transceivers and a dedicated interconnect
fabric based on appropriate transport media and switching technologies. Each
resource type can then be combined and scaled independently of the others to meet
the demands of the expected workloads.
The principle of disaggregation is shown in the diagram above.
The required resources are bundled together in bespoke ratios to form flexibly
proportioned "bare metal" hardware hosts, "composed" on-the-fly from a common
pool of underlying fine-grained resources. The key building blocks in this case are
the lower-level resource elements themselves, such as CPUs, memory, storage and
various kinds of accelerators (GPUs, TPUs, FPGAs).
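As an illustration of the composition idea only (the pool, class and resource names below are hypothetical and do not come from the white paper), the following sketch models a shared pool of fine-grained resource elements and a "compose" step that carves out a bare-metal host in a bespoke ratio, returning the elements to the pool when the host is decomposed:

```python
# Illustrative sketch of composable infrastructure: hypothetical names,
# not an API or data model described in the white paper.
from dataclasses import dataclass, field

@dataclass
class ResourcePool:
    """Common pool of fine-grained resource elements, by type."""
    free: dict = field(default_factory=lambda: {
        "cpu": 64, "dram_gib": 4096, "gpu": 32, "fpga": 16, "nvme_tib": 200,
    })

    def compose_host(self, **request):
        """Carve a bare-metal host out of the pool in a bespoke ratio."""
        if any(self.free.get(k, 0) < v for k, v in request.items()):
            raise RuntimeError("insufficient free resources for this composition")
        for k, v in request.items():
            self.free[k] -= v
        return dict(request)            # descriptor of the "composed" host

    def decompose_host(self, host):
        """Return a host's resource elements to the shared pool."""
        for k, v in host.items():
            self.free[k] += v

pool = ResourcePool()
# A GPU-heavy training host and a memory-heavy host composed from one pool.
train_host = pool.compose_host(cpu=8, dram_gib=512, gpu=8)
infer_host = pool.compose_host(cpu=16, dram_gib=1024, gpu=2)
pool.decompose_host(train_host)         # resources freed for reuse elsewhere
```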
Several levels of disaggregation can be defined, related to the granularity at which
the resource blocks can be accessed and consumed.
In the most granular form of disaggregation, each resource block (e.g. a bank of
DRAM, a CPU, an accelerator) has onboard hardware to facilitate the necessary
high-speed, low-latency connection of its resources to an interconnect platform.
Less granular forms of resource disaggregation that are more compatible with
current hardware implementations may be seen as a way to facilitate a more gradual
transition towards fully disaggregated platforms.
These include:
- An approach in which the dynamically interconnected compute resources are
limited to accelerator hardware. By fitting single-mode optical transceivers, the
accelerators in one host can be flexibly and directly interconnected with those in
other hosts using a dedicated optical switching fabric that effectively acts as an
overlay to the packet switching fabric already used to provide most of the
interconnect between the hosts in the cluster.
- Going beyond the interconnection of accelerator cards alone to access more of
the resources already present in fleets of conventional servers, a dedicated PCIe
interconnection card, fitted with specialised SerDes processing hardware and
firmware and high-density, high-speed optical transceivers, acts as a high-performance
gateway between the PCIe-connected compute resources in that chassis and the
optical interconnect fabric.
The Interconnect Fabric
An optical interconnect fabric with transparent optical circuit switching provides
deterministic, circuit-switched, fixed-bandwidth data paths that are well suited to
interconnecting hardware resource elements that would otherwise be directly and
deterministically linked at a low level by dedicated traces on a server
motherboard or via a specific bus technology such as PCI Express.
Compared with an electrical fabric, it also promises significant reductions in the
power consumption of the fabric itself, much lower latencies on the data paths
through it, and a better ability to scale the fabric up and out physically. It is also
significantly more future-proof, thanks to the inherent transparency of the fabric to
the formats and line rates of the serialized data traffic between the optical
transceivers associated with the disaggregated resource elements.
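To make the circuit-switched model concrete, here is a minimal sketch of how a fabric controller might pin a deterministic point-to-point light path between two transceiver ports. The OpticalFabric class and the port naming are hypothetical stand-ins for whatever management interface a real optical circuit switch exposes, not a description of any particular product's API:

```python
# Hypothetical fabric-controller sketch: the class and port names are
# illustrative only, not a real switch management API.
class OpticalFabric:
    def __init__(self):
        self.cross_connects = {}        # ingress port -> egress port

    def connect(self, ingress, egress):
        """Set up a transparent, fixed-bandwidth light path between two ports.
        Once made, the path is deterministic (no packet queuing or contention)
        and agnostic to the line rate and format of the traffic it carries."""
        if ingress in self.cross_connects:
            raise RuntimeError(f"{ingress} is already cross-connected")
        self.cross_connects[ingress] = egress

    def disconnect(self, ingress):
        """Tear the path down so both ports can be re-patched elsewhere."""
        self.cross_connects.pop(ingress, None)

fabric = OpticalFabric()
# Patch a GPU transceiver in host A directly to one in host B, emulating the
# dedicated link that a motherboard trace or PCIe bus would otherwise provide.
fabric.connect("hostA/gpu0/tx", "hostB/gpu3/rx")
```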
The lowest-loss optical circuit switches, such as POLATIS DirectLight™ switches,
allow fabrics to be constructed with four or more stages of switching whilst
keeping within the optical loss budgets of the typical transceivers used with
disaggregated resource elements.
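The constraint here is simple additive loss budgeting. The dB figures below are illustrative placeholders rather than DirectLight or transceiver specifications, but they show why per-stage insertion loss determines how many switching stages a transparent fabric can tolerate:

```python
# Illustrative loss-budget check; the dB values are assumed placeholders,
# not quoted specifications for any particular switch or transceiver.
def max_switch_stages(transceiver_budget_db, per_stage_loss_db, fibre_margin_db):
    """How many switching stages fit inside the end-to-end optical loss budget?"""
    usable = transceiver_budget_db - fibre_margin_db
    return int(usable // per_stage_loss_db)

# Example: a 6 dB transceiver budget, 1 dB reserved for fibre and connectors,
# and ~1.2 dB of insertion loss per switch stage leave room for 4 stages.
print(max_switch_stages(transceiver_budget_db=6.0,
                        per_stage_loss_db=1.2,
                        fibre_margin_db=1.0))   # -> 4
```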
The Benefits of Disaggregated Computing
- Hardware computing platforms can be composed on-the-fly.
- Platforms can be scaled to whatever size and ratio of the available resource
types is appropriate to the kinds of workloads that will be run on the hardware.
- Platforms can be resized during the course of running a particular workload as
the resource consumption requirements evolve.
- Resources not required can be temporarily powered down, resulting in OPEX savings.
Operators can:
- Select best-of-breed vendors for the various component building blocks.
- Use those resources that support only the specific functions they need.
- Upgrade different types and/or blocks of resource elements as and when
required.
For a more detailed discussion of the topic and to discover the benefits of disaggregated computing for operators, please read Michael’s latest white paper here:
https://www.polatis.com/White_Paper_POLATIS_Disaggregation_EN.pdf