The Servers that Power IBM Spectrum Fusion HCI

In my previous post, I gave an introduction to the hardware of the IBM Spectrum Fusion HCI product and described how Spectrum Fusion HCI was designed to be simple to deploy and operate, resilient to failure, and capable of a high level of performance.  You can find that introductory post here.  In this post, I take a closer look at the server components of Spectrum Fusion HCI.

There are four different server types that can be included in Spectrum Fusion HCI, and I'll discuss each of them in turn:

  1. Storage/compute servers, the basic building block
  2. Compute-only servers, for boosting compute power
  3. GPU servers, for AI applications
  4. AFM servers, for connecting to a larger storage network

The Basic Building Block

The "HCI" in the product name stands for Hyper-Converged Infrastructure, a style of system design in which pieces that were traditionally separate are brought together to reduce the number of component types.  There is no separate storage component in Spectrum Fusion HCI.  Instead, storage and processing are combined in a single kind of server, called a storage/compute server.

The storage/compute servers are the basic building block of Spectrum Fusion HCI and they are the only servers that must be included -- all other server types are optional.  Each storage/compute server has a minimum of two (and a maximum of ten) 7.68TB NVMe storage drives.  Every system must have at least six storage/compute servers and these are configured to form the storage cluster of the overall system.  There can be up to 20 storage/compute servers in a single system.  If each of these 20 servers has the maximum number of NVMe drives, the result is a system with a storage cluster with approximately 1 petabyte of usable storage for the applications running in its OpenShift cluster!
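
The capacity figures above come down to simple arithmetic.  As a rough sketch, the raw NVMe capacity of a storage cluster can be estimated as shown below.  The function and constant names are hypothetical, chosen only for illustration, and the sketch assumes that every storage/compute server carries the same number of drives, which may not be a requirement of the product.

```python
# Limits and drive size as quoted in this post; all names are illustrative.
DRIVE_TB = 7.68
MIN_DRIVES, MAX_DRIVES = 2, 10
MIN_SERVERS, MAX_SERVERS = 6, 20


def raw_capacity_tb(servers: int, drives_per_server: int) -> float:
    """Return raw NVMe capacity in TB, before any protection overhead.

    Assumes all servers have the same drive count (a simplification).
    """
    if not MIN_SERVERS <= servers <= MAX_SERVERS:
        raise ValueError(f"servers must be between {MIN_SERVERS} and {MAX_SERVERS}")
    if not MIN_DRIVES <= drives_per_server <= MAX_DRIVES:
        raise ValueError(f"drives per server must be between {MIN_DRIVES} and {MAX_DRIVES}")
    return servers * drives_per_server * DRIVE_TB


if __name__ == "__main__":
    print(f"Smallest cluster: {raw_capacity_tb(MIN_SERVERS, MIN_DRIVES):.2f} TB raw")
    print(f"Largest cluster:  {raw_capacity_tb(MAX_SERVERS, MAX_DRIVES):.2f} TB raw")
```

For the largest configuration this works out to 1,536 TB of raw capacity.  The roughly 1 petabyte of usable storage quoted above is lower, presumably because of data protection and file system overhead, which this sketch does not attempt to model.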

The storage/compute servers are two-socket servers with a 16-core AMD processor in each socket to give a total of 32 CPU cores.  The base RAM amount for these servers is 256GB (8GB/core), but they can also be ordered with 512GB of RAM (16GB/core).  For those who have especially memory-hungry applications, it is possible to upgrade these servers to 1024GB of RAM (32GB/core!).
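
The per-core memory figures in parentheses follow directly from dividing each RAM option by the 32 cores in the server.  A minimal sketch of that calculation, using hypothetical names, looks like this:

```python
# Values quoted in this post; the names are illustrative only.
SOCKETS = 2
CORES_PER_SOCKET = 16
RAM_OPTIONS_GB = (256, 512, 1024)


def ram_per_core(ram_gb: int,
                 sockets: int = SOCKETS,
                 cores_per_socket: int = CORES_PER_SOCKET) -> float:
    """Return GB of RAM available per CPU core."""
    return ram_gb / (sockets * cores_per_socket)


for ram_gb in RAM_OPTIONS_GB:
    print(f"{ram_gb} GB total -> {ram_per_core(ram_gb):g} GB per core")
```

The same function can be reused for other server types by passing a different core count per socket.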

The Computing Power Boost

In cases where more computing power is needed in the Spectrum Fusion HCI system but there is already sufficient storage, compute-only servers can be added.  A compute-only server has the same configuration as the storage/compute server except that it has none of the 7.68TB NVMe drives to contribute to the storage cluster of the system.  A compute-only server can also have its RAM upgraded from the base amount of 256GB to either 512GB or 1024GB just like the storage/compute server.

The AI Specialist

For some applications, especially AI applications, the specialized power of GPUs is required.  It is for these applications that Spectrum Fusion HCI provides the option of adding a pair of servers that each have three NVIDIA A100 40GB GPU PCIe adapter cards -- a total of six of these very powerful GPUs in the server pair.  These GPUs can be used by the applications running in the OpenShift cluster to accelerate AI model training and inferencing operations.

The GPU servers are also two-socket servers, but each socket has a 24-core AMD processor to give a total of 48 CPU cores in each server.  There is 512GB of RAM in each of the servers as well as a pair of 3.2TB SSDs that are available to applications running on the GPU servers for local storage.  These 3.2TB SSDs are not configured to be part of the system's storage cluster.

Creating Storage Connections

The software that is used to create the storage cluster from the NVMe drives of the storage/compute servers has a feature called Active File Management (AFM).  The AFM feature makes it possible to share data between storage clusters, even if the networks that connect them are unreliable or aren't particularly fast. This allows a Spectrum Fusion HCI system to connect and become part of a larger storage network, giving the applications running in the OpenShift cluster access to data beyond what is available within Spectrum Fusion HCI.

Like the storage/compute and compute-only servers, the AFM servers are two-socket servers, but they have two 16-core Intel processors. They have a bit less RAM, with a total of 192GB, but this is sufficient for the AFM service. Unlike all the other server types already described, the AFM servers have a reserved location within the system and are not counted towards the maximum of 20 servers in a single system.
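
The sizing rules described in this post can be collected into a small checker.  This is only a sketch based on my reading of the rules above: the class and function names are hypothetical, it assumes that GPU servers are always added as a pair, and it assumes that the limit of 20 covers the storage/compute, compute-only, and GPU servers combined.  It is not an official configuration tool.

```python
from dataclasses import dataclass

# Limits as described in this post; names are illustrative only.
MIN_STORAGE_COMPUTE = 6
MAX_COUNTED_SERVERS = 20


@dataclass(frozen=True)
class SystemConfig:
    storage_compute: int
    compute_only: int = 0
    gpu: int = 0
    afm: int = 0  # AFM servers sit in reserved slots and are not counted


def validate(cfg: SystemConfig) -> list[str]:
    """Return a list of rule violations; an empty list means the config passes."""
    problems = []
    if min(cfg.storage_compute, cfg.compute_only, cfg.gpu, cfg.afm) < 0:
        problems.append("server counts cannot be negative")
    if cfg.storage_compute < MIN_STORAGE_COMPUTE:
        problems.append(f"at least {MIN_STORAGE_COMPUTE} storage/compute servers are required")
    if cfg.gpu not in (0, 2):
        # Assumption: GPU servers are only offered as a pair.
        problems.append("GPU servers are added as a pair (0 or 2)")
    counted = cfg.storage_compute + cfg.compute_only + cfg.gpu
    if counted > MAX_COUNTED_SERVERS:
        problems.append(f"{counted} servers exceeds the maximum of {MAX_COUNTED_SERVERS}")
    return problems


if __name__ == "__main__":
    example = SystemConfig(storage_compute=12, compute_only=4, gpu=2, afm=2)
    print(validate(example) or "configuration looks valid")
```

Returning a list of problems rather than raising on the first one makes it easy to report everything that is wrong with a proposed configuration at once.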

This concludes the survey of the different server types that can be put together to create a Spectrum Fusion HCI system.  In my next post, Connecting It Together in IBM Spectrum Fusion HCI, I describe the switches that connect all these servers together and that connect the system to your data center network.


Server stack photo by Paul Kauffmann
Circuit board photo by Alexandre Debiève on Unsplash

The opinions expressed in this post are those of the author.
