Inside the Storage/Compute Servers of IBM Spectrum Fusion HCI

Previously, I provided an overview of the IBM Spectrum Fusion HCI hardware, followed by posts covering the different server and switch components that go into that system.  Most of the servers in an IBM Spectrum Fusion HCI system are storage/compute servers, so you can think of these servers as the workhorses of the system.  In this post, I take a closer look at the CPU, memory, storage, and network elements inside these storage/compute servers.

CPU and Memory

A storage/compute server is built from a Lenovo SR645 server that is one rack unit (1U) high.  (A rack unit is an industry standard measurement equal to 1.75 inches; a standard data center rack used to hold servers and switches is 42U tall.)  Inside the server are two processor sockets, each populated with an AMD EPYC 7302 processor.  The EPYC 7302 is a 16-core, 32-thread processor with a base clock speed of 3.0 GHz.  With two of these processors, the server has a total of 32 cores and 64 threads available for running applications on Red Hat OpenShift.

The AMD 7302 processor is part of the AMD EPYC 2 processor family, sometimes referred to as the "Rome" family.  EPYC 2 has many notable features, but one of the most important is its support for PCIe Gen4.  At a high level, PCIe is the technology used to connect the processor, storage drives, network interface cards, and other peripheral devices within a server.  PCIe 4 doubles the per-lane bandwidth of PCIe 3, so it represents a major improvement in system performance.
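
To put that "twice as fast" claim in concrete terms, here is a rough back-of-the-envelope calculation of per-lane PCIe bandwidth.  The transfer rates and encoding scheme are from the published PCIe specifications; real-world throughput is somewhat lower due to protocol overhead.

# Rough comparison of PCIe Gen3 vs. Gen4 per-lane bandwidth.
# Both generations use 128b/130b encoding; these are line rates, not measured throughput.
GIGATRANSFERS_PER_SEC = {"PCIe 3.0": 8, "PCIe 4.0": 16}  # GT/s per lane
ENCODING_EFFICIENCY = 128 / 130

for gen, gts in GIGATRANSFERS_PER_SEC.items():
    per_lane_gb_per_sec = gts * ENCODING_EFFICIENCY / 8   # gigabytes per second per lane
    x16_gb_per_sec = per_lane_gb_per_sec * 16             # a full x16 link
    print(f"{gen}: ~{per_lane_gb_per_sec:.2f} GB/s per lane, ~{x16_gb_per_sec:.1f} GB/s for x16")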

Each AMD 7302 processor has eight memory channels, for a total of 16 memory channels.  The SR645 system board has slots for two DIMMs on each memory channel, meaning the server can have a total of 32 DIMMs installed.  For best performance, AMD recommends that all memory channels be populated so that each channel has equal capacity.  For that reason, the base memory configuration of a storage/compute server uses 16 DIMMs to populate each memory channel with one DIMM and leaves the other 16 memory slots on the system board empty.

Each of the 16 DIMMs has a capacity of 16GB, for a total of 256GB of base RAM.  The 16GB DIMM capacity was chosen to provide 8GB of RAM per processor core.  If more RAM is required, a storage/compute server can be upgraded to double its RAM by populating the remaining 16 empty memory slots with 16GB DIMMs.  The result is that every memory channel has two 16GB DIMMs and the total system RAM is 512GB (16GB per core).
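
For readers who like to see the arithmetic spelled out, here is a short Python sketch that reproduces the core, thread, and memory numbers described above.  All of the figures come from the configuration discussed in this post.

# CPU and memory arithmetic for a storage/compute server as described above.
SOCKETS = 2
CORES_PER_SOCKET = 16          # AMD EPYC 7302
THREADS_PER_CORE = 2
CHANNELS_PER_SOCKET = 8        # memory channels per EPYC processor
DIMM_CAPACITY_GB = 16

cores = SOCKETS * CORES_PER_SOCKET            # 32 cores
threads = cores * THREADS_PER_CORE            # 64 threads
channels = SOCKETS * CHANNELS_PER_SOCKET      # 16 memory channels
dimm_slots = channels * 2                     # 32 DIMM slots (2 per channel)

base_ram_gb = channels * 1 * DIMM_CAPACITY_GB      # one DIMM per channel -> 256GB
upgraded_ram_gb = channels * 2 * DIMM_CAPACITY_GB  # two DIMMs per channel -> 512GB

print(f"{cores} cores / {threads} threads, {dimm_slots} DIMM slots")
print(f"Base RAM: {base_ram_gb}GB ({base_ram_gb // cores}GB per core)")
print(f"Upgraded RAM: {upgraded_ram_gb}GB ({upgraded_ram_gb // cores}GB per core)")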

Storage

Every storage/compute server has a minimum of two, and a maximum of ten, 7.68TB NVMe PCIe 4 SSDs.  These NVMe drives are U.2 form factor drives that plug into the drive bays on the front of the storage/compute server.  The drives are hot-swappable, meaning that they can be added to and removed from the drive bays without having to first power down the server.  Because these drives use the NVMe interface with PCIe 4, they are very high-performance drives, delivering up to 7GB/s sequential read throughput and 1.5 million random read IOPS.  All of the NVMe drives in all the storage/compute servers of a Spectrum Fusion HCI system are managed by the system's storage software and combined to form one logical storage cluster.
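
For a quick sense of scale, here is the raw NVMe capacity range this allows per server.  Note this is raw capacity only; usable capacity depends on how the storage software protects and distributes data across the cluster.

# Raw NVMe capacity per storage/compute server at the minimum and maximum drive counts.
DRIVE_CAPACITY_TB = 7.68
MIN_DRIVES, MAX_DRIVES = 2, 10

for drives in (MIN_DRIVES, MAX_DRIVES):
    print(f"{drives} drives: {drives * DRIVE_CAPACITY_TB:.2f}TB raw per server")
# 2 drives: 15.36TB raw per server
# 10 drives: 76.80TB raw per server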

All of the storage/compute servers also have a pair of 960GB M.2 SATA SSDs onto which the Red Hat OpenShift CoreOS operating system is installed.  The two M.2 form factor drives are connected to a RAID controller and configured for RAID 1 (mirroring) to provide redundancy.  Should one of these M.2 operating system drives fail, the system can continue to function.
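
The trade-off of mirroring is easy to quantify: the pair of M.2 drives delivers the capacity of just one drive in exchange for surviving a single drive failure.  A minimal sketch:

# RAID 1 (mirroring) keeps a full copy of the data on each drive, so usable
# capacity equals the size of one drive and the array survives one drive failure.
M2_DRIVE_GB = 960
MIRROR_COPIES = 2

raw_gb = M2_DRIVE_GB * MIRROR_COPIES    # 1920GB of raw flash
usable_gb = M2_DRIVE_GB                 # 960GB usable for the operating system
failures_tolerated = MIRROR_COPIES - 1  # keeps running after one drive failure

print(f"Raw: {raw_gb}GB, usable: {usable_gb}GB, tolerates {failures_tolerated} drive failure")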

Network

Each storage/compute server has two NVIDIA/Mellanox dual-port network interface cards (NICs).  A dual-port ConnectX-6 100GbE NIC provides the high-performance connectivity needed for the system's storage network, which the distributed storage cluster within Spectrum Fusion HCI depends on.  The ConnectX-6 NIC supports PCIe 4 and exploits its speed to quickly move storage data to and from the NIC.

Connections to the system's application network are provided by a dual-port ConnectX-4 25GbE NIC.  Having physically separate NICs for the storage and application networks provides isolation and reduces the possibility of application traffic interfering with traffic on the storage network.  Each server is connected to the management network using both a built-in Ethernet port on the system board and an Intel I350 1GbE RJ45 4-port OCP Ethernet adapter.
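
To summarize the per-server connectivity by network role, here is a small sketch of the aggregate line-rate bandwidth each set of ports provides.  These are line rates only, not measured throughput, and the built-in management port on the system board is not counted here.

# Aggregate Ethernet line-rate bandwidth per storage/compute server, by network role.
nics = {
    "storage (ConnectX-6)":     {"ports": 2, "gbe_per_port": 100},
    "application (ConnectX-4)": {"ports": 2, "gbe_per_port": 25},
    "management (Intel I350)":  {"ports": 4, "gbe_per_port": 1},
}

for role, nic in nics.items():
    total = nic["ports"] * nic["gbe_per_port"]
    print(f"{role}: {nic['ports']} x {nic['gbe_per_port']}GbE = {total}Gb/s aggregate")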

As you can tell from the above description, the storage/compute server of Spectrum Fusion HCI has been designed to provide both high-performance computing power for applications running in Red Hat OpenShift and a high-performance storage cluster to support those applications.  In my next post, I will take a close look at the GPU server that can be added to Spectrum Fusion HCI to give a boost to AI and machine learning workloads.

Previous post: The Servers that Power IBM Spectrum Fusion HCI

Next post: Support for AI and ML Apps in Spectrum Fusion HCI


Photos by Paul Kauffmann

The opinions expressed in this post are those of the author.
