While presenting on S2D (Storage Spaces Direct) at MVPDays, I was asked whether the benefits of RDMA over Converged Ethernet justified replacing the existing 10GbE infrastructure in a cluster.

To answer this, we first need to understand how a shared-nothing Storage Pool works.

Consider a traditional converged datacenter, where storage is separate from compute and network. The SAN is responsible for disk and controller availability, presenting shared volumes identically to all hosts in the cluster over Fibre Channel or iSCSI connections. Simultaneous disk reads and writes occur within the SAN's backplane itself and do not traverse the network between cluster nodes.

In a Storage Spaces Direct cluster, the disks are installed directly in the compute nodes, commonly known as locally attached storage. We then add all the disks from every node into a single storage pool. From this pool, we provision Cluster Shared Volumes, specifying the level of redundancy required for each volume. The redundancy level determines how many drives in each host share the load, and ultimately how much raw storage that CSV consumes.

For example, if I have a 2-node cluster where each node is populated with 4 x 2TB SSDs, my storage pool will have a total raw capacity of 16TB. In this pool, I create a 1TB volume with a physical disk redundancy of 1. This creates a two-way mirror, allowing me to lose one disk in the pool. To accomplish this, S2D places copies of the volume across multiple physical disks, and hosts, in the pool, consuming 2TB of raw space. If we are using a 3-node cluster, 3-way mirrors (disk redundancy 2) are recommended. In a 3-way mirror, as you may have put together by now, the same 1TB CSV will occupy 3TB of raw storage across the three nodes.
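To make that arithmetic concrete, here is a minimal Python sketch. It is purely illustrative (S2D decides the actual data placement itself); it only reproduces the capacity math from the example above: the pool is the sum of every drive in every node, and a mirrored volume consumes its size multiplied by the number of copies (redundancy level plus one).

```python
def pool_capacity_tb(nodes, drives_per_node, drive_size_tb):
    """Total raw capacity of the storage pool: every drive in every node."""
    return nodes * drives_per_node * drive_size_tb

def mirror_footprint_tb(volume_size_tb, disk_redundancy):
    """Raw space a mirrored volume consumes.

    A physical disk redundancy of 1 means a two-way mirror (2 copies),
    a redundancy of 2 means a three-way mirror (3 copies), and so on.
    """
    copies = disk_redundancy + 1
    return volume_size_tb * copies

# The 2-node example above: 2 nodes x 4 x 2TB SSDs = 16TB raw.
print(pool_capacity_tb(nodes=2, drives_per_node=4, drive_size_tb=2))   # 16

# A 1TB CSV with disk redundancy 1 (two-way mirror) consumes 2TB raw.
print(mirror_footprint_tb(volume_size_tb=1, disk_redundancy=1))        # 2

# The same 1TB CSV as a three-way mirror (redundancy 2) consumes 3TB raw.
print(mirror_footprint_tb(volume_size_tb=1, disk_redundancy=2))        # 3
```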

Now this is where the networking component comes into play. Because a volume is not local to a single node, every write has to be replicated across nodes, generating significant storage traffic between them. We segment this traffic on the nodes by using dedicated virtual adapters for storage and cluster data, and implement QoS to ensure that storage traffic always gets the highest priority. Running multiple virtual adapters on a teamed vSwitch like this is also known as Converged Ethernet. As you can imagine, all of that replication puts a heavy load on the network, which is why 10GbE is recommended for hyper-converged clusters.
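To illustrate why the QoS piece matters, here is a toy Python model of minimum-bandwidth reservations on a 10GbE link. This is only a sketch of the concept; the class names, weights, and demands are invented for illustration, and Windows actually enforces this with its own QoS/DCB policies rather than anything like the code below. The idea is that each traffic class is guaranteed its reserved share under contention, and whatever a class does not use stays available to the others.

```python
def allocate_bandwidth(link_gbps, classes):
    """Toy minimum-bandwidth allocator.

    classes: {name: (weight_percent, demand_gbps)}
    Each class is first guaranteed min(demand, its weighted share of the link);
    leftover capacity is then handed to classes that still want more,
    in proportion to their weights.
    """
    alloc = {}
    # Phase 1: honour each class's guaranteed share.
    for name, (weight, demand) in classes.items():
        alloc[name] = min(demand, link_gbps * weight / 100)
    # Phase 2: give leftover link capacity to classes with unmet demand.
    leftover = link_gbps - sum(alloc.values())
    while leftover > 1e-9:
        hungry = {n: classes[n][0] for n in classes
                  if classes[n][1] - alloc[n] > 1e-9}
        if not hungry:
            break
        total_weight = sum(hungry.values())
        distributed = 0.0
        for name, weight in hungry.items():
            extra = min(leftover * weight / total_weight,
                        classes[name][1] - alloc[name])
            alloc[name] += extra
            distributed += extra
        leftover -= distributed
    return alloc

# Storage gets the largest reservation, so even when VM traffic wants the
# whole 10 Gbps link, the mirror replication still gets its share.
demo = {
    "storage": (50, 8.0),   # wants 8 Gbps
    "cluster": (10, 0.5),   # heartbeat and cluster chatter, light
    "vm":      (40, 10.0),  # tenant traffic, wants everything
}
print(allocate_bandwidth(10.0, demo))
```

Running the demo shows the storage class holding on to a little over 5 Gbps even while VM traffic is asking for the entire link, which is exactly the behaviour you want while the mirror is busy replicating writes.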

RDMA, or Remote Direct Memory Access, enables the network adapters to transfer data directly to and from main memory on separate hosts, eliminating the need for the operating system to process that data first. Because the transfer bypasses the CPU and avoids intermediate buffer copies, it provides a high-throughput, low-latency connection between hosts. This is also known as zero-copy networking.

I heard a great analogy last week from MVP Dave Kawula: think about Star Trek. The shuttlecraft is your 10GbE network; it's faster than pre-warp spacecraft, but it still has to go through loading and docking procedures before moving the subject. The transporter, on the other hand, is your RDMA network: it analyzes the subject's DNA and sends it as code to be recompiled on the other end, making the transfer much faster.

Thankfully, RDMA-capable Mellanox cards are not expensive, running around $300 each, and a 10GbE RDMA switch can be purchased for less than $5,000. After spending the cost of a new SUV on your 10GbE infrastructure (which still wasn't wasted effort), the small added cost of RDMA is worthwhile, not just for the S2D performance benefit, but also because it frees up your 10GbE bandwidth for other tasks.

Hope this helps!
