Apple’s Thunderbolt 5 and RDMA Boost Mac Clusters for AI Research in macOS Tahoe 26.2

Revolutionizing AI Research: Thunderbolt 5 and RDMA Propel Mac Clusters to New Heights

In the rapidly evolving field of artificial intelligence (AI), the demand for more efficient and powerful computing resources is ever-increasing. Apple’s latest advancements in Mac cluster computing, particularly through the integration of Thunderbolt 5 and Remote Direct Memory Access (RDMA), are set to significantly enhance the capabilities of AI researchers working with large-scale models.

Unveiling Thunderbolt 5 and RDMA in macOS Tahoe 26.2

In November 2025, Apple announced forthcoming features in macOS Tahoe 26.2 that promise to transform machine learning workflows. A key highlight was the enhancement of MLX, Apple’s machine learning framework, to support GPU-based neural accelerators. Equally noteworthy was the introduction of Thunderbolt 5 clustering support, a development poised to revolutionize inter-device connectivity and memory sharing among Macs.

Real-World Application: Mac Studio Cluster Testing

On December 18, 2025, tech enthusiast Jeff Geerling shared his hands-on experience with a cluster of four Mac Studio units, generously provided by Apple. This setup, valued at nearly $40,000, served as a testbed to demonstrate the practical benefits of Thunderbolt 5 in cluster computing environments.

Each Mac Studio was equipped with the M3 Ultra chip, featuring a 32-core CPU, 80-core GPU, and a 32-core Neural Engine. Two units boasted 512GB of unified memory and 8TB of storage, while the other two had 256GB of memory and 4TB of storage. Housed in a compact 10-inch rack, the cluster operated quietly and efficiently, with each unit consuming under 250 watts.

The Power of Massive Memory Pooling

macOS Tahoe 26.2 introduces a new Thunderbolt 5 driver with RDMA support, a critical feature for AI research. Ethernet-based clustering is typically limited to 10Gb/s or less, depending on the Mac’s specifications. In contrast, Thunderbolt 5 offers a substantial bandwidth increase, reaching up to 80Gb/s.
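As a rough illustration of what that bandwidth gap means in practice, the sketch below compares the time to move 100GB of model weights over each link. This is an idealized calculation that ignores protocol overhead and real-world throughput limits; the 100GB figure is chosen arbitrarily for illustration.

```python
# Back-of-envelope: time to move 100 GB of model weights over a
# 10 Gb/s Ethernet link vs. an 80 Gb/s Thunderbolt 5 link.
# Idealized: ignores protocol overhead and real-world throughput limits.

def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Seconds to move size_gb gigabytes over a link_gbps link."""
    return size_gb * 8 / link_gbps  # bytes -> bits, then divide by line rate

ethernet = transfer_seconds(100, 10)       # 80.0 s
thunderbolt5 = transfer_seconds(100, 80)   # 10.0 s
print(f"10GbE:         {ethernet:.0f} s")
print(f"Thunderbolt 5: {thunderbolt5:.0f} s")
print(f"Speedup:       {ethernet / thunderbolt5:.0f}x")
```

At the line rate, the eightfold bandwidth increase translates directly into an eightfold reduction in transfer time, which matters when model shards and activations move between nodes on every inference step.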

RDMA allows one node in the cluster to access another node’s memory directly, without involving the remote CPU, effectively pooling the memory resources across all connected Macs. This direct access minimizes processing overhead and expands the available memory pool, enabling the handling of large language models (LLMs) that would otherwise exceed the capacity of a single Mac. In Geerling’s setup, the combined memory totaled an impressive 1.5 terabytes.
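The 1.5-terabyte figure follows directly from the hardware configuration described earlier, as a quick check confirms:

```python
# Pooled unified memory across the four-node cluster described above:
# two Mac Studios with 512 GB each and two with 256 GB each.
node_memory_gb = [512, 512, 256, 256]

total_gb = sum(node_memory_gb)
total_tb = total_gb / 1024  # binary terabytes
print(f"{total_gb} GB = {total_tb} TB")  # 1536 GB = 1.5 TB
```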

Benchmarking Performance Gains

Geerling conducted a series of benchmarks to evaluate the performance improvements facilitated by Thunderbolt 5 and RDMA. After enabling RDMA in recovery mode, he utilized open-source tools such as Exo and Llama.cpp to run models across the cluster.

Initial tests with the Qwen3 235B model revealed promising results. On a single node, Llama.cpp achieved 20.4 tokens per second, slightly outperforming Exo’s 19.5 tokens per second. However, as additional nodes were incorporated, Exo’s performance scaled significantly while Llama.cpp’s degraded:

– Two Nodes: Exo reached 26.2 tokens per second, while Llama.cpp dropped to 17.2 tokens per second.

– Four Nodes: Exo achieved 31.9 tokens per second, whereas Llama.cpp declined further to 15.2 tokens per second.

Similar trends were observed with the DeepSeek V3.1 671B model:

– Single Node: Exo processed 21.1 tokens per second.

– Two Nodes: Performance increased to 27.8 tokens per second.

– Four Nodes: Exo reached 32.5 tokens per second.

These results underscore the substantial performance gains achievable through RDMA-enabled clustering, particularly as the number of nodes increases.
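The scaling is notably sublinear, which can be quantified from the reported figures. The sketch below computes Exo’s parallel efficiency, meaning the fraction of ideal linear speedup actually achieved, for both models:

```python
# Parallel scaling efficiency of Exo, computed from the tokens/s
# figures reported above for 1, 2, and 4 nodes.

def efficiency(single: float, multi: float, nodes: int) -> float:
    """Fraction of ideal linear speedup achieved at a given node count."""
    return (multi / single) / nodes

qwen3 = {1: 19.5, 2: 26.2, 4: 31.9}      # Qwen3 235B
deepseek = {1: 21.1, 2: 27.8, 4: 32.5}   # DeepSeek V3.1 671B

for name, runs in [("Qwen3 235B", qwen3), ("DeepSeek V3.1 671B", deepseek)]:
    for nodes in (2, 4):
        eff = efficiency(runs[1], runs[nodes], nodes)
        print(f"{name}, {nodes} nodes: {eff:.0%} of linear scaling")
```

Roughly two-thirds of linear scaling at two nodes falls to about 40 percent at four, reflecting the growing share of time spent on inter-node communication. The gains nonetheless matter most for models that cannot fit on a single node at all, where the alternative is not slower inference but none.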

Addressing Large-Scale Model Challenges

Geerling also tested a one-trillion-parameter model, Kimi K2 Thinking 1T A32B, which activates 32 billion parameters at a time, a scale unmanageable by a single Mac Studio with 512GB of memory. Over two nodes, Llama.cpp achieved 18.5 tokens per second, while Exo, leveraging RDMA, reached 21.6 tokens per second. With four nodes, Exo’s performance improved further to 28.3 tokens per second.
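To see why a trillion-parameter model exceeds a single 512GB node, a rough weight-storage estimate helps. This sketch is illustrative only: it counts raw weights at a given quantization width and ignores the KV cache, activations, and OS overhead, all of which consume additional memory.

```python
# Rough weight-storage estimate for a one-trillion-parameter model
# at common quantization widths. Weights only: KV cache, activations,
# and OS overhead are not counted and would add substantially more.
PARAMS = 1_000_000_000_000  # 1 trillion parameters

weight_gb = {bits: PARAMS * bits / 8 / 1e9 for bits in (16, 8, 4)}
for bits, gb in weight_gb.items():
    note = "raw weights fit" if gb <= 512 else "exceeds"
    print(f"{bits}-bit weights: ~{gb:.0f} GB ({note} a 512 GB node)")
```

Even at 4-bit quantization, roughly 500GB of raw weights leaves essentially no headroom on a 512GB machine once caches and the operating system are accounted for, which is why pooling memory across nodes is the only way to run such a model on this hardware.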

These findings highlight the potential of Thunderbolt 5 and RDMA to facilitate the processing of exceptionally large models, a critical capability for advancing AI research.

Considerations and Future Prospects

While the benefits are clear, certain considerations remain. The cost of assembling such a cluster may be prohibitive for individual users or small teams, though it represents a reasonable investment for organizations dedicated to AI development. Additionally, some stability issues were noted during high-performance benchmarks, indicating areas for further refinement.

Looking ahead, the anticipated release of M5 Ultra Mac Studios could further enhance machine learning research. These future models are expected to support GPU neural accelerators, providing an additional performance boost. Moreover, extending Thunderbolt 5 connectivity to include SMB Direct could benefit professionals in latency-sensitive and high-bandwidth applications, such as video editing.

Conclusion

Apple’s integration of Thunderbolt 5 and RDMA in macOS Tahoe 26.2 marks a significant advancement in Mac cluster computing. By enabling efficient memory pooling and high-speed inter-device communication, these technologies empower AI researchers to tackle larger and more complex models than ever before. As the landscape of machine learning continues to evolve, such innovations will be instrumental in driving progress and unlocking new possibilities.