Reimagining the Ethernet Switch: Broadcom Ships New Tomahawk Ultra

16 July 2025

Ultra-low latency, 64-byte line-rate switching, a lossless fabric and in-network collectives define a new level of performance

Broadcom Inc. has started shipping its Tomahawk Ultra Ethernet switch. Engineered for high-performance computing (HPC) and AI workloads, Tomahawk Ultra delivers industry-leading ultra-low latency, massive throughput, and lossless networking.

“Tomahawk Ultra is a testament to innovation, involving a multi-year effort by hundreds of engineers who reimagined every aspect of the Ethernet switch,” said Ram Velaga, senior vice president and general manager of Broadcom’s Core Switching Group. “This highlights Broadcom’s commitment to invest in advancing Ethernet for high-performance networking and AI scale-up.”

Redefining Performance

Built from the ground up to meet the extreme demands of HPC environments and tightly coupled AI clusters, Tomahawk Ultra delivers:

Ultra-low latency: Achieves 250ns switch latency at full 51.2 Tbps throughput.
High performance: Delivers line-rate switching performance even at minimum packet sizes of 64 bytes, supporting up to 77 billion packets per second.
Adaptable, optimized Ethernet headers: Reduces header overhead from 46 bytes down to as low as 10 bytes while maintaining full Ethernet compliance, boosting network efficiency and enabling flexible, application-specific optimizations.
Lossless fabric: Implements Link Layer Retry (LLR) and Credit-Based Flow Control (CBFC) to eliminate packet loss and ensure reliability.
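As a rough sanity check on the packet-rate figure above, the sketch below divides 51.2 Tbps by the number of bits each minimum-size frame occupies on the wire. It assumes the standard Ethernet per-frame overhead of 8 bytes of preamble/SFD plus a 12-byte minimum inter-frame gap; the function name is hypothetical. Under that assumption the result is about 76 billion packets per second, in the same range as the quoted figure; the exact number depends on how per-frame overhead is accounted for.

```python
# Back-of-the-envelope line-rate packet throughput at minimum frame size.
# Assumption: standard Ethernet wire overhead per frame (preamble/SFD + inter-frame gap).
PREAMBLE_SFD_BYTES = 8
INTER_FRAME_GAP_BYTES = 12


def packets_per_second(throughput_bps: float, frame_bytes: int,
                       wire_overhead_bytes: int = PREAMBLE_SFD_BYTES + INTER_FRAME_GAP_BYTES) -> float:
    """Frames per second a link of throughput_bps can carry at a given frame size."""
    bits_on_wire = (frame_bytes + wire_overhead_bytes) * 8
    return throughput_bps / bits_on_wire


if __name__ == "__main__":
    pps = packets_per_second(51.2e12, 64)
    print(f"{pps / 1e9:.1f} billion packets per second at 64-byte frames")
```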

“AI and HPC workloads are converging into tightly coupled accelerator clusters that demand supercomputer-class latency — critical for inference, reliability, and in-network intelligence from the fabric itself,” said Kunjan Sobhani, lead semiconductor analyst, Bloomberg Intelligence. “Demonstrating that open-standards Ethernet can now deliver sub-microsecond switching, lossless transport, and on-chip collectives marks a pivotal step toward meeting those demands of an AI scale-up stack — projected to be double digit billions in a few years.”

Built for HPC and AI Scale-Up

Tomahawk Ultra is optimized for the tightly coupled, low-latency communication patterns found in both high-performance computing systems and AI clusters. With ultra-low latency switching and adaptable optimized Ethernet headers, it provides predictable, high-efficiency performance for large-scale simulations, scientific computing, and synchronized AI model training and inference.

When deployed with Scale-Up Ethernet (SUE), Tomahawk Ultra enables sub-400ns XPU-to-XPU communication latency, including the switch transit time, setting a new benchmark for tightly synchronized AI compute at scale.
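To see what the sub-400ns figure leaves for the rest of the path, the sketch below subtracts the quoted 250ns switch latency and an estimated cable delay from the 400ns target. The propagation delay of about 5 ns per meter and the cable lengths are illustrative assumptions, not figures from the announcement, and the function name is hypothetical.

```python
# Illustrative XPU-to-XPU latency budget (not a Broadcom breakdown).
SWITCH_LATENCY_NS = 250.0      # quoted Tomahawk Ultra switch latency
END_TO_END_TARGET_NS = 400.0   # quoted sub-400ns XPU-to-XPU figure
PROPAGATION_NS_PER_M = 5.0     # assumption: roughly 5 ns per meter of cable


def remaining_endpoint_budget_ns(cable_m_per_segment: float, segments: int = 2) -> float:
    """Nanoseconds left for both XPU Ethernet interfaces after switch transit and cable delay.

    Two cable segments are assumed: sending XPU -> switch and switch -> receiving XPU.
    """
    cable_ns = cable_m_per_segment * segments * PROPAGATION_NS_PER_M
    return END_TO_END_TARGET_NS - SWITCH_LATENCY_NS - cable_ns


if __name__ == "__main__":
    for length_m in (0.0, 1.0, 2.0):
        print(f"{length_m} m per segment: {remaining_endpoint_budget_ns(length_m):.0f} ns for the endpoints")
```

In other words, once the switch is accounted for, the endpoints' MAC/PHY logic and the cabling together must fit in roughly 150ns, which is why a lighter endpoint interface matters for scale-up designs.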

Forrest Norrod, Executive Vice President and General Manager, Data Center Solutions Group, AMD, comments, “Low latency is essential to unleashing the full potential of AI — from reducing training times to powering real-time inference. By combining Broadcom’s new Tomahawk Ultra switch with AMD Instinct™ GPUs and EPYC™ processors, we’re enabling high-performance, standards-based Ethernet solutions for AI infrastructure. Together, we’re advancing an open ecosystem that brings our vision of AI everywhere, for everyone, closer to reality.”

By reducing Ethernet header overhead from 46 bytes to just 10 bytes, while maintaining full Ethernet compliance, Tomahawk Ultra dramatically improves network efficiency. This optimized header is adaptable per application, offering both flexibility and performance gains across diverse HPC and AI workloads.
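The efficiency gain is largest for small payloads. The sketch below computes the fraction of each packet that is payload with a 46-byte versus a 10-byte header; the payload sizes are arbitrary examples, preamble and inter-frame gap are ignored, and the function name is hypothetical.

```python
# Fraction of each packet that carries payload, for two header sizes from the text.
STANDARD_HEADER_BYTES = 46
OPTIMIZED_HEADER_BYTES = 10


def payload_efficiency(payload_bytes: int, header_bytes: int) -> float:
    """Payload bytes divided by total packet bytes (header + payload)."""
    return payload_bytes / (payload_bytes + header_bytes)


if __name__ == "__main__":
    for payload in (64, 256, 1024):  # example payload sizes (assumption)
        std = payload_efficiency(payload, STANDARD_HEADER_BYTES)
        opt = payload_efficiency(payload, OPTIMIZED_HEADER_BYTES)
        print(f"{payload:>5} B payload: {std:.1%} with 46 B headers, {opt:.1%} with 10 B headers")
```

Small transfers such as the synchronization and collective traffic typical of tightly coupled AI clusters benefit most, because the header is a larger share of each packet.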

Tomahawk Ultra incorporates lossless fabric technology that eliminates packet drops during high-volume data transfer. Incorporating LLR, the switch detects link errors using Forward Error Correction and automatically retransmits packets, avoiding drops at the wire level. Simultaneously, CBFC prevents buffer overflows that traditionally caused packet loss. Together, these mechanisms create a truly lossless Ethernet fabric, delivering the level of reliability demanded by today’s most data-intensive workloads.
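The two mechanisms can be illustrated with textbook models. The sketch below shows credit-based flow control (the sender starts with one credit per receiver buffer slot and transmits only while it holds credits, so the buffer can never overflow) and a go-back-N style replay buffer for link-level retry. These are generic versions under our own assumptions, not the LLR or CBFC wire formats used by Tomahawk Ultra or the UEC specification; all class and function names are hypothetical.

```python
from collections import deque


class CreditReceiver:
    """Receiver with a fixed number of buffer slots (one slot per credit)."""

    def __init__(self, slots: int):
        self.slots = slots
        self.buffer = deque()

    def accept(self, frame) -> None:
        # Under CBFC this never fires: the sender only transmits while holding a credit.
        if len(self.buffer) >= self.slots:
            raise OverflowError("frame arrived without a free buffer slot")
        self.buffer.append(frame)

    def drain(self, n: int) -> list:
        """Forward up to n frames downstream; each freed slot becomes a returned credit."""
        out = []
        while self.buffer and len(out) < n:
            out.append(self.buffer.popleft())
        return out


def run_cbfc(frames: list, slots: int, drain_per_tick: int) -> list:
    """Move frames across a credit-controlled link; return them in delivery order."""
    if drain_per_tick <= 0:
        raise ValueError("receiver must drain at least one frame per tick")
    credits = slots  # one credit per receiver buffer slot
    queue, rx, delivered = deque(frames), CreditReceiver(slots), []
    while queue or rx.buffer:
        while queue and credits > 0:  # transmit only while credits remain
            rx.accept(queue.popleft())
            credits -= 1
        freed = rx.drain(drain_per_tick)
        delivered.extend(freed)
        credits += len(freed)  # receiver returns one credit per freed slot
    return delivered


class ReplayBuffer:
    """Go-back-N style link-level retry: hold frames until ACKed, replay on NAK."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = {}  # sequence number -> frame

    def transmit(self, frame):
        seq = self.next_seq
        self.unacked[seq] = frame
        self.next_seq += 1
        return seq, frame

    def ack(self, seq: int) -> None:
        # A cumulative ACK releases this frame and every earlier one.
        for s in [s for s in self.unacked if s <= seq]:
            del self.unacked[s]

    def nak(self, seq: int) -> list:
        # The receiver detected an uncorrectable error at seq:
        # replay that frame and everything sent after it.
        return [(s, self.unacked[s]) for s in sorted(self.unacked) if s >= seq]
```

For example, `run_cbfc(list(range(100)), slots=8, drain_per_tick=3)` delivers all 100 frames in order without ever raising `OverflowError`, because credits plus occupied slots always equal the buffer size.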

Tomahawk Ultra also accelerates performance through In-Network Collectives, addressing one of the most persistent bottlenecks in AI and machine learning workloads. Rather than burdening XPUs with collective operations like AllReduce, Broadcast, or AllGather, Tomahawk Ultra executes these directly within the switch chip. This can reduce job completion time and improve utilization of expensive compute resources. Importantly, this capability is endpoint-agnostic, enabling immediate adoption across a wide range of system architectures and vendor ecosystems.
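Conceptually, an in-switch AllReduce has each endpoint send its vector once; the switch sums them element-wise and sends the single result back to every endpoint. The sketch below models that idea and, for comparison, the per-endpoint send volume of a classic bandwidth-optimal ring AllReduce, 2(N-1)/N times the vector size. It does not describe how Tomahawk Ultra implements collectives, and the function names are hypothetical.

```python
def switch_allreduce(contributions: list[list[float]]) -> list[list[float]]:
    """Model of an in-switch AllReduce: element-wise sum, then broadcast to every endpoint."""
    if not contributions:
        return []
    length = len(contributions[0])
    if any(len(v) != length for v in contributions):
        raise ValueError("all endpoints must contribute vectors of the same length")
    reduced = [sum(column) for column in zip(*contributions)]
    # Broadcast: every endpoint receives its own copy of the same reduced vector.
    return [list(reduced) for _ in contributions]


def ring_allreduce_bytes_sent_per_endpoint(n_endpoints: int, vector_bytes: int) -> float:
    """Bytes each endpoint sends in a ring AllReduce (reduce-scatter + all-gather)."""
    return 2 * (n_endpoints - 1) / n_endpoints * vector_bytes


if __name__ == "__main__":
    print(switch_allreduce([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
    # With in-switch reduction each endpoint sends vector_bytes once;
    # the ring algorithm sends nearly twice that for large N.
    print(ring_allreduce_bytes_sent_per_endpoint(8, 1000))
```

Besides halving the data each endpoint injects, the in-switch approach replaces the ring's 2(N-1) sequential steps with a single trip through the switch, which is where the job-completion-time savings come from.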

Praveen Jain, Senior Vice President and General Manager, AI Clusters and Cloud Ready Data Center, HPE Networking, comments, “HPE is committed to delivering open, high-performance and easy-to-manage Ethernet-based solutions for the modern data center. We commend Broadcom on its new offering, and its ultra-low latency, high throughput and support for in-network collectives align perfectly with what today’s workloads demand. It reflects our shared vision for building the most advanced and open data center infrastructure solutions with operational simplicity at its core.”

Designed with innovations in topology-aware routing to support advanced HPC topologies including Dragonfly, Mesh and Torus, Tomahawk Ultra is also compliant with the Ultra Ethernet Consortium (UEC) specification and embraces the openness and rich ecosystem of Ethernet networking.
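As an illustration of what topology-aware routing means for one of these topologies, the sketch below implements classic dimension-order (X then Y) minimal routing on a 2D torus, taking the shorter way around each ring via the wraparound links. Broadcom does not describe its routing algorithm here; this is a swapped-in textbook technique, the function names are hypothetical, and real torus deployments also need deadlock avoidance (for example virtual channels), which this omits.

```python
Coord = tuple[int, int]


def torus_next_hop(current: Coord, dest: Coord, dims: Coord) -> Coord:
    """Dimension-order minimal routing on a 2D torus: fix X first, then Y."""
    for axis in range(2):
        if current[axis] != dest[axis]:
            size = dims[axis]
            forward = (dest[axis] - current[axis]) % size  # hops in the + direction
            step = 1 if forward <= size - forward else -1  # shorter way around the ring
            nxt = list(current)
            nxt[axis] = (current[axis] + step) % size
            return (nxt[0], nxt[1])
    return current  # already at the destination


def torus_route(src: Coord, dest: Coord, dims: Coord) -> list[Coord]:
    """Full hop-by-hop path from src to dest."""
    path = [src]
    while path[-1] != dest:
        path.append(torus_next_hop(path[-1], dest, dims))
    return path


if __name__ == "__main__":
    # On a 4x4 torus, going from x=0 to x=3 uses one wraparound hop instead of three.
    print(torus_route((0, 0), (3, 1), (4, 4)))
```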

Introducing SUE-Lite

As part of Broadcom’s Ethernet-forward strategy for AI scale-up, the company has introduced SUE-Lite, an optimized version of the SUE specification tailored for power- and area-sensitive accelerator applications. SUE-Lite retains the key low-latency and lossless characteristics of full SUE, while further reducing the silicon footprint and power consumption of Ethernet interfaces on AI XPUs and CPUs.

This lightweight variant enables easier integration of standards-compliant Ethernet fabrics in AI platforms, promoting broader adoption of Ethernet as the interconnect of choice in scale-up architectures.

Platform for AI Scale-Up and HPC Scale-Out

Together with the 102.4 Tbps Tomahawk 6, Tomahawk Ultra forms the foundation of a unified Ethernet architecture: enabling scale-up Ethernet for AI, and scale-out Ethernet for HPC and distributed workloads.

Now Shipping

Tomahawk Ultra is 100% pin-compatible with Tomahawk 5, ensuring a very fast time-to-market. It is shipping now for deployment in rack-scale AI training clusters and supercomputing environments.