Amazon Web Services announced its next generation of GPU-powered Amazon EC2 P4d instances. Compared to P3 instances, the new instances deliver 3x faster performance, up to 60% lower cost, and 2.5x more GPU memory for machine learning training and HPC workloads. Each instance features 8 NVIDIA A100 Tensor Core GPUs and 400 Gbps of network bandwidth. Using AWS’s Elastic Fabric Adapter and NVIDIA GPUDirect RDMA, users will be able to run P4d instances with EC2 UltraClusters capability.
Scale to over 4,000 A100 GPUs
EC2 UltraClusters also allows scaling P4d instances to over 4,000 A100 GPUs by making use of AWS-designed non-blocking petabit-scale networking infrastructure integrated with Amazon FSx for Lustre high-performance storage. It also offers on-demand access to supercomputing-class performance to accelerate machine learning training and HPC.
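As a rough illustration of that scale, the per-instance figures quoted above (8 GPUs and 400 Gbps per instance) can be turned into an instance count for a given GPU target. This is a minimal back-of-the-envelope sketch, not an AWS tool: the function name `cluster_footprint` is hypothetical, and the summed bandwidth is simple multiplication of the per-instance figure, not a measured cluster throughput.

```python
import math

# Figures taken from the article; everything else here is illustrative.
GPUS_PER_INSTANCE = 8            # A100 GPUs per P4d instance
NETWORK_GBPS_PER_INSTANCE = 400  # network bandwidth per P4d instance


def cluster_footprint(target_gpus: int) -> dict:
    """Estimate how many P4d instances are needed to reach a GPU target.

    Hypothetical helper: rounds up to whole instances and sums the
    per-instance network bandwidth across them.
    """
    instances = math.ceil(target_gpus / GPUS_PER_INSTANCE)
    return {
        "instances": instances,
        "gpus": instances * GPUS_PER_INSTANCE,
        "aggregate_gbps": instances * NETWORK_GBPS_PER_INSTANCE,
    }


if __name__ == "__main__":
    # 4,000 GPUs at 8 per instance works out to 500 instances.
    print(cluster_footprint(4000))
```

On these numbers, a 4,000-GPU cluster corresponds to 500 P4d instances.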
AWS Nitro System
Amazon’s new P4d instances are built on the AWS Nitro System, a combination of hardware and software designed by AWS. The Nitro System enables AWS to deliver an ever-broadening selection of EC2 instances and configurations to users. It also offers performance that is indistinguishable from bare metal, provides fast storage and networking, and ensures more secure multi-tenancy. Dave Brown, Vice President, EC2, AWS, said,
“The pace at which our customers have used AWS services to build, train, and deploy machine learning applications has been extraordinary. At the same time, we have heard from those customers that they want an even lower-cost way to train their massive machine learning models. Now, with EC2 UltraClusters of P4d instances powered by NVIDIA’s latest A100 GPUs and petabit-scale networking, we’re making supercomputing-class performance available to virtually everyone, while reducing the time to train machine learning models by 3x, and lowering the cost to train by up to 60% compared to previous generation instances.”