Many topics were shared during the keynote, and in this short blog post, we will review some of the highlights.
The technical aspects began with David Brown, VP of AWS Compute & Networking.
AWS Graviton
David shared how the Graviton processor evolved over the years.
Taking the Graviton2 processor as the performance baseline, Graviton3 delivers up to 60% more performance than Graviton2 in a real-world NGINX workload, and Graviton4 delivers a further 40% more performance than Graviton3 on the same workload.
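As a rough back-of-the-envelope check, if those generational gains compound, Graviton4 ends up at roughly 2.2x the Graviton2 baseline on that NGINX workload. The short Python sketch below is only an illustration of that multiplication, not an AWS-published figure:

```python
# Compounded gain over the Graviton2 baseline on the NGINX workload cited in the keynote,
# assuming the per-generation improvements simply multiply (an illustration, not an AWS figure).
graviton3_vs_graviton2 = 1.60  # +60% over Graviton2
graviton4_vs_graviton3 = 1.40  # +40% over Graviton3
print(f"Graviton4 vs. Graviton2: ~{graviton3_vs_graviton2 * graviton4_vs_graviton3:.2f}x")  # ~2.24x
```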
Graviton processors power many of the most popular AWS services.
AWS Nitro System
All new AWS compute services in the past couple of years are powered by the Nitro System, which offers better performance and hardware-enforced separation.
For more information:
https://docs.aws.amazon.com/whitepapers/latest/security-design-of-aws-nitro-system/the-components-of-the-nitro-system.html
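A quick way to see which instance families run on the Nitro hypervisor is the EC2 DescribeInstanceTypes API, which exposes a hypervisor attribute. The boto3 sketch below is illustrative; the region is an arbitrary assumption, and bare-metal Nitro instances report no hypervisor, so they would not appear in this particular filter:

```python
import boto3

# List EC2 instance types that run on the Nitro hypervisor.
ec2 = boto3.client("ec2", region_name="us-east-1")  # example region

paginator = ec2.get_paginator("describe_instance_types")
nitro_types = []
for page in paginator.paginate(Filters=[{"Name": "hypervisor", "Values": ["nitro"]}]):
    nitro_types.extend(t["InstanceType"] for t in page["InstanceTypes"])

print(f"{len(nitro_types)} Nitro-based instance types, e.g.: {sorted(nitro_types)[:5]}")
```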
AWS Trainium
Peter DeSantis shared information about the AWS Trainium processors for generative AI workloads and their architecture.
For more information: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/arch/neuron-hardware/trainium.html
Systolic Array
A systolic array is a specialized architecture used in parallel processing, particularly effective for tasks like matrix multiplication and convolution operations in deep learning.
For more information: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/nki/trainium_inferentia2_arch.html
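To make the idea concrete, here is a minimal, purely illustrative Python simulation of an output-stationary systolic array computing a matrix product. Each processing element performs one multiply-accumulate per cycle as skewed rows of A and columns of B stream past it. This is a teaching sketch, not how the Trainium hardware is actually programmed:

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.

    The PE at grid position (i, j) holds output C[i, j] and performs one
    multiply-accumulate per cycle. Rows of A stream in from the left and
    columns of B from the top, each skewed by one cycle per row/column so
    that matching operands arrive at a PE at the same time.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    total_cycles = k + n + m - 2  # cycles until the last skewed operands pass through
    for t in range(total_cycles):
        for i in range(n):
            for j in range(m):
                # With input skewing, PE (i, j) sees A[i, s] and B[s, j] at cycle t = s + i + j.
                s = t - i - j
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]
    return C

A = np.random.rand(4, 3)
B = np.random.rand(3, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)
```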
Neuron Kernel Interface (NKI)
The Neuron Kernel Interface (NKI) is a programming interface introduced by AWS as part of the Neuron SDK, designed to optimize compute kernels specifically for AWS Trainium and Inferentia chips. It enables developers to create high-performance kernels that enhance the capabilities of deep learning models.
For more information: https://aws.amazon.com/about-aws/whats-new/2024/09/aws-neuron-nki-nxd-training-jax/
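For a feel of what an NKI kernel looks like, the sketch below follows the element-wise tensor-add pattern from the NKI getting-started material. The module paths, the @nki.jit decorator, and the memory buffer names are taken from the public documentation and may change between Neuron SDK releases, so treat this as an approximation rather than a verified example:

```python
# A minimal NKI-style kernel: element-wise addition of two tensors on a NeuronCore.
# Module paths and API names below are assumptions based on the public NKI docs.
import neuronxcc.nki as nki
import neuronxcc.nki.language as nl


@nki.jit
def tensor_add_kernel(a_input, b_input):
    # Allocate the output tensor in device HBM.
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)

    # Load both inputs from HBM into on-chip memory, add them, and store the result back.
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)
    nl.store(c_output, value=a_tile + b_tile)
    return c_output
```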
Announcement - Latency-optimized inference option for Amazon Bedrock (Available in Preview)
Latency-optimized inference for foundation models in Amazon Bedrock is now available in public preview, delivering faster response times and improved responsiveness for AI applications. Currently, these new inference options support Anthropic's Claude 3.5 Haiku model and Meta's Llama 3.1 405B and 70B models, offering reduced latency compared to standard models without compromising accuracy.
For more information:
https://aws.amazon.com/about-aws/whats-new/2024/12/latency-optimized-inference-foundation-models-amazon-bedrock/
https://docs.aws.amazon.com/bedrock/latest/userguide/latency-optimized-inference.html
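Based on the documentation linked above, latency-optimized inference is requested per call through the performanceConfig parameter of the Bedrock Runtime Converse API. The boto3 sketch below assumes a region and an inference-profile ID for Claude 3.5 Haiku that you should verify for your own account:

```python
import boto3

# Bedrock Runtime client in a region where latency-optimized inference is offered
# (the region and the model/inference-profile ID are assumptions; check the docs above).
bedrock = boto3.client("bedrock-runtime", region_name="us-east-2")

response = bedrock.converse(
    modelId="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    messages=[{"role": "user", "content": [{"text": "In one sentence, what is AWS Graviton?"}]}],
    # Request the latency-optimized variant instead of the standard one.
    performanceConfig={"latency": "optimized"},
)

print(response["output"]["message"]["content"][0]["text"])
```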
UltraCluster 2.0 and the 10p10u network
The last topic discussed in the keynote was UltraCluster 2.0 and its underlying network, which AWS internally calls 10p10u, designed to deliver tens of petabits per second of bandwidth with under 10 microseconds of latency across the cluster.
For more information: https://www.aboutamazon.com/news/aws/aws-infrastructure-generative-ai
The entire keynote video can be found at https://www.youtube.com/watch?v=vx36tyJ47ps
About the author
Eyal Estrin is a cloud and information security architect, an AWS Community Builder, and the author of the books Cloud Security Handbook and Security for Cloud Native Applications, with more than 20 years in the IT industry.
You can connect with him on social media (https://linktr.ee/eyalestrin).
Opinions are his own and not the views of his employer.