Run the most demanding AI workloads faster, including generative AI, computer vision, and predictive analytics, anywhere in our distributed cloud. Use Oracle Cloud Infrastructure (OCI) Supercluster to scale up to 32,768 GPUs today and 131,072 GPUs soon.*
Learn how to speed up AI training and inference
OCI AI infrastructure provides the highest-tier performance and value for all AI workloads—including inferencing, training, and AI assistants.
Take advantage of high-performance mount targets (HPMTs) for up to 500 Gb/sec of sustained throughput, and use 61.44 TB of local storage capacity per node, the highest in the industry for instances with NVIDIA H100 GPUs.
Oracle’s distributed cloud enables you to deploy AI infrastructure anywhere to help meet performance, security, and AI sovereignty requirements.
Up to 131,072 GPUs, 8X more scalability
Network fabric innovations will enable OCI Supercluster to scale up to 131,072 NVIDIA B200 GPUs, more than 100,000 Blackwell GPUs in NVIDIA Grace Blackwell Superchips, and 65,536 NVIDIA H200 GPUs. That is up to 8X the current OCI Supercluster limit of 16,384 NVIDIA H100 GPUs, and 4X the current limit of 32,768 NVIDIA A100 GPUs.
Whether you want to run inference, fine-tune models, or train large scale-out models for generative AI, OCI offers industry-leading bare metal and virtual machine GPU cluster options, powered by an ultrahigh-bandwidth network and high-performance storage, to fit your AI needs.
Compute
• 8x NVIDIA H100 GPUs; 61.44 TB NVMe SSDs per node
• 8x NVIDIA A100 GPUs; 27.2 TB NVMe SSDs per node
• 4x NVIDIA L40S GPUs; 7.38 TB NVMe SSDs per node
Storage
• Block storage: Up to 32 TB per volume
• Object storage: Up to 10 TiB per object
• File storage: Up to 8 EB per file system
• Storage clusters with Dense I/O shapes
Networking
• RDMA over Converged Ethernet (RoCE v2)
• A few microseconds of latency between nodes
• OCI Supercluster internode bandwidth per node:
o NVIDIA H100: 3,200 Gb/sec
o NVIDIA A100: 1,600 Gb/sec
o NVIDIA L40S: 800 Gb/sec
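To put the internode bandwidth figures in more familiar units, the following Python sketch converts them from gigabits to gigabytes per second and computes an idealized lower bound on how long a transfer takes. It is back-of-the-envelope arithmetic only, assuming full line rate with no protocol overhead, congestion, or contention; the 100 GB payload and the function names are hypothetical, not OCI figures or APIs.

```python
# Idealized transfer-time arithmetic for the per-node internode bandwidth figures.
# Assumptions (not from the source): full line rate, no protocol overhead,
# congestion, or contention; decimal units with 8 bits per byte.

INTERNODE_GBPS = {  # gigabits per second per node, as listed on this page
    "NVIDIA H100": 3200,
    "NVIDIA A100": 1600,
    "NVIDIA L40S": 800,
}


def gbit_to_gbyte_per_sec(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8


def ideal_transfer_seconds(payload_gb: float, gbps: float) -> float:
    """Lower bound on the time to move payload_gb gigabytes at gbps gigabits per second."""
    if gbps <= 0:
        raise ValueError("gbps must be positive")
    return payload_gb / gbit_to_gbyte_per_sec(gbps)


if __name__ == "__main__":
    payload_gb = 100.0  # hypothetical payload, e.g. one shard of a checkpoint
    for gpu, gbps in INTERNODE_GBPS.items():
        print(
            f"{gpu}: {gbit_to_gbyte_per_sec(gbps):.0f} GB/s, "
            f"at least {ideal_transfer_seconds(payload_gb, gbps):.2f} s for {payload_gb:.0f} GB"
        )
```

At these idealized rates, 3,200 Gb/sec is 400 GB/s, so a 100 GB payload needs at least 0.25 s on an H100 node versus at least 1 s at 800 Gb/sec; real transfers will take longer.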
OCI bare metal instances powered by NVIDIA L40S, H100, and A100 GPUs enable customers to run large AI models for use cases that include deep learning, conversational AI, and generative AI. With OCI Supercluster, customers can scale up to 32,768 A100 GPUs, 16,384 H100 GPUs, and 3,840 L40S GPUs per cluster.
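The per-node figures in the Compute list and the per-cluster limits above are enough for some quick sizing arithmetic. This Python sketch computes how many nodes a maximally sized cluster uses, the aggregate local NVMe capacity across those nodes, and how the announced 131,072-GPU limit compares with each current limit in raw GPU count. It assumes every node is a full bare metal node of the listed shape and uses decimal units (1 PB = 1,000 TB); the class and function names are illustrative, not part of any OCI SDK.

```python
# Back-of-the-envelope cluster sizing from figures stated on this page.
# Assumption: every node in the cluster is a full node of the listed shape.
# Decimal units: 1 PB = 1,000 TB.

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeShape:
    gpu: str
    gpus_per_node: int
    nvme_tb_per_node: float
    max_gpus_per_cluster: int


SHAPES = [
    NodeShape("NVIDIA H100", 8, 61.44, 16_384),
    NodeShape("NVIDIA A100", 8, 27.2, 32_768),
    NodeShape("NVIDIA L40S", 4, 7.38, 3_840),
]

ANNOUNCED_MAX_GPUS = 131_072  # announced future limit (NVIDIA B200)


def max_nodes(shape: NodeShape) -> int:
    """Nodes needed to reach the per-cluster GPU limit."""
    return shape.max_gpus_per_cluster // shape.gpus_per_node


def aggregate_nvme_pb(shape: NodeShape) -> float:
    """Total local NVMe across a maximally sized cluster, in petabytes."""
    return max_nodes(shape) * shape.nvme_tb_per_node / 1_000


def scale_factor(shape: NodeShape) -> float:
    """Ratio of the announced GPU limit to today's limit for this shape (GPU count only)."""
    return ANNOUNCED_MAX_GPUS / shape.max_gpus_per_cluster


if __name__ == "__main__":
    for shape in SHAPES:
        print(
            f"{shape.gpu}: {max_nodes(shape):,} nodes, "
            f"{aggregate_nvme_pb(shape):.1f} PB local NVMe, "
            f"announced limit is {scale_factor(shape):.1f}x today's GPU count"
        )
```

For the H100 shape this works out to 2,048 nodes and roughly 125.8 PB of local NVMe, and the announced limit is 8X today's H100 GPU count (4X for A100).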
High-speed RDMA cluster networking powered by NVIDIA ConnectX network interface cards with RDMA over Converged Ethernet version 2 lets you create large clusters of GPU instances with the same ultralow-latency networking and application scalability you expect on-premises.
You don’t pay extra for RDMA capability, block storage, or network bandwidth, and the first 10 TB of egress is free.
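The egress allowance is simple to model. This sketch computes the billable portion of outbound data transfer after the first 10 TB. It assumes the allowance applies once per billing period and uses decimal units; `rate_per_gb` is a placeholder for whatever rate applies to you, and no real Oracle price is encoded here.

```python
# Sketch of the "first 10 TB of egress is free" rule.
# Assumptions (not from the source): the allowance applies once per billing period;
# rate_per_gb is a caller-supplied placeholder, not a real price.
# Decimal units: 1 TB = 1,000 GB.

FREE_EGRESS_TB = 10.0


def billable_egress_gb(egress_tb: float, free_tb: float = FREE_EGRESS_TB) -> float:
    """Egress volume, in GB, that exceeds the free allowance."""
    if egress_tb < 0:
        raise ValueError("egress_tb must be non-negative")
    return max(0.0, egress_tb - free_tb) * 1_000


def egress_cost(egress_tb: float, rate_per_gb: float) -> float:
    """Estimated egress charge for one billing period at a placeholder rate."""
    return billable_egress_gb(egress_tb) * rate_per_gb


if __name__ == "__main__":
    for tb in (8.0, 12.5):
        print(f"{tb} TB out -> {billable_egress_gb(tb):,.0f} GB billable")
```

Anything at or under 10 TB is free in this model; at 12.5 TB, only the 2,500 GB above the allowance would be charged.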
Through OCI Supercluster, customers can access local, block, object, and file storage for exascale computing. Among major cloud providers, OCI offers the highest capacity of high-performance local NVMe storage, which allows more frequent checkpointing during training runs and therefore faster recovery from failures.
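Frequent checkpointing only shortens recovery if a failed run can reliably resume from the newest complete checkpoint. The following is a minimal, framework-agnostic sketch of that pattern: write each checkpoint to a temporary file, flush it to disk, atomically rename it into place, and on restart resume from the highest step. `CHECKPOINT_DIR` is a hypothetical mount point for local NVMe, the training state is assumed to be a picklable dict, and real training jobs would normally use their framework's own save and load utilities rather than `pickle`.

```python
# Minimal checkpoint/restore sketch.
# Assumptions (not from the source): state is a picklable dict, and CHECKPOINT_DIR
# is a hypothetical mount point for the node's local NVMe drives.
from __future__ import annotations

import os
import pickle
import re
import tempfile
from pathlib import Path
from typing import Any, Optional

CHECKPOINT_DIR = Path("/mnt/localdisk/checkpoints")  # hypothetical path
_NAME = re.compile(r"^ckpt-(\d+)\.pkl$")


def save_checkpoint(state: dict[str, Any], step: int, directory: Path = CHECKPOINT_DIR) -> Path:
    """Write state atomically: write a temp file, fsync it, then rename it into place."""
    directory.mkdir(parents=True, exist_ok=True)
    final = directory / f"ckpt-{step:08d}.pkl"
    tmp = final.with_suffix(".tmp")
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, final)  # atomic on POSIX when both paths are on one file system
    return final


def latest_checkpoint(directory: Path = CHECKPOINT_DIR) -> Optional[tuple[int, dict[str, Any]]]:
    """Return (step, state) for the newest complete checkpoint, or None if there is none."""
    if not directory.is_dir():
        return None
    steps = [int(m.group(1)) for p in directory.iterdir() if (m := _NAME.match(p.name))]
    if not steps:
        return None
    step = max(steps)
    with open(directory / f"ckpt-{step:08d}.pkl", "rb") as f:
        return step, pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt_dir = Path(d)
        resumed = latest_checkpoint(ckpt_dir)
        start = resumed[0] + 1 if resumed else 0
        state = resumed[1] if resumed else {"loss": None}
        for step in range(start, 100):
            state["loss"] = 1.0 / (step + 1)  # stand-in for a real training step
            if step % 10 == 0:  # checkpoint interval: a tunable assumption
                save_checkpoint(state, step, ckpt_dir)
        print(latest_checkpoint(ckpt_dir)[0])  # prints 90
```

Because a partially written file never carries the final `ckpt-*.pkl` name, a crash mid-save leaves the previous checkpoint as the newest valid one; a shorter interval means less work to redo after a failure.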
HPC file systems, including BeeGFS, GlusterFS, Lustre, and WEKA, can be used for AI training at scale without compromising performance.
Watch Chief Technical Architect Pradeep Vincent explain how OCI Supercluster powers the training and inferencing of machine learning models, scaling to tens of thousands of NVIDIA GPUs.
Train AI models on OCI bare metal instances powered by GPUs, RDMA cluster networking, and OCI Data Science.
Protecting the billions of financial transactions that happen every day requires enhanced AI tools that can analyze large amounts of historical customer data. AI models running on OCI Compute powered by NVIDIA GPUs, together with model management tools such as OCI Data Science and open source models, help financial institutions mitigate fraud.
In hospitals, AI is often used to analyze various types of medical images, such as X-rays and MRIs. Trained models can help prioritize cases that need immediate review by a radiologist and report conclusive results on the others.
Drug discovery is a time-consuming and expensive process that can take many years and cost millions of dollars. By using AI infrastructure and analytics, researchers can accelerate drug discovery. Additionally, OCI Compute powered by NVIDIA GPUs, along with AI workflow management tools such as BioNeMo, enables customers to curate and preprocess their data.
Oracle offers a free pricing tier for most AI services as well as a free trial account with US$300 in credits to try additional cloud services. AI services are a collection of offerings, including generative AI, with prebuilt machine learning models that make it easier for developers to apply AI to applications and business operations.
With OCI Data Science, you pay only the compute and storage charges.
Learn more about RDMA cluster networking, GPU instances, bare metal servers, and more.
Oracle Cloud pricing is simple, with consistently low pricing worldwide, supporting a wide range of use cases. To estimate your costs, use the cost estimator and configure the services to suit your needs.
Get help with building your next AI solution or deploying your workload on OCI AI infrastructure.