Enhancing GPU Clusters for AI Performance
Together AI has made a significant upgrade to its GPU clusters by integrating the NVIDIA H200 Tensor Core GPU. This enhancement, accompanied by the Together Kernel Collection (TKC), aims to boost performance in AI training and inference tasks.
Enhanced Performance with TKC
The Together Kernel Collection (TKC) significantly accelerates common AI operations, delivering up to a 24% speedup for training operators and up to a 75% speedup for FP8 inference operations compared to standard implementations. These gains translate into lower compute costs and faster time to market.
Training Optimization
- TKC offers optimized kernels like MLP with SwiGLU activation for training large language models (LLMs).
- Kernels are reported to be 22-24% faster than standard implementations.
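For intuition about what an "MLP with SwiGLU activation" kernel computes, here is a minimal, unoptimized pure-Python sketch. The helper names and shapes are illustrative only; TKC's fused GPU kernels compute the same math, but in far fewer passes over memory.

```python
import math

def swish(x):
    # Swish / SiLU activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def swiglu_mlp(x, W, V, W2):
    """Reference (unfused) SwiGLU MLP: W2 @ (swish(W @ x) * (V @ x)).
    An optimized kernel fuses the two projections, the activation, and
    the elementwise product; this sketch only illustrates the math."""
    def matvec(M, v):
        return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]
    gate = [swish(g) for g in matvec(W, x)]   # gate projection + activation
    up = matvec(V, x)                          # up projection
    hidden = [g * u for g, u in zip(gate, up)] # elementwise gating
    return matvec(W2, hidden)                  # down projection
```

The fusion matters because each separate step above would otherwise round-trip its intermediate result through GPU memory.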
Inference Optimization
- FP8 kernels deliver more than 75% speedup over base PyTorch implementations for inference tasks.
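To make "FP8" concrete, the sketch below rounds a value to the E4M3 format commonly used for FP8 inference. This is a simplified pure-Python illustration (subnormals, NaN, and per-tensor scaling are omitted); real FP8 kernels perform this conversion in hardware.

```python
import math

def to_e4m3(x):
    """Round x to the nearest FP8 E4M3 value (simplified sketch).
    E4M3 keeps 3 mantissa bits and tops out at a magnitude of 448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    m, e = math.frexp(abs(x))     # abs(x) = m * 2**e, with m in [0.5, 1)
    m_q = round(m * 16) / 16      # keep 4 significant bits (1 implicit + 3)
    y = sign * math.ldexp(m_q, e)
    return math.copysign(min(abs(y), 448.0), x)  # clamp to E4M3 max
```

Halving the bytes per value versus FP16 halves the memory traffic per inference step, which is where much of the FP8 speedup comes from.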
Native PyTorch Compatibility
TKC seamlessly integrates with PyTorch, allowing developers to leverage optimizations within their existing frameworks by simply changing import statements.
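The "change the import" integration style usually looks like the pattern below: try the optimized kernels first and fall back to a stock implementation otherwise. Note that `together_kernels` and `fused_swiglu` are hypothetical names used purely for illustration; consult the TKC documentation for the real import paths.

```python
try:
    # Hypothetical optimized-kernel import, for illustration only.
    from together_kernels import fused_swiglu as mlp_activation
except ImportError:
    import math

    def mlp_activation(gate, up):
        # Stock fallback: elementwise swish(gate) * up, so the rest of
        # the model code stays unchanged either way.
        return [g / (1.0 + math.exp(-g)) * u for g, u in zip(gate, up)]
```

Because both paths expose the same callable, the surrounding training or inference code does not need to know which implementation it got.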
Production-Level Testing
TKC undergoes rigorous production-level testing to ensure high performance and reliability for real-world applications. All Together GPU Clusters, including those built on H200 and H100 GPUs, will include TKC.
NVIDIA H200 Advancements
The NVIDIA H200 Tensor Core GPU, based on the Hopper architecture, offers faster performance and larger memory compared to the H100.
Inference Performance
- The H200 delivers faster inference than its predecessor, the H100, with the largest gains on memory-bound workloads such as LLM serving.
- Features 141GB of HBM3e memory and 4.8TB/s of memory bandwidth.
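To put those numbers in perspective, a quick back-of-the-envelope calculation shows how long one full sweep of the H200's memory takes at peak bandwidth. Autoregressive LLM decoding must re-read the model weights for every generated token, so this bound translates directly into inference speed.

```python
# Ideal time to stream all 141 GB of HBM3e at the peak 4.8 TB/s.
# No overhead is assumed; real kernels reach only a fraction of peak.
memory_gb = 141.0
bandwidth_gb_per_s = 4800.0
sweep_ms = memory_gb / bandwidth_gb_per_s * 1000.0
print(f"One full memory sweep: {sweep_ms:.2f} ms")  # roughly 29 ms
```

In other words, even a model occupying all of HBM3e could in principle be streamed through the compute units dozens of times per second.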
High-Performance Interconnectivity
Together GPU Clusters use the SXM form factor with NVIDIA’s NVLink and NVSwitch technologies for fast GPU-to-GPU communication, ideal for AI training and HPC workloads.
Cost-Effective Infrastructure
Together AI offers infrastructure up to 75% more cost-effective than cloud providers, with flexible commitment options for all stages of the AI development lifecycle.
Reliability and Support
Together AI backs its clusters with a 99.9% uptime SLA, rigorous testing, and White Glove Service for end-to-end support, from setup to maintenance.
Flexible Deployment Options
Options include Slurm, Kubernetes, and bare-metal clusters running Ubuntu, catering to various AI project needs.
Accelerating AI Development
Together AI’s high-performance NVIDIA H200 GPU Clusters and TKC optimize performance, reduce costs, and ensure reliability throughout the AI development lifecycle.
Hot Take: Elevating AI Performance with Together AI
If you’re looking to enhance your AI projects with top-of-the-line GPU clusters and optimizations, Together AI’s integration of NVIDIA H200 and the TKC could be the game-changer you need. With increased performance, cost efficiencies, and reliable support, Together AI is paving the way for accelerated AI development.