Understanding Inference in AI Hardware
Inference is becoming a crucial topic in AI hardware discussions. Nvidia's CFO highlighted that inference accounted for around 40% of the company's data center revenue in its strong second quarter, and AWS's CEO noted that inference likely makes up half of all AI computing work today. This growing focus on inference has attracted numerous companies eager to compete with Nvidia.
Key Developments in Inference Technology
- Groq, founded by ex-Google employees, raised $640 million to focus on inference hardware, achieving a valuation of $2.8 billion.
- Positron AI unveiled a new inference chip, claiming it can match Nvidia’s H100 performance at a significantly lower cost.
- Amazon is developing its own chips, named Trainium and Inferentia, for training and inference tasks respectively.
- Cerebras has introduced an inference offering built on its wafer-scale chip, which it says delivers 7,000 times the memory bandwidth of Nvidia's H100.
The Importance of Inference in AI Progress
The shift toward inference is essential for the growth of AI applications: training builds a model, but inference is what puts that model in front of customers as a working product. AWS's CEO emphasized that for the industry's substantial investments in AI infrastructure to pay off, inference workloads must come to dominate. The boundary between training and inference may also blur over time, for instance as deployed models are continually fine-tuned on the data their own usage generates. This evolution is critical for the advancement of AI technology and its applications across various industries.
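To make the training/inference distinction concrete, here is a minimal sketch in PyTorch. The tiny linear model, shapes, and random data are illustrative assumptions standing in for a real network, not anything the companies above actually run; the point is only the structural difference between the two phases.

```python
import torch
import torch.nn as nn

# Hypothetical toy model; real workloads use far larger networks,
# but the training/inference split is the same in principle.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Training: forward pass, backward pass, and weight update.
# Gradients and optimizer state must be kept in memory, which is
# why training is the more memory- and compute-hungry phase.
x = torch.randn(32, 16)
y = torch.randn(32, 1)
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Inference: forward pass only. No gradients are tracked, so the
# same model can serve predictions with far less memory, often on
# cheaper or more specialized hardware.
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 16))
print(prediction)
```

The asymmetry in the sketch is what the inference-chip vendors above are betting on: a serving workload needs only the forward pass, so hardware can trade away training flexibility for throughput and cost.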