AI Development’s Dark Horse – The Looming Data Scarcity Crisis

The rapid advancement of artificial intelligence (AI) has been remarkable, with the industry setting unprecedented expectations and delivering innovation, growth, and deeper integration. Beneath the surface, however, lies a looming crisis that threatens to stall AI development: the scarcity of publicly available data for training large language models (LLMs). While funds, computing power, and talent remain abundant, sourcing data will become a major problem for AI companies by the turn of the decade. According to research published in Intelligent Computing: The Latest Advances, Challenges, and Future, the demand for AI computing power is doubling every 100 days and is projected to increase more than a million-fold over the next five years. Energy-guzzling LLM training may have alternative solutions, such as Microsoft’s plan to build a small-scale reactor to replace fossil fuels for its data center and computing needs. The demand for data, the building blocks of an LLM’s capabilities, has no such workaround: it is rising exponentially, and experts predict that companies will exhaust publicly available data for LLM training sometime between 2026 and 2032.