A recent study by Epoch AI projects that tech companies will exhaust the supply of publicly available training data for AI language models sometime between 2026 and 2032. This raises concerns that the current pace of AI progress may slow once the reserves of human-generated writing are depleted. The study suggests that companies may then have to rely on sensitive data, such as emails or text messages, or on “synthetic data” generated by other AI models, which can be less reliable. Alternatively, developers could focus on building more skilled models that are specialized for specific tasks, rather than relying on ever-larger models. The study’s findings have sparked debate about the future of AI development and the importance of high-quality data.

AI Data Drought Looms
Tech companies will exhaust the supply of publicly available training data for AI language models by roughly the turn of the decade — sometime between 2026 and 2032.










