Elon Musk agrees with other AI experts that there is little real-world data left to train AI models on.
Good call out by @elonmusk. Exactly why you need sustainable, accurate, curated data from trustworthy sources. @StackOverflow’s OverflowAPI solves this very problem by creating an ongoing mechanism for data creation for LLM training. https://t.co/lbANedMwiJ
— Prashanth Chandrasekar (@pchandrasekar) January 9, 2025
“We’ve now exhausted basically the cumulative sum of human knowledge in AI training,” Musk said during a livestreamed conversation with Stagwell chairman Mark Penn late Wednesday. “That happened basically last year.”
Musk, who owns AI company xAI, echoed themes addressed by former OpenAI chief scientist Ilya Sutskever at NeurIPS, the machine learning conference, in December.
"AI will do anything you want and even suggest things you never even thought of.
So, I mean, AI really within the next few years will be able to do any cognitive task.
It obviously begs the question, what are we all going to do?"
Elon Musk pic.twitter.com/o3JnVuOBOm
— Tesla Owners Silicon Valley (@teslaownersSV) January 9, 2025
Sutskever stated that the AI industry had reached what he called “peak data” and predicted that a lack of training data would force a shift away from current model development approaches. Musk, for his part, suggested that synthetic data, meaning data generated by AI models themselves, is the path forward. “The only way to supplement [real-world data] is with synthetic data, where the AI creates [training data],” he said.
“With synthetic data, [AI] will sort of grade itself and go through this process of self-learning.”
Other companies, including tech giants like Microsoft, Meta, OpenAI, and Anthropic, are already using synthetic data to train flagship AI models. Microsoft’s recent AI models, which were open-sourced early Wednesday, were trained on synthetic data alongside real-world data.
Musk on synthetic data’s potential
Google and Meta have also used synthetic data to fine-tune their most recent series of models. Anthropic used synthetic data to develop one of its most performant systems. Training on synthetic data has other advantages as well, such as lower cost.
AI startup Writer claims its Palmyra X 004 model, which was developed using almost entirely synthetic sources, cost just $700,000 to develop. That compares with an estimated $4.6 million for a comparably sized OpenAI model. However, synthetic data has disadvantages as well.
Recent research suggests that synthetic data can lead to model collapse, where a model becomes less creative and more biased in its outputs, eventually seriously compromising its functionality. Because synthetic data is itself produced by models, any biases and limitations in the data those models were trained on carry over into their outputs.
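The feedback loop behind model collapse can be illustrated with a toy simulation. The sketch below is not taken from the research mentioned above; it assumes a deliberately simple stand-in for a "model" (a Gaussian described by a mean and a standard deviation), and the function name `simulate_collapse` and its parameters are hypothetical. Each generation is fit only to samples drawn from the previous generation, so estimation error compounds and the spread of the data tends to shrink over time.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    Returns the fitted standard deviation at each generation, starting
    with the original value at index 0.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0 stands in for "real-world" data
    history = [std]
    for _ in range(generations):
        # The current model produces the next generation's training set.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # The next model is fit only to that synthetic data.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for generation in (0, 50, 100, 200):
        print(f"generation {generation:3d}: std = {history[generation]:.3e}")
```

In this toy setting the standard deviation drifts toward zero, meaning later generations reproduce an ever narrower slice of the original distribution. Real language models are far more complex, so this should be read as an intuition for the mechanism rather than a prediction about any particular system.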
April Isaacs is a news contributor for DevX.com. She is a long-term, self-proclaimed nerd. She loves all things tech and computers and still has her first Dreamcast system. It is lovingly named Joni, after Joni Mitchell.