Elon Musk’s artificial intelligence venture, xAI, has unveiled a new supercomputer named Colossus, equipped with 100,000 Nvidia H100 graphics processing units (GPUs). The machine represents a significant leap in AI training capability and could position xAI to compete with industry leaders.
To put Colossus’s scale into perspective, Meta’s Llama 3 large language model utilized 16,000 H100 chips for training. Meta recently announced plans to expand its AI infrastructure with two additional 24,000-chip clusters. Musk’s creation, therefore, appears to dwarf these existing systems in raw processing power.
However, not all tech industry figures are convinced by Musk’s latest reveal. Reid Hoffman, co-founder of LinkedIn, described the xAI supercomputer as merely “table stakes” in the competitive landscape of generative AI. Hoffman’s comments, reported by The Information, suggest that while impressive, Colossus may only bring xAI up to par with more established players like OpenAI and Anthropic, rather than surpassing them.
Chris Lattner, CEO of Modular AI, raised questions about Musk’s strategy during a panel discussion at The Information’s AI Summit. Lattner pointed out the apparent contradiction between xAI’s heavy reliance on Nvidia’s scarce and expensive GPUs and Musk’s parallel effort to develop an in-house AI chip at Tesla, known as Dojo. The approach contrasts with other tech giants such as Meta, Microsoft, Alphabet, and Amazon, which are developing proprietary AI chips while still using Nvidia’s products.
Musk has previously expressed concerns about the challenges of acquiring more Nvidia chips, citing the Dojo project as a potential solution to reduce dependence on the chipmaker. During a Tesla earnings call in July, Musk stated, “We do see a path to being competitive with Nvidia with Dojo. We kind of have no choice.”
The billionaire entrepreneur claims that Colossus was constructed in just 122 days, a feat reportedly unmatched by other companies. Musk has also announced plans to double the supercomputer’s size to 200,000 chips within months.
However, questions remain about Colossus’s current operational capacity. It’s unclear whether all 100,000 GPUs can run
simultaneously, as this would require advanced networking technology and substantial energy resources. Reports from The Information suggest that xAI may face power constraints at its facility.
In August, CNBC reported that an environmental advocacy group had raised concerns about xAI operating gas turbines without proper authorization to meet its enormous power demands. The Southern Environmental Law Center claimed in a letter that xAI had installed at least 18 unpermitted turbines, with potentially more planned.
Memphis Light, Gas and Water, the local utility, informed CNBC that it has supplied 50 megawatts of power to xAI since early August. However, the facility reportedly requires an additional 100 megawatts for full operation. Industry experts consulted by The Information estimate that the current power supply could only support a few thousand GPUs, far short of the claimed 100,000.
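The shortfall those experts describe can be sketched with a rough back-of-envelope calculation. The figures below are illustrative assumptions, not reported numbers: an H100 draws on the order of 700 watts at full load, and total facility draw (cooling, networking, host servers) is often estimated at around 1.5 times the GPU power alone.

```python
# Back-of-envelope power estimate for a 100,000-GPU H100 cluster.
# Assumed figures for illustration: ~700 W per H100 at full load,
# ~1.5x facility overhead for cooling, networking, and host servers.

GPU_POWER_W = 700        # approximate H100 board power, watts (assumption)
OVERHEAD = 1.5           # assumed facility overhead multiplier
NUM_GPUS = 100_000

gpu_power_mw = NUM_GPUS * GPU_POWER_W / 1_000_000   # GPUs alone, in megawatts
total_power_mw = gpu_power_mw * OVERHEAD            # with assumed overhead

print(f"GPU draw alone: {gpu_power_mw:.0f} MW")     # 70 MW
print(f"Estimated total: {total_power_mw:.0f} MW")  # 105 MW
```

On these assumptions, a full 100,000-GPU deployment would need on the order of 100 megawatts or more, so the 50 megawatts the utility says it has supplied would cover only a fraction of the cluster, broadly consistent with the constraints the reports describe.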
As the AI arms race intensifies, Musk’s bold claims about Colossus have certainly captured attention. However, the mixed reactions from industry leaders and the questions surrounding its operational capabilities suggest that the true impact of this supercomputer on the AI landscape remains to be seen. As companies continue to invest heavily in AI infrastructure, the coming months may reveal whether Colossus truly represents a game-changing advancement or merely keeps xAI in the running with its more established competitors.