Solidigm: NAND demand soars as cloud providers underestimate AI inferencing

Cloud and enterprise AI workloads are generating far more inference data than expected, driving a sudden spike in NAND demand, flash prices, and pressure on AI project ROI

NAND demand and prices have soared, in part because cloud providers and enterprises have underestimated the amount of flash storage they will need for AI workloads.

That’s the view of Scott Shadley, Solidigm’s director of leadership narrative and evangelist, who spoke last week at the Technology Live! event in London.

Core to that unexpected demand are unforeseen data volumes generated during AI inference. Possible knock-on effects are that enterprises may find it difficult to source solid-state media, and that rising flash costs will hit the ROI of AI projects.

Solidigm is owned by South Korean storage media heavyweight SK Hynix, and currently offers the highest-capacity commodity flash drive on the market, at 122.88TB.

Shadley said his company had been inundated with requests from cloud providers to buy storage media, the chief cause being that they had underestimated the data volume demands of AI inferencing.

Cloud providers underestimate AI inferencing

“It’s just been amazing the amount of CSPs that came to us all at the same time with the same theme,” said Shadley. “They said, ‘We grossly underestimated our requirements for storage. We want to buy five times, ten times the amount of storage from you.’”

“So why did they first underestimate? Because of inferencing,” he said. “They thought, ‘Hey, we’ve got this fleet of hard drives, we could use it for inferencing.’ And then the latency of the drives and the performance wasn’t there. And the amount of data these inference models are churning out from the large language models overwhelmed their storage.”

Multiple inferencing iterations

Key to that massive growth in data is that inferencing is turning out not to be a one-off phase in AI workloads. Instead, there can be multiple inference iterations after modelling has taken place – for example, when updated data is added and the model is refined, as happens when production creates new information, or when RAG adds new data. All these further steps after modelling create checkpoints that can see data volumes balloon.

“So, when we did AI originally, we’d train it, we’d send it out, and everybody would use a model,” said Shadley. “And you’d get a right answer, a wrong answer, or a no answer, and you had no idea if it’s right or wrong. So we said, let’s do retrieval-augmented generation [RAG].”

“So now I take a trained model, run it through, do an inference step, verify it on itself, do another inference step, verify it again, and do multiple RAG cycles. And then if this whole room jumps on and enters the same prompt at the same time, we generate ten times more data than a year ago.”
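The multiplication Shadley describes can be sketched with a back-of-envelope model. All figures below are hypothetical assumptions for illustration – they are not Solidigm’s numbers – but they show how a few extra RAG/verification passes per prompt, combined with more concurrent users, compound into the kind of tenfold growth in inference output he cites.

```python
# Illustrative model of how RAG iterations and user growth multiply
# inference output data. All inputs are hypothetical assumptions.

def inference_output_per_day(prompts_per_day: int,
                             output_bytes_per_pass: int,
                             passes_per_prompt: int) -> int:
    """Total bytes of inference output written per day."""
    return prompts_per_day * output_bytes_per_pass * passes_per_prompt

# A year ago (assumed): one inference pass per prompt.
baseline = inference_output_per_day(
    prompts_per_day=1_000_000,
    output_bytes_per_pass=50_000,   # ~50KB of output per pass (assumed)
    passes_per_prompt=1,
)

# Today (assumed): five RAG/verify cycles per prompt and twice the traffic.
today = inference_output_per_day(
    prompts_per_day=2_000_000,
    output_bytes_per_pass=50_000,
    passes_per_prompt=5,
)

print(today / baseline)  # 10.0 -- the "ten times more data" effect
```

The point of the sketch is that the growth is multiplicative: doubling users alone would only double the data, but doubling users while each prompt also triggers several verification passes compounds into an order-of-magnitude increase.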

Flash demand soared overnight

According to Shadley, new demand soared “literally overnight”. It is being met to some extent, but at higher prices, and that will affect the ROI calculations and profitability of AI projects.

“So literally overnight, about three months ago, the industry found they needed 100 exabytes more NAND storage,” said Shadley.

“We’ve seen this movie many, many times when there’s a capacity shortage,” he added. “And if somebody’s going to pay you extra, that means that the cost of delivering an AI solution goes up, or the ROI reduces.”

“So prices are going to go up,” said Shadley. “But there is elasticity of demand, and when demand is destroyed by 50%, then prices will come down.”

Prices put flash out of enterprise reach

One knock-on effect of the current jump in NAND prices, according to Shadley, is that storage media will be priced out of the reach of enterprises. That’s because, as prices rise, it will be the cloud operators that have the economies of scale and market muscle to procure product.

“Enterprises are not going to be able to do this on their own premises because they’re not going to be able to get the infrastructure to run AI,” said Shadley. “If you have enough AI demand that you can amortise the high cost of the machine, it makes sense. But I think a lot of people don’t have that demand today, so cloud is where a lot of the AI is happening. Because the cloud providers are placing bigger orders, their orders are going to be fulfilled first.”

“So I think the signs are that enterprises are going to find it very expensive to buy the flash drives they need to do AI.”

That also poses a new problem, said the Solidigm executive: if data for AI workloads has to spend more time in the cloud and be managed between on-premises and cloud environments, that will drive up the cost of management and operations.

The takeaway 

AI inference is producing far more data—and far more iterations—than cloud providers or enterprises anticipated. According to Solidigm’s Scott Shadley, this has triggered an overnight surge in NAND demand, with CSPs scrambling to secure up to 10× more flash capacity. The resulting supply pressure is pushing flash prices sharply upward, squeezing AI project ROI and potentially pricing enterprises out of on-prem deployments. As clouds absorb most available supply, organisations may face rising operational costs and greater reliance on hybrid data management just to keep AI workloads running efficiently.

Read more about AI and storage

Hammerspace drives ‘em “crazy” with global namespace, performance and GPU data placement. Hammerspace unifies storage across vendors into one high-performance namespace, cutting costs and feeding AI/GPU workloads at speed—leaving traditional storage players frustrated.

How AI impacts storage: Key features of AI workloads and the storage it needs. We look at the key characteristics of AI workloads, their needs in terms of I/O, storage capacity, object vs file and cloud vs on-prem, and the vendors that offer optimised solutions for GPU-driven environments.