How AI impacts storage: Key features of AI workloads and the storage it needs

We look at the key characteristics of AI workloads, their needs in terms of I/O, storage capacity, object vs file storage, cloud vs on-prem, and the vendors that offer optimised solutions for GPU-driven environments

Artificial intelligence (AI) and machine learning (ML) promise a new level of automation, transforming everything from chatbots to large-scale generative models that create realistic content. Yet AI’s effectiveness depends on one critical foundation — data storage.

AI systems need vast, high-speed storage to feed training data to GPUs, manage intermediate datasets, and retain the massive outputs of trained models. This article explores how storage supports AI: from data flow and I/O patterns to the types of storage best suited to AI workloads, cloud and object storage roles, and what leading storage vendors are offering for AI-driven environments.

What are the key features of AI workloads?

AI workloads generally move through three main stages — training, inference, and deployment.

  • Training: Algorithms learn from data, adjusting weights and parameters until they can recognise patterns.
  • Inference: The trained model applies its knowledge to new data to make predictions or generate content.
  • Deployment: AI becomes embedded within applications, serving real-time insights or automated decision-making.

Workloads can vary widely. Training large language models (LLMs) resembles high-performance computing (HPC), demanding enormous parallel processing power and bandwidth. Smaller use cases, such as customer recommendations or predictive maintenance, may continuously infer results from new incoming data.

AI datasets also differ greatly in size and structure — from billions of small IoT sensor readings to multi-terabyte image and video files. Depending on the framework in use, such as TensorFlow or PyTorch, data may be processed as many small files or as fewer, larger ones. Increasingly, even backup and archive data are being reused as training sources, broadening the data types AI systems rely on.

What are the I/O characteristics of AI workloads?

AI’s hallmark is massively parallel processing, powered by GPUs, TPUs, or specialised accelerators. These devices handle huge volumes of data simultaneously, meaning that I/O speed — not just capacity — becomes the bottleneck.

Training runs require ultra-low latency and high throughput between storage and compute nodes. Data must stream quickly enough to keep GPUs fully occupied; otherwise, expensive hardware sits idle.
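The idea of keeping GPUs fed can be sketched as a simple prefetch pipeline: a background loader fills a bounded queue from storage while the compute loop drains it, so I/O and compute overlap instead of alternating. This is a toy plain-Python illustration (the function names are our own, and a sleep stands in for storage latency):

```python
import queue
import threading
import time

def load_batch(i):
    """Simulate reading a training batch from storage (I/O-bound)."""
    time.sleep(0.01)  # stand-in for disk/network latency
    return list(range(i, i + 4))

def prefetcher(num_batches, out_q):
    """Background thread: keep the queue topped up so compute never waits."""
    for i in range(num_batches):
        out_q.put(load_batch(i))
    out_q.put(None)  # sentinel: no more batches

def train(num_batches=8, depth=4):
    q = queue.Queue(maxsize=depth)  # bounded buffer caps memory use
    threading.Thread(target=prefetcher, args=(num_batches, q), daemon=True).start()
    processed = 0
    while (batch := q.get()) is not None:
        # "GPU work" happens here while the loader fetches the next batch
        processed += len(batch)
    return processed

print(train())  # 8 batches of 4 samples -> 32
```

Real data loaders (PyTorch's DataLoader with multiple workers, for instance) apply the same producer-consumer idea at scale.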

In practice, this means AI infrastructure must deliver:

  • Parallel access to thousands of files simultaneously.
  • High bandwidth for rapid data ingestion and model checkpointing.
  • Scalability to handle petabyte-scale datasets and their resulting models.
  • Data mobility across on-prem, edge, and cloud sites for distributed training.
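On the checkpointing point above, one common pattern is writing checkpoints atomically, so a crash mid-write never leaves a truncated file that wastes an expensive training run. A minimal sketch in plain Python, using JSON as a stand-in for real tensor serialisation:

```python
import json
import os
import tempfile

def save_checkpoint(state, path):
    """Write a checkpoint atomically: write to a temp file in the same
    directory, then rename over the target, so a crash mid-write never
    leaves a partial checkpoint behind."""
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push data to stable storage
        os.replace(tmp, path)     # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise

def load_checkpoint(path):
    with open(path) as f:
        return json.load(f)
```

Writing the temp file in the target directory matters: `os.replace` is only atomic within a single filesystem.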

Most AI data is unstructured, residing outside traditional databases — think images, audio, logs, or scientific readings. This makes the underlying storage architecture a crucial factor in AI performance and scalability.

What kind of storage do AI workloads need?

AI workloads thrive on fast, low-latency flash storage, often combined with NVMe for extreme throughput. The objective is simple: keep GPUs busy by supplying data as fast as they can consume it.

Typical AI training environments require hundreds of terabytes to petabytes of storage, depending on data volume and project complexity.

However, different AI frameworks access data in different ways. TensorFlow, for example, often handles large sequential files, while PyTorch manages thousands of smaller objects. Thus, optimal storage must support mixed I/O patterns — both random and sequential — with consistent high performance.
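The two access patterns can be illustrated by comparing a many-small-files layout with a single packed file. This is a plain-Python sketch of the concept only, not how either framework actually reads data:

```python
import os

def write_many_small(dirpath, n=1000, size=1024):
    """Many-small-files layout: one file per sample (PyTorch-style)."""
    for i in range(n):
        with open(os.path.join(dirpath, f"sample_{i:05d}.bin"), "wb") as f:
            f.write(os.urandom(size))

def write_one_large(path, n=1000, size=1024):
    """Single packed file: samples appended sequentially (TFRecord-style)."""
    with open(path, "wb") as f:
        for _ in range(n):
            f.write(os.urandom(size))

def read_all_small(dirpath):
    """One open/read/close round trip per sample: metadata-heavy."""
    total = 0
    for name in sorted(os.listdir(dirpath)):
        with open(os.path.join(dirpath, name), "rb") as f:
            total += len(f.read())
    return total

def read_large(path, size=1024):
    """One open, then sequential reads: bandwidth-heavy."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(size):
            total += len(chunk)
    return total
```

The same number of bytes flows either way, but the small-file path hammers metadata operations while the packed path stresses raw sequential bandwidth — which is why storage for AI must handle both well.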

Suppliers increasingly position QLC flash arrays as a cost-effective solution for AI, bridging the gap between performance and capacity. Meanwhile, organisations are discovering new value in “secondary data” — backups and archives — which can now serve as additional AI training sources.

Is cloud storage suitable for AI?

Yes — but with trade-offs.

Cloud storage brings elastic scalability and global accessibility, enabling organisations to train and deploy AI without major upfront investment. By renting GPU clusters from hyperscalers, teams can scale experiments quickly and pay only for what they use.

Most AI projects begin in the cloud, where storage integrates directly with compute environments. The major cloud providers — AWS, Microsoft Azure, and Google Cloud — offer:

  • Pre-trained AI/ML models and APIs
  • On-demand GPU and TPU compute
  • Object, block, and file storage scalable to multiple petabytes

However, cloud is not always the most cost-efficient option. Long-term training or frequent data movement can generate high egress and compute costs, prompting some enterprises to migrate mature AI workloads back on-premises for predictable expenses and better data control.
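A back-of-envelope calculation shows how egress adds up. The per-gigabyte rate below is an assumed illustrative figure, not any provider's actual tariff:

```python
def monthly_egress_cost(tb_moved, price_per_gb=0.09):
    """Back-of-envelope egress cost in dollars. price_per_gb is an
    assumed illustrative rate, not a real provider tariff."""
    return tb_moved * 1024 * price_per_gb

# Moving a 50 TB training set out of the cloud each month:
print(f"${monthly_egress_cost(50):,.2f}")
```

At that assumed rate, repeatedly moving a 50 TB dataset costs thousands of dollars a month before any compute is billed — the kind of recurring figure that drives repatriation decisions.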

Is object storage a good fit for AI?

Object storage is increasingly central to AI workflows because it can handle unstructured data at scale.

It stores data as “objects” with rich metadata, allowing billions of items to coexist in a single, flat namespace — ideal for AI models that need to scan and retrieve specific data types from massive repositories. Metadata also accelerates data discovery and management, helping refine training datasets.

Most modern object stores use the S3 protocol, allowing seamless integration between on-premises and cloud environments.
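The flat-namespace-plus-metadata model can be shown with a toy in-memory store. This is a sketch of the concept only; real deployments would use an S3 client such as boto3 against an actual object store:

```python
class MiniObjectStore:
    """Toy in-memory object store: a flat namespace of key -> (data, metadata),
    mimicking how S3-style stores attach searchable metadata to each object."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        self._objects[key] = (data, metadata)

    def get(self, key):
        return self._objects[key][0]

    def list(self, prefix=""):
        """Prefix listing gives the illusion of folders in a flat namespace."""
        return sorted(k for k in self._objects if k.startswith(prefix))

    def find(self, **criteria):
        """Select training samples by metadata without reading object bodies."""
        return sorted(
            k for k, (_, md) in self._objects.items()
            if all(md.get(a) == v for a, v in criteria.items())
        )

store = MiniObjectStore()
store.put("datasets/imgs/cat1.jpg", b"...", label="cat", split="train")
store.put("datasets/imgs/dog1.jpg", b"...", label="dog", split="train")
store.put("datasets/imgs/cat2.jpg", b"...", label="cat", split="test")

print(store.list("datasets/imgs/"))            # all three keys
print(store.find(label="cat", split="train"))  # just cat1.jpg
```

The `find` call is the point: metadata lets a training pipeline assemble a dataset ("all cat images in the train split") without touching the image bytes themselves.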

That said, object storage can have performance limits. Heavy metadata processing may strain controllers, and cloud-hosted object stores introduce costs whenever data is read or moved. Still, for long-term scalability and flexibility, object storage remains one of the best fits for AI-driven data growth.

What are storage vendors offering for AI?

The storage industry has quickly aligned with GPU computing ecosystems, particularly around Nvidia’s DGX infrastructure. Nvidia’s BasePOD and SuperPOD architectures define best practices for connecting compute, networking, and storage in AI environments.

Leading storage suppliers are now validating their systems against these Nvidia standards, optimising for GPU data pipelines, and developing retrieval-augmented generation (RAG) frameworks that validate AI outputs against trusted data sources to reduce “hallucinations.”
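In outline, the retrieval half of RAG ranks trusted documents against a query and hands the best matches to the model as grounding context. A deliberately simple sketch using bag-of-words cosine similarity — real pipelines use neural embeddings and vector indexes, not word counts:

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': bag-of-words term counts. Real RAG pipelines
    use a neural embedding model instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=1):
    """Rank trusted documents by similarity; the top hits are passed to the
    LLM as grounding context so answers lean on real data, not invention."""
    q = embed(query)
    scored = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return scored[:k]

docs = [
    "checkpoint throughput requirements for GPU training clusters",
    "quarterly sales figures for the retail division",
    "firmware update procedure for edge sensors",
]
print(retrieve("GPU training checkpoint bandwidth", docs))
```

The storage angle is that the corpus being searched lives on exactly the systems described in this article — which is why vendors are bundling retrieval components alongside their arrays.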

Here’s a snapshot of major vendors and their AI offerings:

  • DDN: A³I AI400X2 all-NVMe appliances with up to 90GB/s throughput and 3 million IOPS, validated for Nvidia SuperPOD.
  • Dell Technologies: The “AI Factory” stack combining PowerEdge XE9680 servers and PowerScale F710 storage, available via Dell Apex as-a-service.
  • IBM: Spectrum Storage for AI, scalable compute and storage validated for DGX BasePOD and SuperPOD.
  • Cohesity: Integrating Nvidia’s NIM microservices into its Gaia multicloud data platform to use backup data for AI training.
  • Hammerspace: GPUDirect-certified Hyperscale NAS delivering a global file system optimised for GPU-driven workloads.
  • Hitachi Vantara: Hitachi iQ, offering industry-specific AI systems combining DGX/HGX GPUs with Hitachi storage.
  • HPE: GenAI supercomputing with Nvidia components and RAG reference architectures, plus upgraded Alletra MP arrays with 100Gbps connectivity.
  • NetApp: Hybrid cloud OnTap storage integrated with Nvidia’s NeMo Retriever for RAG pipelines.
  • Pure Storage: AIRI architecture with FlashBlade//S arrays, integrating Nvidia NeMo-based microservices and vertical-specific RAGs.
  • Vast Data: QLC flash-based Vast Data Platform offering database-like capabilities and DGX certification.
  • Weka: Hardware appliances certified for Nvidia SuperPOD, designed for hybrid AI workloads.

These collaborations reflect an industry-wide push to eliminate I/O bottlenecks and ensure GPUs operate at full efficiency across AI infrastructure stacks.

The takeaway

AI is reshaping data infrastructure — and storage is its foundation. High-performance, scalable storage ensures that GPUs are fed efficiently during training and inference, while flexible architectures like object storage and cloud integration allow massive datasets to be managed intelligently.

As enterprises evolve from pilot projects to production AI, storage strategies must balance speed, scalability, and cost-efficiency. Whether on-prem, hybrid, or cloud-based, the goal is the same: keep data moving fast enough to power intelligence.
