Shared Everything

Vectors, AI, and the Infrastructure Fury Ahead

Episode Summary

In this episode of Shared Everything, Nicole Hemsoth Prickett chats with Jeff Denworth, co-founder of VAST Data, about why the Hadoop-era Big Data hype is now ancient history, and how the real story is quietly exploding around metadata. They explore how the seemingly humble concept of metadata has grown into something vast and complicated, driven by the rise of AI, vectors, and deep-learning embeddings. Jeff explains why traditional infrastructure struggles at exabyte scale, how vector databases underpin retrieval-augmented generation (RAG), and what companies can do right now to future-proof their systems for an increasingly agent-driven world. It’s a sharp, fast-moving tour through an AI landscape on the verge of metadata-induced chaos, raising deep questions about who (or what) will own the enterprise workforce of tomorrow.

Episode Notes

00:00 – Introduction

Nicole introduces Jeff Denworth, reminiscing about the Big Data era (~2010–2014).

01:15 – Big Data to Big Metadata

Jeff reflects on the Big Data era (Hadoop, analytics, NoSQL).

Today's valuations (Snowflake, Databricks) suggest Big Data's continued relevance.

02:11 – The Rise of Big Metadata

Jeff describes the shift from Big Data to Big Metadata.

AI creates new categories and applications, rapidly driving data infrastructure demands.

Example: Nvidia’s rapid growth due to deep learning-driven workloads.

05:01 – Synthetic Data and Metadata Explosion

Jeff notes that social networks are using synthetic data to circumvent privacy regulations.

Metadata types include large-scale data catalogs to manage exabytes of data (e.g., OpenAI).

08:14 – Dynamic Data Catalogs

VAST Database as an example of transactional and analytical infrastructure.

Benefits of SQL queries replacing traditional file operations for faster data handling.
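
To make the point above concrete, here is a minimal sketch of a catalog lookup done as a SQL query rather than a directory walk. It uses Python's built-in sqlite3 module purely as a stand-in; it is not the VAST Database API, and the table name, columns, and sample rows are all hypothetical.

```python
import sqlite3

def build_catalog(rows):
    """Load (path, size_bytes, owner) tuples into an in-memory catalog table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row of metadata per file.
    conn.execute(
        "CREATE TABLE catalog (path TEXT PRIMARY KEY, size_bytes INTEGER, owner TEXT)"
    )
    conn.executemany("INSERT INTO catalog VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def large_files(conn, min_bytes):
    """A single query stands in for a recursive walk plus a stat call per file."""
    cur = conn.execute(
        "SELECT path FROM catalog WHERE size_bytes >= ? ORDER BY path",
        (min_bytes,),
    )
    return [row[0] for row in cur.fetchall()]

# Made-up sample metadata for illustration only.
sample_rows = [
    ("/data/a.parquet", 5_000_000, "alice"),
    ("/data/b.log", 1_200, "bob"),
    ("/data/c.h5", 9_000_000_000, "alice"),
]
conn = build_catalog(sample_rows)
result = large_files(conn, 1_000_000)
```

The design idea is that the question ("which files are larger than 1 MB?") is answered from the catalog alone, without touching the file system.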

09:50 – Metadata Evolves with Vectors

Explanation of embeddings, vector databases, and similarity search.

AI-driven understanding of unstructured data via vectors.
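
As a rough illustration of similarity search over embeddings, the sketch below ranks a few items by cosine similarity to a query vector. The three-dimensional vectors and item names are invented for the example; real embeddings come from a model and typically have hundreds or thousands of dimensions, and production vector databases use approximate indexes rather than this brute-force scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Brute-force search: score every stored vector, return the k closest names."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical toy "embeddings"; similar content gets similar vectors.
toy_index = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.3, 0.1],
    "tax form": [0.0, 0.2, 0.9],
}
nearest = top_k([1.0, 0.2, 0.0], toy_index, k=2)
```

Here the query lands closest to the two animal photos and far from the tax form, which is the sense in which vectors let software "understand" unstructured data.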

11:56 – Massive Scale of Vector Databases

Rough estimate: ~40 trillion vectors per 100 petabytes of data.

Challenges with conventional vector databases at massive scale (cost, memory, speed).
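
A back-of-the-envelope calculation helps show why that estimate strains memory-bound designs. The 40 trillion and 100 petabyte figures are from the episode; the vector dimension (1024), the float32 storage format, and the use of decimal petabytes are assumptions made only for this sketch.

```python
PETABYTE = 10**15  # assumption: decimal petabytes

data_bytes = 100 * PETABYTE   # from the episode: 100 PB of source data
vector_count = 40 * 10**12    # from the episode: ~40 trillion vectors

# Implied average amount of source data represented by each vector.
source_bytes_per_vector = data_bytes / vector_count

# Assumptions, not from the episode: 1024 dimensions stored as 4-byte floats.
ASSUMED_DIMENSIONS = 1024
BYTES_PER_FLOAT32 = 4

# Raw vector storage only, before any index structures or replicas.
index_bytes = vector_count * ASSUMED_DIMENSIONS * BYTES_PER_FLOAT32
index_petabytes = index_bytes / PETABYTE
```

Under these assumptions each vector covers about 2.5 KB of source data, and the raw vectors alone come to roughly 164 PB, more than the data they describe. Different dimensions or quantization would change the numbers, but the order of magnitude suggests why holding such an index in memory is costly.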

13:22 – Future Scale Problems and AI-driven Data Engineering

Retrieval-Augmented Generation (RAG) increases vector database scale needs.

Nvidia's data flywheel concept accelerates embedding and data engineering automation.

15:48 – Predicting Infrastructure Needs (Two-Year Outlook)

Jeff predicts AI models will significantly improve data engineering within two years.

Enterprises need vector databases capable of transactional, real-time performance.

18:10 – Future-Proofing Infrastructure (Five-Year Outlook)

Jeff expects AI-driven automation to impact all business processes (factories, back-office).

Businesses must be prepared for rapid scaling and foundational AI-driven changes.

21:14 – Industries Leading the AI Infrastructure Race

AI adoption speed varies by industry; the highest "fury" is in software development.

Banks and trading firms leverage AI differently: profit efficiency vs. alpha-seeking.

23:55 – Cloud vs. On-Premises Infrastructure Choices

Jeff sees hybrid approaches prevailing; decision-making depends on enterprise-specific needs.

Introduces the idea of an "agentic workforce," prompted by Jensen Huang's statement (100M AI agents).

24:31 – Agent Ownership and Future Consequences

Raises profound questions about ownership and management of AI agents in business.

Jeff notes limited current customer recognition of these deeper implications.

25:56 – Closing Remarks

Nicole and Jeff conclude by noting broad societal implications of AI-driven changes.

Emphasis on importance of continued discussions around big metadata.