Rohan's blog

Decoding AI, one token at a time.

GenAI and data moats

Data security leaders like Rubrik and Cohesity have, just like the rest of us, been keenly aware of the AI inflection point. In fact, since long before ChatGPT set the world on fire, they’ve been on a mission to leverage the power of AI to solve their customers’ problems around data and security management.

But as foundational AI models (both large and small ones) arguably get consolidated and commoditized, it’s becoming clearer than ever that data is going to be the real competitive moat for any enterprise. The quantity and quality of data, and how quickly and easily companies are able to put it to use to harness the benefits of AI, will soon be a big determinant of whether a company thrives (or even survives) in this era of computing. And data security cloud leaders like Rubrik and Cohesity are extremely well-positioned to create tons of value, given the sheer volume (exabytes?) of enterprise data estate under their platforms’ management. Earlier this week, even Mark Zuckerberg, in his “Open Source AI Is the Path Forward” letter, called data security out as a top concern for enterprises in discussions today around GenAI strategy. This is a massive opportunity for the data security platform vendors.

This leads to a super interesting (and evolving) trend in the offerings that these data and security cloud companies are launching. I believe there are three categories of offerings that data security cloud vendors like Rubrik and Cohesity need to think about when it comes to differentiating with AI:

  1. Table-stakes: AI for cybersecurity and resiliency
  2. Differentiators: Harnessing the “data moat” beyond cyber security
  3. Transformational: Empowering enterprises to build Gen AI apps securely and responsibly with their data

AI for cybersecurity and resiliency

Both Rubrik and Cohesity have a bunch of AI products in the market today to accelerate cybersecurity and resilience. For example, Rubrik’s portfolio includes:

  • AI-Powered Cyber Recovery to help organizations accelerate their time to recover from a cyber attack through comprehensive task lists and guided workflows tailored to each recovery scenario
  • Partner integrations, providing customers with end-to-end flows such as with Microsoft Sentinel where anomaly alerts are sent to Sentinel to notify the customer about cybersecurity threats
  • Ruby, the AI companion app, designed to simplify and automate cyber incident response and recovery (including detecting anomalies, aiding their own customer support teams with more insights, and simplifying security management for their customers through a natural language interface powered by Rubrik’s internal expertise on security)

Besides the LLM-based Ruby companion app for SOC teams, which is a very recent innovative solution to the growing complexity of cybersecurity management, the category of “cybersecurity products that use AI for threat detection and cyber resilience” isn’t a new one. So, generally speaking, this is a class of AI features that are almost necessary to have a seat at the table when a customer is baking off between cybersecurity products for their enterprise.

Harnessing the “data moat” beyond cyber security

Taking a step back, in tackling data security challenges with a platform (or cloud) approach, cybersecurity vendors like Rubrik and Cohesity have effectively built data management products. This means they have a moat (i.e., enterprise data, and the platform that manages it) that isn’t necessarily specific to cybersecurity, which they can leverage to solve, more broadly, “data problems” that enterprises face while trying to implement GenAI.

As an example, Cohesity recently launched Gaia, a “conversational search assistant to help businesses transform secondary data into knowledge”. This is an LLM (Azure OpenAI)-based natural language interface, leveraging the retrieval augmented generation (RAG) architecture, that execs and leaders can chat with to get business insights from the enterprise’s secondary data. And, as their CEO Sanjay Poonen states in this interview, they fully intend to extend that capability to enterprises’ primary data in the near future – in doing so, stepping out of their secondary data roots.

Cohesity is leveraging its “data moat”, which is not just the massive volume of data under its management, but also the mature product they’ve built with security and RBAC in mind. Together, these let them provide secure and responsible AI for any business application.

The direction Cohesity, the “secondary/backup data company” we’ve all known for a while, has taken with this is frankly quite thought-provoking. If we’re to take this as a signal for what’s ahead in this industry, it won’t be long before Rubrik and the rest of the pack catch up. One can imagine the next incarnation of Gaia-like chat apps solving for multiple different business needs, such as:

  • Business insights, for execs and company leaders to quickly get insights on business health and performance based on current and historical data (already supported with Gaia today)
  • Customer support, to respond to known support asks, thereby giving enterprise support teams the bandwidth to focus on complex or larger-scale supportability problems
  • Cyber incident response and recovery, to provide SOC teams with an AI companion designed to simplify and automate cyber resiliency (just like Rubrik’s Ruby does today) 

Each of these would be available with the data platform as a chat application that surfaces insights from enterprise data through a natural language interface. We can also imagine that over time, a growing roster of business applications would continue to get built out in the product based on customer demand.

Which brings me to the third and final category…

Empowering enterprises to build Gen AI apps securely and responsibly with their data

Data security companies have an opportunity to play a transformational role in accelerating the use of generative AI in enterprises by empowering them to build their own GenAI apps (with their own data) quickly and at scale, in a secure and responsible way.

While it may be extremely valuable to use enterprise data, RAG and LLMs to build domain-specific applications for enterprises, as Cohesity has already begun doing, if there’s one thing we know for sure, it’s that there is no such thing as one-size-fits-all. I believe Cohesity will try hard to learn from customers and build generalized-yet-domain-specific apps (which would of course be valuable since they’d leverage enterprise data). But to really scale across the plethora of use-cases in enterprises today and really tackle the big potential addressable market, building an extensible trusted data platform for AI will be key.

This may manifest in the form of a platform (packaged with the data security cloud, of course), consisting of:

  • Retrieval augmented generation (RAG) based architecture
  • a vector database
  • an LLM
  • fine-grained role-based access control (RBAC) for all AI data to ensure secure, compliant and responsible use of data for the AI apps

Enterprises would be able to leverage all of these platform services to build GenAI apps. They wouldn’t have to worry about security of course, but they also wouldn’t have to worry about implementing RAG, vector databases, RBAC, or managing an LLM.
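To make those platform pieces a little more concrete, here is a minimal Python sketch of how retrieval, a vector store and an LLM call fit together in a RAG flow. Everything in it is hypothetical and for illustration only: `embed` is a toy bag-of-words stand-in for a real embedding model, `VectorStore` is an in-memory placeholder for a vector database, and `fake_llm` is a stub rather than a real model call. It is not how Rubrik or Cohesity implement anything.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class VectorStore:
    """Hypothetical in-memory vector database."""
    docs: list = field(default_factory=list)

    def add(self, text: str) -> None:
        self.docs.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list:
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list) -> str:
    """Augment the question with retrieved context before calling an LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


def fake_llm(prompt: str) -> str:
    """Placeholder for a managed LLM call; it only reports the prompt size."""
    return f"[LLM response based on {len(prompt)} prompt characters]"


store = VectorStore()
store.add("Backup job for the finance database completed on Monday")
store.add("Quarterly revenue grew in the EMEA region")
store.add("Anomaly detected in the HR file share snapshot")

question = "What happened to the finance database backup"
context = store.search(question, k=1)      # retrieval
prompt = build_prompt(question, context)   # augmentation
answer = fake_llm(prompt)                  # generation (stubbed)
```

In a platform offering, each of these three steps would presumably be a managed service, so the enterprise developer would only write the last few lines.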

The best part: you may have noticed that all of these “platform” pieces were probably already needed to implement Ruby or Gaia in the first place. If done right, this may just be a natural evolution of the use-case based apps (Gaia, Ruby) that these companies are already building. So, productizing platform software pieces that were originally built for an end-to-end use-case… this is kinda sorta starting to resemble how AWS (or dare I say, the public cloud) was born.

On responsible AI

Orthogonal to the three categories of offerings I described above, a core part of what it will take to accelerate Gen AI in enterprises is ensuring that Gen AI apps meet the responsible AI standards of the enterprise and industry at large. This is another area where the “data moat” puts companies like Rubrik and Cohesity in a great position to be leaders in enabling GenAI for the enterprise. Their products already give enterprise administrators a zero-trust security platform with fine-grained role-based access controls (RBAC) on the data – which they’d need to extend to ensure that Gen AI applications can only access data that the requesting user is authorized to see. As an example, admins would need the ability to ensure that sensitive financial data of a company is tied to the role of an exec, and the AI chat applications would only be able to access this data when responding to queries from execs with that role. This not only keeps the data secure, but also helps meet regulatory compliance requirements.
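The exec example above can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not any vendor's API: the `Document`, `User`, `authorized_docs` and `retrieve_for_user` names are all made up, and the keyword match is a stand-in for real retrieval. The point it tries to show is that the role check happens before retrieval, so unauthorized text never reaches the prompt in the first place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    text: str
    allowed_roles: frozenset  # roles permitted to see this document


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset


def authorized_docs(user: User, docs: list) -> list:
    """Keep only documents that share at least one role with the user."""
    return [d for d in docs if d.allowed_roles & user.roles]


def retrieve_for_user(user: User, query: str, docs: list) -> list:
    """Filter by role first, then apply a naive keyword match."""
    terms = set(query.lower().split())
    return [
        d.text
        for d in authorized_docs(user, docs)
        if terms & set(d.text.lower().split())
    ]


corpus = [
    Document("Confidential revenue forecast for next quarter", frozenset({"exec"})),
    Document("Public revenue summary from the annual report", frozenset({"exec", "employee"})),
]

cfo = User("cfo", frozenset({"exec"}))
engineer = User("engineer", frozenset({"employee"}))

# Same query, different results depending on who is asking.
exec_results = retrieve_for_user(cfo, "revenue forecast", corpus)
employee_results = retrieve_for_user(engineer, "revenue forecast", corpus)
```

Here the exec sees both documents, while the engineer only gets the public summary, even though both asked the same question.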

The data cloud companies are also well-positioned to leverage their existing partnerships (e.g., with Microsoft if they’re using Azure OpenAI) to ensure the entire “AI supply chain” is secure and adheres to the responsible AI standards of the industry.

In conclusion

The AI excitement only gets better and better with time. As foundational models arguably get commoditized, the focus is quickly shifting to enterprise data and realizing the monetary benefits of GenAI. Data and security cloud companies like Rubrik and Cohesity are perfectly positioned to become leaders not just in the security space, but in the general “AI data” space, and this is especially aided by the intense focus on security, ethics and responsible AI. These companies have the opportunity, not just to be extremely successful with enabling AI in the enterprise, but also to influence the impact of AI on our lives. No better time to be a product person.

Thanks for reading!

Rohan