SEO, GEO, and the future of content discovery

The End of Advertising by Michael Mignano was thought-provoking, but not for the reasons I expected.

Anyone who optimizes for growing their viewership or audience, from independent authors to businesses, is foolish to ignore search engine optimization (SEO). Like it or not, it works, driving content to more potential viewers.

And where there’s a system to game, there’s a marketplace of solutions to help you game it.

He alludes to the fact that generative AI (GenAI) platforms are going to incorporate ads, and I’m sure they’ll dominate the advertising ecosystem over time. The unexpected thought that his article provoked: what will happen to the creator ecosystem, and the millions (billions?) of dollars that have been poured into the SEO industrial complex?

Consider these assertions:

  • We’ll find and consume content increasingly via GenAI products like ChatGPT and Perplexity, rather than traditional search engines.
  • GenAI will sometimes link back to original sources, either by design (as Perplexity does), by user request, or perhaps original sources will appear as ads?

If these are true, and I believe they are, then there may be a new system to game: being the first to provide authoritative content to GenAI engines in hopes they drive traffic to your source.

The Internet is already littered with talk of Generative Engine Optimization (GEO). But GEO thus far is focused on how to optimize content structure for summarization and reference by GenAI engines. Step #1 in GEO how-tos is “Do all of the SEO things.” This is surely important in 2024, but where’s the puck going?

Crude content discovery landscape (a napkin drawing).

If there are predictions to be made:

  • We’ll see capitulation in the game of GenAI keep-away.1
  • GenAI engines will be increasingly connected and increasingly quick to provide results that blend the best of the models with the best results from the web.
  • The new game will be creating content and then feeding it immediately and in specific ways to GenAI engines (effectively “GenAI content stuffing”), resulting in a new way of thinking about “authority” or “rank” that increasingly cuts out traditional search engines.

Search engines aren’t going away overnight, or even in the next decade. But they’re a costly means to an end for both creators, who have to optimize for them, and GenAI platforms, which have to wait on them, potentially license results from them, and generally depend on them. For both constituencies, the search engine is a boat anchor and a tax, and each will be increasingly motivated to cut it out of the equation.

  1. This is a big deal because there’s a considerable effort today to block GenAI crawlers from accessing content. Large publishers might succeed in this, but smaller sites and creators probably won’t. Like water, GenAI crawlers will find a way. 

After years of observing cybersecurity defenders balance cost and effectiveness, I’ve seen focus oscillate between just two domains: device (e.g., endpoint) and identity (e.g., directory or identity provider) security.

While other technology and security domains contribute to overall maturity, their contributions are incremental and a distant third by comparison.

Put another way: If you aren’t exceptional at both device and identity security, being exceptional in any or all other areas won’t help.

September 30, 2024

Cybersecurity predictions, Q3 2024 edition

Anyone who’s worked in cybersecurity for a meaningful amount of time has been asked to make predictions. Here are two predictions I’ve made that have endured over the past 2-3 years.

MSSP, MDR, and all other defensive managed services coalesce into SOC-as-a-Service.

All of the historical managed security service market categories are dancing around this eventuality.

The simplest way to think about a security operations center, or SOC, is as a vertically integrated set of functions whose primary reason for being is to detect and respond to cybersecurity threats. And Incident Management 101 tells us that in order to be effective at detection and response, you need to understand where your incidents come from (threat vectors) and corresponding attack paths (root causes—there is rarely just one).
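That 101 can be made concrete with a tally: counting incidents by threat vector and by root cause is the minimal analysis a SOC needs, and it shows why root causes outnumber incidents. The records and field names below are hypothetical illustrations, not from any particular platform.

```python
from collections import Counter

# Hypothetical incident records; field names are illustrative only.
incidents = [
    {"vector": "phishing", "root_causes": ["no_mfa", "arbitrary_code_execution"]},
    {"vector": "phishing", "root_causes": ["weak_mfa"]},
    {"vector": "exposed_service", "root_causes": ["unpatched_vpn", "no_mfa"]},
]

def tally(incidents):
    """Count threat vectors and root causes across incidents.

    Root causes typically outnumber incidents: there is rarely just one.
    """
    vectors = Counter(i["vector"] for i in incidents)
    causes = Counter(c for i in incidents for c in i["root_causes"])
    return vectors, causes

vectors, causes = tally(incidents)
```

Even this toy tally surfaces the pattern that matters: the same root cause (here, missing MFA) recurs across different vectors, which is where a decisive fix pays off.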

Traditional managed security services focus almost exclusively on detection and response. This is great, as it’s exceptionally difficult to be exceptional at detection and response. But the smart money is on maximizing what you learn from your incidents and making decisive changes to eliminate threat vectors, reduce attack surface, and ultimately increase the cost to adversaries who want to meddle in your environment.

It’s overwhelming to think about the dozens of cybersecurity market categories that one could Frankenstein together as bookends for detection and response to build a highly effective SOC. Is it attack surface management (ASM), vulnerability management (VM), exposure management, cloud security posture management (CSPM), incident response (IR)? Very few organizations need any one of these functions turned up to 11. What most organizations need is a subset of these functions working exceptionally well together.

So again, the destination here is a vertically integrated and highly operationally-focused slice of some of these functions, not just detecting and responding to threats, but systematically working to reduce the number and severity of incidents, effectively and at scale.

The browser becomes the most important device of all.

We love to say things like “identity is the new perimeter.” This statement may be accurate, and helpful for convincing people that they should invest heavily in protecting identity. But it can lead to superficial implementations that fixate on the identity provider, or IdP, and leave us with gaps in our understanding of what’s being done to circumvent obvious points of identity protection, and of what happens once a session exists.

Trust is increasingly pinned to the browser. It’s how we access the IdP in the first place, and it’s where any modern organization does 99% of their work after authenticating via the IdP.

Application control, including allow lists (a.k.a. application whitelisting) and behavioral controls, is a hallmark of great endpoint protection. These controls are available for the browser, but they’re clunky to manage and far from mainstream in the enterprise. Ad and script blockers are absolutely critical to protecting end users from all manner of shenanigans, but both of these spaces feel like moving targets: purveyors of the most popular browsers don’t like them, and these interdictions can be at odds with interoperability and ease of use.

Soon, we’ll see widespread acceptance of the importance of the browser, and it’ll become commonplace to instrument and think about browser protection, integrity, and observability in the same ways that we’ve come to think about these concepts on traditional end user platforms, like macOS and Windows (Linux, too, I know there are dozens of you!).

Multicloud, and multiSIEM

tl;dr

  • Many companies of even modest size are multicloud
  • Every major cloud platform and security vendor has a SIEM or is building one
  • Because of this, an increasing percentage of companies are now “multiSIEM”
  • MultiSIEM is more of a situation we find ourselves in than a strategy
  • All SIEM-like solutions, from narrower log aggregators like XDR, to traditional SIEMs, to newer data lake solutions that better balance scale, cost, and analytical capabilities, are on a collision course

We are all multicloud

Early on, we talked about multicloud as “build the same app and deploy it to multiple providers.” In practice, virtually no one does this. The reality is more like:

  • We use Google Workspace for email and productivity, and AWS for production.
  • We use Microsoft 365 for email and productivity, and Google Cloud for production.
  • We use AWS WorkMail for . . . kidding, no one uses the AWS productivity stuff outside of AWS
  • Some variant of the above

And an obviously common pattern specific to Microsoft is the hybrid cloud, a mix of Microsoft cloud and on-premises services.

We aren’t reimplementing the same solution across multiple cloud providers. We’re using the right cloud provider for the right job.

A consequence of this multicloud pattern for cybersecurity practitioners is that we’re often dealing with some overlapping data types (e.g., identity) across cloud providers, but then we’re also dealing with many unique, service-specific data types.
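A small sketch of what that overlap means in practice: sign-in events from two providers normalized into one common shape before analysis. The input field names below are illustrative assumptions; the real Google Workspace and Microsoft Entra log schemas differ in the details.

```python
# Sketch: normalizing sign-in events from two cloud providers into a
# single schema. Input field names are assumptions for illustration.

def normalize_google(event):
    """Map a (hypothetical) Google Workspace sign-in event to a common shape."""
    return {
        "user": event["actor_email"],
        "source_ip": event["ip"],
        "outcome": "success" if event["login_success"] else "failure",
        "provider": "google_workspace",
    }

def normalize_microsoft(event):
    """Map a (hypothetical) Microsoft Entra sign-in event to the same shape."""
    return {
        "user": event["userPrincipalName"],
        "source_ip": event["ipAddress"],
        "outcome": "success" if event["status_code"] == 0 else "failure",
        "provider": "entra_id",
    }

signin = normalize_google(
    {"actor_email": "a@example.com", "ip": "203.0.113.7", "login_success": True}
)
```

The overlapping types (identity, here) reward this kind of normalization; the many service-specific types mostly don’t, which is part of why consolidation is hard.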

Multicloud? MultiSIEM.

Crude SIEM landscape (illustrative, not comprehensive).

A good number of organizations have already identified a need to collect data from a wide variety of systems, and so they’ve adopted one of the handful of pure-play SIEM solutions:

  • Splunk
  • QRadar
  • Elastic Security
  • LogRhythm

So, that’s one SIEM.

Then, over time, every major cloud platform provider (and most major enterprise technology providers) established the pattern of building or acquiring a SIEM, sending all of their native data sources into it, and of course supporting just enough third-party services to make it appealing to some pure-play SIEM customers.

For many organizations, that’s at least two SIEM solutions.

If you already have a pure-play SIEM + Microsoft 365 + AWS for production, you might have three SIEMs now, which makes you a true collector.

This is to say nothing of the various log aggregators, like Amazon’s CloudTrail and Security Lake, Google’s Cloud Logging for basic log aggregation, and numerous other SIEM-adjacent security services. These aren’t SIEMs per se, but they are yet another consideration when it comes to data flows, and thus security investigation workflows.

The cats and dogs of infrastructure

Organizations with on-premises, remote access, or data center infrastructure also get to figure out what to do with the massive amount of data generated by these technologies:

  • Palo Alto’s XSIAM
  • Cisco’s Security Analytics and Logging
  • Fortinet’s FortiSIEM

At the end of the day, the unluckiest of organizations have a few or more SIEM-like solutions, depending on how liberally you count.

Don’t forget about XDR

Then, we have XDR platforms, most of which are EDR or EPP solutions that have expanded the scope of data that they can consume, and so “endpoint” is no longer a useful distinction (thus, “extended”).

And this is to say nothing of MDR and other service providers, some of which have their own platform that functions as a SIEM. In these cases, it’s common for selected features to be exposed only to the vendor’s team, while a subset of features is exposed to customers.

From an evolutionary standpoint, XDR platforms are most directly on a collision course with SIEM, which tracks with major platform providers’ roadmaps.

Security platforms and log aggregators on a collision course

Most organizations large, mature, or crazy enough to own a traditional SIEM now have at least two, perhaps more, solutions that provide increasingly overlapping functionality. And they have them not because it’s their desire or strategy, but because these various points of aggregation exist, and oftentimes the provider’s logs end up in them by default.

There’s a logical desire to collect defensive telemetry and other log data in the fewest number of places, to optimize not only for cost but also for our attention, as we then need to build analytics, perform investigations, and respond to threats. But there’s also a cost to getting data from one place to another, and as we push for more observability, it can quickly run counter to consolidation (e.g., moving all of your Amazon logs to Microsoft or vice versa can incur massive transit costs).
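Back-of-the-envelope math shows why transit cost dominates this decision. The per-GB rate below is an illustrative assumption; real cloud egress pricing is tiered and varies by provider and destination.

```python
# Rough monthly cost of shipping logs out of one cloud into another.
# The $0.09/GB rate is an illustrative assumption, not a quoted price.

def monthly_egress_cost(gb_per_day, usd_per_gb=0.09):
    """Approximate monthly egress cost for a steady daily log volume."""
    return gb_per_day * 30 * usd_per_gb

# Shipping 500 GB/day of logs cross-cloud:
cost = monthly_egress_cost(500)  # about $1,350/month, before storage or compute
```

And that is for transit alone; duplicated storage and ingest fees on the receiving side stack on top.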

It’s hard to see how this can consolidate in the traditional sense, and the most likely path is that expansive enterprise security teams, including outsourced security operations providers, become adept at minimizing overlap amongst data flows (read: transit and storage costs) and analytics. We’ll figure out how to aggregate the bulk of a given provider’s logs in their native log aggregator, and siphon off a subset to data lakes and/or SIEM platforms based on a combination of cost factors and operational use cases.
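The siphoning pattern can be sketched as a simple routing decision: everything lands in the provider’s native aggregator, and only high-value event types also flow to the central SIEM or data lake. The event types and destination names here are made-up placeholders, not any vendor’s taxonomy.

```python
# Sketch of the "siphon off a subset" pattern for multiSIEM data flows.
# Event types and destination names are illustrative placeholders.

HIGH_VALUE = {"signin", "privilege_change", "mfa_reset"}

def route(event):
    """Return the list of destinations an event should flow to."""
    destinations = ["native_aggregator"]  # everything stays here (cheap to keep)
    if event["type"] in HIGH_VALUE:
        destinations.append("central_siem")  # costly transit, so only a subset
    return destinations
```

The operational work is in choosing and maintaining that high-value subset, which is exactly the cost-versus-use-case balancing described above.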

Much like we’ve learned to live with multicloud, we’ll learn how to optimize and live with multiSIEM.

Breaches are more like plane crashes than car crashes

The Swiss Cheese Model

This was taken verbatim from a LinkedIn post that I wrote in a fit of near-rage after seeing my 1,000th vendor claim that human error causes most breaches. There’s more to say about this, and a more constructive way to say it, but perhaps another time.

Saying “most breaches are due to human error” is tired, ill-informed, unhelpful framing. If it’s true, it’s true only on a technicality, and it’s wildly misleading.

People make mistakes every day. Very, very few of the mistakes people make that are associated with security incidents are rare, unknown, or exceptional failure modes. They’re mostly things we know and expect to happen, but that we haven’t taken enough care to prevent (including not adding sufficient friction, where we can’t prevent the thing outright).

Got ransomware? Was the initial access vector email, specifically phishing? Was the root cause the user opening the email and falling for the lure? No way. Never. The root cause was the ability to run arbitrary code or software on a system, instead of using application control (mostly free, but admittedly free like a puppy, not like beer). Or maybe the root cause was lack of MFA or a lesser MFA implementation that isn’t sufficiently phishing resistant.

Breaches are more like plane crashes than car crashes: They don’t happen in an instant. A number of things have to go wrong, and a number of opportunities to avoid or lessen the impact have to be missed.
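The analogy has simple math behind it. If each defensive layer fails independently with some probability, a breach requires all of them to fail at once, and the combined probability shrinks multiplicatively. The numbers below are illustrative, not empirical.

```python
# Illustrative math for the plane-crash analogy: a breach requires
# several defensive layers to fail together. Probabilities are made up.

def breach_probability(layer_failure_probs):
    """Probability that every layer fails, assuming independence."""
    p = 1.0
    for prob in layer_failure_probs:
        p *= prob
    return p

# Four layers, each failing 10% of the time:
p = breach_probability([0.1, 0.1, 0.1, 0.1])  # roughly 0.0001, i.e., 1 in 10,000
```

Real layers aren’t independent, of course, which is precisely why singling out any one failure as "the" cause mischaracterizes the problem.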

Implying that one of these things is primarily to blame mischaracterizes the problem, and shows a general lack of both understanding of how breaches occur and basic systems thinking.