AI query capture and OpenAI search

I’ve been really interested in how AI engines will impact traditional content discovery models. A key hypothesis is that content creators will reduce time-to-citation by:

  • Optimizing content for AI engines
  • Seeding content into AI systems

Meanwhile, it feels as though AI engines will race to achieve “query capture,” the flywheel of user sentiment, trends, and interests that fuels first-party innovation, third-party data sales, and advertising.

Perplexity pioneered search with citations, but I suspect the release of ChatGPT search will accelerate this particular land grab:

The search model is a fine-tuned version of GPT-4o, post-trained using novel synthetic data generation techniques, including distilling outputs from OpenAI o1-preview. ChatGPT search leverages third-party search providers, as well as content provided directly by our partners, to provide the information users are looking for.

It’s interesting to think about how “partners” might expand beyond larger publishers, who exist primarily as a source of training data for the engine but benefit from some amount of prioritized discovery and presentation to users. Maybe smaller publishers and individuals will begin seeding AI engines with content as a means of getting discovered?

MarTech and AdTech are the true global surveillance superpowers

Great reporting by Brian Krebs:

Not long ago, the ability to digitally track someone’s daily movements just by knowing their home address, employer, or place of worship was considered a dangerous power that should remain only within the purview of nation states. But a new lawsuit in a likely constitutional battle over a New Jersey privacy law shows that anyone can now access this capability, thanks to a proliferation of commercial services that hoover up the digital exhaust emitted by widely-used mobile apps and websites.

I’ve always held that 99.9% of us shouldn’t worry about the NSA, but 100% of us should worry about marketing (MarTech) and advertising (AdTech).

The Government™ has tremendous resources, but is also a massive bureaucracy saddled with myriad political, legal, and resource constraints. So, while the national technical means (read: spy tech) exist to hoover up and store limitless amounts of data, what they can practically do to and with that data is subject to some limits. Most notably, it’s not in the intelligence community’s interest to try to look at everyone.

Precisely the opposite is true of marketing and advertising. For every human with a dollar to their name, now or in the future, there is someone who wants to sell them something.

To borrow some intelligence jargon, the “targeting list” is effectively the whole of the developed world, and there is so much unregulated signal that two things are true:

  1. Virtually any entity within the ecosystem can truthfully say things like “we don’t share X with Y” or “we use privacy-preserving consumer identifiers”

  2. Virtually any entity within the ecosystem can piece together enough “anonymized” data to associate a name, place, and much more to any identifier, if they choose
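To make the second point concrete, here is a minimal sketch of a linkage attack. Everything in it is hypothetical: the ad identifiers, the coordinates, the names, and the helper functions `likely_home` and `reidentify` are invented for illustration and don't come from any real dataset or vendor. It assumes only that a pseudonymous identifier shows up repeatedly at one place overnight, and that some public record ties that place to a name.

```python
from collections import Counter

# Hypothetical "anonymized" location pings: (ad_id, hour_of_day, rounded lat/lon).
pings = [
    ("ad-7f3a", 2, (40.71, -74.01)),
    ("ad-7f3a", 3, (40.71, -74.01)),
    ("ad-7f3a", 14, (40.75, -73.99)),
    ("ad-7f3a", 23, (40.71, -74.01)),
    ("ad-c210", 1, (40.80, -73.95)),
    ("ad-c210", 4, (40.80, -73.95)),
    ("ad-c210", 13, (40.75, -73.99)),
]

# Hypothetical public record mapping a location to a resident's name.
address_book = {
    (40.71, -74.01): "Alice Example",
    (40.80, -73.95): "Bob Example",
}


def likely_home(pings, ad_id):
    """Return the location an identifier is most often seen at overnight."""
    overnight = Counter(
        loc
        for pid, hour, loc in pings
        if pid == ad_id and (hour >= 22 or hour < 6)
    )
    if not overnight:
        return None
    return overnight.most_common(1)[0][0]


def reidentify(pings, address_book):
    """Join each pseudonymous identifier to a name via its inferred home."""
    ids = sorted({pid for pid, _, _ in pings})
    return {pid: address_book.get(likely_home(pings, pid)) for pid in ids}


links = reidentify(pings, address_book)
```

No single input here contains a name next to an identifier, so each holder can truthfully claim the data is “anonymized.” The join is what does the damage.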

SEO, GEO, and the future of content discovery

The End of Advertising by Michael Mignano was thought-provoking, but not for the reasons I expected.

Anyone who optimizes for growing their viewership or audience, from independent authors to businesses, is foolish to ignore search engine optimization (SEO). Like it or not, it works, driving content to more potential viewers.

And where there’s a system to game, there’s a marketplace of solutions to help you game it.

He alludes to the fact that generative AI (GenAI) platforms are going to incorporate ads, and I’m sure they’ll dominate the advertising ecosystem over time. The unexpected thought that his article provoked: what will happen to the creator ecosystem, and the millions (billions?) of dollars that have been poured into the SEO industrial complex?

Consider these assertions:

  • We’ll find and consume content increasingly via GenAI products like ChatGPT and Perplexity, rather than traditional search engines.
  • GenAI will sometimes link back to original sources, whether by design (as Perplexity does), by user request, or perhaps as ads.

If these are true, and I believe they are, then there may be a new system to game: being the first to provide authoritative content to GenAI engines in hopes they drive traffic to your source.

The Internet is already being littered with talk of Generative Engine Optimization (GEO). But GEO thus far is focused on how to optimize content structure for summarization and reference by GenAI engines. Step #1 in GEO how-tos is “Do all of the SEO things.” This is surely important in 2024, but where’s the puck going?

[Figure: Crude content discovery landscape. A napkin drawing.]

If there are predictions to be made:

  • We’ll see capitulation in the game of GenAI keep-away. [1]
  • GenAI engines will be increasingly connected and increasingly quick to provide results that blend the best of the models with the best results from the web.
  • The new game will be creating content and then feeding it immediately and in specific ways to GenAI engines (effectively “GenAI content stuffing”), resulting in a new way of thinking about “authority” or “rank” that increasingly cuts out traditional search engines.

Search engines aren’t going away overnight, or even in the next decade. But they’re a costly means to an end for both creators, who have to optimize for them, and GenAI platforms, which have to wait on them, potentially license results from them, and generally depend on them. For each of these constituencies, the search engine becomes a boat anchor and a tax, and both will be increasingly motivated to cut it out of the equation.

  [1] This is a big deal because there’s a considerable effort today to block GenAI crawlers from accessing content. Large publishers might succeed in this, but smaller sites and creators probably won’t. Like water, GenAI crawlers will find a way.
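For a sense of what that blocking effort usually looks like, here is a small sketch using Python’s standard `urllib.robotparser`. The robots.txt content, the `allowed` helper, the `ExampleBrowser` agent name, and the example.com URL are all assumptions for illustration; `GPTBot` is the user-agent token OpenAI publishes for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one GenAI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt, user_agent, url):
    """Report whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


gptbot_ok = allowed(ROBOTS_TXT, "GPTBot", "https://example.com/post")
other_ok = allowed(ROBOTS_TXT, "ExampleBrowser", "https://example.com/post")
```

Note that robots.txt is advisory. It only stops crawlers that choose to check it and honor the answer, which is part of why smaller sites have little leverage here.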

After years of observing cybersecurity defenders balance cost and effectiveness, I’ve seen their focus oscillate solely between device (e.g., endpoint) and identity (e.g., directory or identity provider) security.

While other technology and security domains contribute to overall maturity, their contributions are incremental and a distant third by comparison.

Put another way: If you aren’t exceptional at both device and identity security, being exceptional in any or all other areas won’t help.

September 30, 2024