The Global Media Business Weekly

Why we must fight for new rules not against tech

“Despicable people.” That’s how Justine Bateman describes the tech leaders behind today’s Gen AI revolution. (You can read her full quote here). Her outrage is shared by a growing chorus of publishers, journalists and artists who see their work being scraped, repurposed and monetised by AI companies with little regard for consent or compensation. 

While B2B publishers may not be in the spotlight of this debate for now, the scale of AI-driven content theft is bigger than you might think (more on that later in this article). There is real urgency for publishers to rethink their approach and to advocate for a system where innovation doesn't come at the expense of creators.

Modern copyright law began with a simple, revolutionary idea: creators deserve to control – and profit from – their work. The Statute of Anne, enacted in Great Britain in 1710, was the world’s first copyright law, granting authors exclusive rights for a fixed term. Before then, the Stationers’ Company – the historic London venue of our Monetising B2B conference on 20 May – held a monopoly, stifling both competition and creativity. The Statute of Anne broke that grip, introducing the concept of copyright as a tool to encourage learning and reward innovation.

Three centuries later, the technology has changed, but the fear is eerily familiar. Back then, it was printing presses. In the 1990s, it was the internet. Now, it's AI: specifically, the relentless bots scraping publisher content to feed the data-hungry engines of generative AI.

The scale of the scraping problem

If you think this is a niche concern, think again. Exclusive research by Miso.ai, shared at a recent FIPP x PPA webinar in the UK, revealed just how rampant AI-powered scraping has become. 

They inspected over 2,700 publisher sites that had set up robots.txt “Disallow” commands (the digital equivalent of a “No Trespassing” sign) against AI scrapers. Collectively, these publishers were trying to block more than 1,300 unique bots – far more than most expected.
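For publishers who haven't yet set these directives, a typical robots.txt block looks like the sketch below. The user-agent tokens shown (OpenAI's GPTBot, Google-Extended, Common Crawl's CCBot and PerplexityBot) are among the most widely blocked at the time of writing, but the list of active AI crawlers changes constantly and, as the data below shows, compliance is voluntary:

```
# robots.txt – illustrative fragment blocking common AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```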

But here’s the kicker:

  • The average publisher is blocking only about 15 bots.
  • Just 15% of publishers are actively blocking Google Extended, the tool that lets sites opt out of having their content used for training Google’s AI models.
  • Even when publishers explicitly say “no” to bots like Perplexity, 15-20% still see their article content show up in the Perplexity chatbot. For homepages, that figure jumps to 65-70%, and for images, it’s as high as 66%.

In short: the bots are not listening. The old tools aren’t working.

A global battle for intellectual property

Hundreds of US news organisations have joined the News Media Alliance’s “Support Responsible AI” campaign, calling on lawmakers to make AI developers pay for the content used to train generative models. “Stealing is un-American. Tell Washington to make Big Tech pay for the content it takes,” reads the campaign’s message. 

Danielle Coffey, president and CEO of the News Media Alliance, put it bluntly: “Big Tech and AI companies are using publishers’ own content against them, taking it without authorisation or compensation to power AI products that pull advertising and subscription revenue away from the original creators.”

In the UK, publishers have rallied behind a similar “Make it Fair” campaign and investigative journalist Carole Cadwalladr has accused OpenAI’s Sam Altman of outright data theft, calling the unauthorised use of her work “more than theft. It’s a violation”.

Why B2B publishers should care

For B2B publishers, the issue of data theft doesn’t just represent a hit to the bottom line; it has the potential to be an existential threat. Unlike consumer media, B2B publishing is built on the value of proprietary, high-quality information. When AI scrapers harvest that content without permission, they’re not just stealing traffic – they’re siphoning off the very product that subscribers pay for. Worse, as AI models get better at summarising, paraphrasing and “remixing” this content, it becomes harder to trace the original source. Publishers risk losing both credit and compensation, while AI companies profit from the data.

The legal grey zone

Why is this happening? Because copyright law, designed for a world of books and newspapers, is struggling to keep up with the realities of AI. The law still revolves around the idea of a human author. But AI models don't "copy" in the traditional sense; they ingest and learn from billions of data points, then generate new content based on patterns.

Recent lawsuits are forcing courts to wrestle with questions such as:

  • Does using copyrighted material to train an AI model count as infringement?
  • Is it “fair use” if the model doesn’t reproduce the original work verbatim?
  • Who is the “author” when a machine creates something new?

So far, the answers are muddy. In the US, the Copyright Office has rejected copyright claims for AI-generated works unless there’s clear human authorship. In Canada, courts have ruled that scraping can constitute infringement, but enforcement is patchy at best.

The futility of robots.txt – and what comes next

Most publishers start with robots.txt and meta tags, telling bots what they can and can’t access. But as the stats above show, many AI scrapers simply ignore these signals. Even the best-intentioned protocols are toothless if the other side isn’t playing by the rules.

That’s why publishers are turning to a mix of technical and legal tactics:

  • Explicit Terms of Service: Make it crystal clear that scraping and AI training are forbidden. This provides a legal basis for action, even if it doesn’t stop the bots.
  • Copyright Notices: Mark every page and asset. It’s not a shield, but it helps in disputes.
  • Honeypots and CAPTCHAs: Trap and block unsanctioned bots.
  • Advanced Bot Management: Companies like HUMAN use machine learning to detect and block unwanted AI scrapers before they even make a request.
  • Watermarking: For images and videos, watermarks can deter unauthorised use and prove provenance.

Yet the reality is these are speed bumps, not roadblocks.
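To make the "advanced bot management" idea concrete, here is a minimal sketch in Python of server-side user-agent screening. The token list is illustrative, not a product's actual blocklist, and real deployments (such as those the HUMAN-style vendors sell) layer on IP reputation, rate limiting and behavioural signals, since a scraper can simply lie about its user-agent string:

```python
# Minimal sketch: screen inbound requests by user-agent token.
# Token list is illustrative; determined scrapers can spoof their user-agent,
# which is exactly why this alone is a speed bump, not a roadblock.

BLOCKED_TOKENS = ("gptbot", "ccbot", "google-extended", "perplexitybot")

def is_blocked_crawler(user_agent: str) -> bool:
    """Return True if the user-agent matches a disallowed AI crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)

def handle_request(user_agent: str) -> int:
    """Return an HTTP status code: 403 for blocked crawlers, 200 otherwise."""
    return 403 if is_blocked_crawler(user_agent) else 200
```

In practice this check would sit in a CDN rule or web-server middleware rather than application code, so blocked bots are turned away before they consume any content.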

The real solution: Licensing and new infrastructure

If history teaches us anything, it’s that the answer isn’t to fight technology – it’s to adapt the rules of engagement. The Statute of Anne didn’t ban printing presses; it made them work for authors. The internet didn’t kill copyright; it forced a rethink.

With AI, the best hope is building a system where creators get paid and AI companies get access: legally, transparently and at scale.

Licensing is the way forward.
AI companies need vast, high-quality data. Publishers have it. The missing link? Infrastructure that makes licensing simple, enforceable and valuable for both sides.

New platforms are emerging to fill this gap. Startups like Tollbit, Human Native AI and Story Protocol are examples of marketplaces where publishers can upload, tag and license their content for AI training, on their terms. These platforms offer granular control: publishers decide what's open or closed, set prices, and track usage. Blockchain technology is being explored to add transparency and traceability to licensing deals.

The trust issue

Of course, publishers are wary of yet more tech vendors. Many of us were burned by the adtech gold rush, where intermediaries pocketed the profits. But the alternative – doing nothing – means watching your unique content become free training data for the next wave of AI disruptors. 

What should B2B publishers do now?

  • Audit your defences: Update robots.txt, terms of service, and copyright notices.
  • Invest in bot management: Don't rely on basic tools; deploy advanced detection and blocking.
  • Prepare your archive: Tag and organise your content so it’s ready for licensing.
  • Explore licensing platforms: Start conversations with emerging marketplaces.
  • Advocate for better law: Join industry efforts to push for copyright reform that recognises the realities of AI.
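On the "prepare your archive" point, the practical step is attaching machine-readable licensing terms to every asset. The sketch below shows one possible shape for such a record; the field names are assumptions for illustration, not any marketplace's actual schema:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: field names are hypothetical, not a standard.
# The point is that each asset carries machine-readable licensing terms
# that a marketplace (or an AI buyer) can act on programmatically.

@dataclass
class LicensableAsset:
    asset_id: str
    title: str
    content_type: str           # e.g. "article", "image", "dataset"
    ai_training_allowed: bool   # publisher's default stance for this asset
    price_per_use_usd: float    # 0.0 could mean "negotiate" or "not for sale"
    tags: list = field(default_factory=list)

asset = LicensableAsset(
    asset_id="b2b-2024-0417",
    title="Quarterly market analysis",
    content_type="article",
    ai_training_allowed=False,
    price_per_use_usd=0.0,
    tags=["market-data", "subscriber-only"],
)
```

Even a simple inventory like this puts a publisher in a far stronger negotiating position than an untagged archive: you can say precisely what is licensable, at what price, and prove what was taken without permission.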

The bottom line

Every time technology slashes the cost of copying or creating content, we hit a moment of friction. The Statute of Anne, the internet, and now AI-each era has forced us to ask: how do we protect creators without stifling innovation? It’s not a new problem, but it is a new era. The tools have changed, but the tension remains. Now, it’s time for publishers, lawmakers, and technologists to build a system that reflects the world we live in today – where AI is here to stay, but creators still get their due.

Licensing isn’t just the best answer. It’s the only one that scales. And for B2B publishers, it’s the path to turning a threat into a new revenue stream – before the bots eat your lunch.

Paul Hood is chairing a major panel discussion on AI at the Flashes & Flames’ Monetising B2B conference in London on 20 May. To get full details – and perhaps the last opportunity to book tickets – please click the banner at the top of this page.