Block Scripts, Styles, Media, and Ads with Playwright

Playwright is one of the most powerful tools any developer can learn, transforming the browser itself into a platform for feature development, data aggregation, image and PDF generation, app testing, and process automation.

But with great power comes great responsibility. By default, the browsers that Playwright spins up behave exactly like a real browser, downloading and loading all of the scripts, styles, media, and third-party resources (e.g. ads) that a normal browser would.

This behavior is great in most cases, but when it comes time to ship your code to production, you’ll probably start wondering how to curtail some of these excesses, whether in service of performance or stability. After all, the fewer assets the browser loads, the fewer problems you’re likely to have!

Blocking Requested Resources

More likely than not, your automation’s initial page request won’t be a bottleneck on performance. After all, most websites don’t directly embed the scripts, styles, images, and other assets they depend on. Instead, this data is requested after the initial page load, sometimes in a seemingly endless waterfall of traffic (as seen in your browser devtools’ “Network” tab).

Not to worry! Playwright offers a truly excellent SDK for sniffing out problematic requests and blocking them before they load.

Block Requests for a Single Page

Playwright’s Page class provides a nifty method for monitoring traffic. If you’re familiar with API frameworks like Express, Hono, or FastAPI, this pattern will be very familiar. I think of it as a “reverse router” (docs):

import * as pw from 'playwright';

const browser = await pw.chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();

// watch traffic matching a pattern
await page.route('**/*.css', (route) => {
  // and abort the request
  route.abort();
});

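Notice that the .route() API supports glob syntax, specifically **, *, ?, {this,that}, and [a-z]. (Check the docs for your Playwright version: newer releases, 1.52 and later per the release notes, no longer treat ? and [] as glob syntax and recommend a regex instead.) Here are some valid patterns: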

**/*.{ttf,woff,woff2}
/ranges/*[a-c]*[xyz]*
/singular/wildcard?

Also note that where you initialize your .route() matters. In other words, it’s possible to launch an initial page and all resources in full, then block future requests given a particular pattern. Similarly, you can remove route handlers after-the-fact using .unroute() (docs).

await page.goto('/start-here'); // loads page with all gifs
await page.route('**/*.gif', (route) => route.abort());
await page.goto('/next-page'); // no additional gifs load
await page.unroute('**/*.gif');
await page.goto('/another-page'); // back to loading gifs

Block Requests Across All Pages

The previous section described how the .route() API functions across the lifespan of a particular Page. This pattern works great when your script runs entirely within a single browser tab.

But what if you want to load multiple tabs simultaneously? Or what if the target website opens links in a new tab by default?

In these cases, rather than setting route handlers on the Page object, you can instead set handlers on the Context object. This goes for .route() (docs) as well as .unroute() (docs). But the syntax is exactly the same…

const browser = await pw.chromium.launch();
const context = await browser.newContext({
  baseURL: 'https://www.browsercat.com',
});

// watch the entire browser context
await context.route('**/*.js', (route) => route.abort());

// no JS loaded anywhere!
const page1 = await context.newPage();
await page1.goto('/');
const page2 = await context.newPage();
await page2.goto('/');

// enable JS on future requests
await context.unroute('**/*.js');

Block Requests by Regex

Glob patterns are convenient, but what if you want to employ the full power of regex? Playwright has you covered:

await page.route(
  // wow, such regular, so expressive 🐕
  /slug-with-(?:specifically-this|prefixed-.*|.*-suffixed)/u, 
  (route) => route.abort(),
);
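Because the pattern is an ordinary JavaScript RegExp, you can sanity-check it against a few sample URLs before handing it to .route(). Here’s a minimal sketch. The fontPattern regex and the example.com URLs are made up for illustration:

```javascript
// Hypothetical pattern: font files, with or without a query string.
const fontPattern = /\.(?:ttf|woff2?)(?:\?.*)?$/i;

// Made-up sample URLs, for illustration only.
const samples = [
  'https://example.com/fonts/inter.woff2',
  'https://example.com/fonts/inter.ttf?v=3',
  'https://example.com/app.js',
];

// Keep only the URLs the pattern would match.
const matched = samples.filter((url) => fontPattern.test(url));
console.log(matched);
```

Once the pattern behaves the way you expect, pass it as the first argument to page.route() or context.route().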

Block Requests by Domain

Do you want to block all traffic coming from a particular subdomain? Or perhaps an entirely different third-party? Things get slightly more complicated, but not by much:

// watch everything
await page.route('**/*', (route) => {
  // block all traffic from the offending domain
  if (route.request().url().includes('cdn.badbadnotgood.com')) {
    return route.abort();
  }

  // allow all other traffic through
  route.continue();
});

Remember that your handler must take some action on every single Route it filters. Every function call must trigger one of the response methods (.abort(), .continue(), .fulfill(), or .fallback()). Otherwise, the request will hang indefinitely. See the docs for more info.
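One caveat with the substring check above: .includes() also matches when the domain shows up in a path or query string. Here’s a minimal sketch of a stricter check that compares parsed hostnames instead. Note that isBlockedHost and blockedDomains are hypothetical names of my own, not part of Playwright:

```javascript
// Hypothetical list of domains to block (their subdomains are included).
const blockedDomains = ['cdn.badbadnotgood.com'];

// Returns true when the URL's hostname equals a blocked domain
// or is a subdomain of one.
function isBlockedHost(url, domains = blockedDomains) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return false; // not an absolute URL, so there is no host to match
  }
  return domains.some(
    (domain) => hostname === domain || hostname.endsWith(`.${domain}`),
  );
}
```

Inside a route handler, you could then abort whenever isBlockedHost(route.request().url()) returns true and continue otherwise.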

Block Requests by Content Type

It can be a pain to list every single pattern you want to block, especially since the web doesn’t require the file extension to match the file type. A site can serve JavaScript as script.jpg or an image as /extensionless/image.

Thankfully, Playwright provides an easy way to block traffic by content type:

await page.route('**/*', (route) => {
  if (route.request().resourceType() === 'image') {
    return route.abort();
  }

  route.continue();
});

As you can see, .resourceType() leverages the browser’s inference engine to guess the type of the request, regardless of the URI (docs).

For example, you might not guess that /unknown/entity is an image, but since the browser knows the request is being made based on the following HTML, it will…

<img src="/unknown/entity" />

This still won’t be a guarantee, but using .resourceType() will be pretty darn effective. And it’ll save you hours digging through your target site’s traffic history.
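If you want to block several resource types at once, the check can be wrapped in a small handler factory. This is a sketch under assumptions: makeTypeBlocker is a hypothetical helper of my own, while 'image', 'media', and 'font' are among the values .resourceType() is documented to return.

```javascript
// Hypothetical helper: builds a route handler that aborts the listed types.
function makeTypeBlocker(types) {
  const blocked = new Set(types);
  return (route) => {
    const type = route.request().resourceType();
    // Every route must be resolved one way or the other.
    return blocked.has(type) ? route.abort() : route.continue();
  };
}

const blockHeavyAssets = makeTypeBlocker(['image', 'media', 'font']);
```

You would register the result like any other handler, passing blockHeavyAssets as the second argument to page.route('**/*', ...) or context.route('**/*', ...).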

Block Requests by Arbitrary Logic

The Request object gives you access to every aspect of the underlying request. If you find it difficult to target particular traffic, it’s worth exploring the docs for ideas.

Here are some examples:

await page.route('**/*', (route) => {
  const req = route.request();

  // block by method
  if (req.method() === 'DELETE') {
    return route.abort();
  }

  // block by header (.headers() is synchronous and lowercases names)
  if (req.headers()['x-source']?.includes('dangerous')) {
    return route.abort();
  }

  // block by body
  if (req.postDataJSON()?.length >= 3) {
    return route.abort();
  }

  route.continue();
});
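Checks like these are easier to maintain when the decision lives in a plain function you can unit-test without launching a browser. Here’s a minimal sketch. The shouldBlock helper is hypothetical, and the plain-object shape it accepts is an assumption for illustration, not a Playwright type:

```javascript
// Hypothetical pure decision function. It takes plain data rather than
// a Playwright Request, so it can be tested without a browser.
function shouldBlock({ method, headers }) {
  // block destructive methods
  if (method === 'DELETE') return true;

  // header names are expected in lowercase here
  if ((headers['x-source'] ?? '').includes('dangerous')) return true;

  return false;
}
```

Within a handler, you might call it as shouldBlock({method: req.method(), headers: req.headers()}), then abort or continue based on the result.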

Mock Responses

Sometimes, you want to prevent the request from being made, but you don’t want the website to know that you blocked it. For instance, perhaps a site throws an undesirable error when a request fails, and this prevents you from continuing the script as planned.

In cases like these, you can mock the response to particular requests, saving network bandwidth while preserving the desired behavior.

Example:

const onePixelPng = Buffer.from(
  'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/wQAnQEB/wl8UgMAAAAASUVORK5CYII=', 
  'base64',
);

await page.route('**/*', async (route) => {
  const req = route.request();

  // allow all non-image requests
  if (req.resourceType() !== 'image') {
    return route.continue();
  }

  // resolve all image requests to a single pixel
  await route.fulfill({
    status: 200,
    contentType: 'image/png',
    body: onePixelPng, // already a Buffer, no need to copy it
  });
});
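The same idea works for API calls. Here’s a sketch, assuming a page that expects some endpoint to answer with JSON. The makeJsonStub helper and the payload are invented for illustration:

```javascript
// Hypothetical helper: builds a handler that answers with canned JSON
// instead of letting the request reach the network.
function makeJsonStub(payload, status = 200) {
  const body = JSON.stringify(payload);
  return (route) =>
    route.fulfill({
      status,
      contentType: 'application/json',
      body,
    });
}

// Invented payload; use whatever shape the page expects.
const stubAnalytics = makeJsonStub({ ok: true });
```

Register stubAnalytics with page.route() using a pattern that matches only the endpoint you want to fake, so the rest of the site’s traffic is unaffected.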

Blocking Ads, Trackers, and Third-Party Scripts

There are many use-cases where you still want a website’s scripts, styles, and media to load, but do you ever really want your target website to load its trackers and ads?

Probably not!

Let’s run through your options for quickly and efficiently streamlining your Playwright scripts…

Block Third-Parties by Domain

As mentioned above, you can easily block traffic by domain. And if your target website doesn’t heavily rely on third-party ads and trackers, this is probably your best bet.

Why? It doesn’t require any extra dependencies. That keeps your scripts fast and lean, as well as reducing the number of things that can go wrong.

For reference, here’s the strategy in full:

// watch everything
await page.route('**/*', (route) => {
  // block all traffic from the offending domain
  if (route.request().url().includes('cdn.badbadnotgood.com')) {
    return route.abort();
  }

  // allow all other traffic through
  route.continue();
});
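If your list of offending domains keeps growing, one alternative is to invert the check and only allow requests to your target site. This is a rough sketch, assuming “first-party” means the site’s hostname or any of its subdomains. The isThirdParty helper is hypothetical, and this naive comparison will misfire on sites that serve their own assets from a separate CDN domain:

```javascript
// Hypothetical helper: true when requestUrl's host is neither siteHost
// nor one of its subdomains.
function isThirdParty(requestUrl, siteHost) {
  const { protocol, hostname } = new URL(requestUrl);
  // data: and blob: URLs never hit the network, so let them through
  if (protocol !== 'http:' && protocol !== 'https:') return false;
  return hostname !== siteHost && !hostname.endsWith(`.${siteHost}`);
}
```

In a handler, you would abort when isThirdParty(route.request().url(), 'browsercat.com') returns true, and continue otherwise.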

Block Third-Parties the Easy Way

While the previous method is lightweight and easy to grok, there are cases where you need a full-blown ad blocking solution. For example, if you’ve got a web crawler that isn’t restricted to a single domain, there’s no way of knowing up front which third-parties to block.

In these cases, I recommend installing @cliqz/adblocker-playwright.

This (completely free) library is maintained by privacy company Ghostery, and it’s always kept up-to-date.

To get started, install the library:

npm install --save @cliqz/adblocker-playwright

Then enable it on your Page object:

import * as pw from 'playwright';
import {PlaywrightBlocker} from '@cliqz/adblocker-playwright';

const browser = await pw.chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();

// wait for the blocker to be ready before navigating anywhere
const blocker = await PlaywrightBlocker.fromPrebuiltAdsAndTracking(fetch);
await blocker.enableBlockingInPage(page);

That’s it! Now all network traffic will be compared against open-source blacklists of common ad platforms and trackers.

For more info on how to configure the package and optimize its use across runs, read the docs. They’re gloriously brief.

Blocking Inline Resources

While you’re most likely to need to block assets requested after the initial page load, there are some cases where inlined resources are the main performance bottleneck.

In these situations, Playwright provides .addInitScript(). This API allows you to write JavaScript code that runs within the browser before anything else during a navigation request. Just like the .route() API, Playwright supports adding an initialization script to the Page object (docs) or the Context object (docs).

Let’s go through some examples…

Block Inline Scripts

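To block inline scripts, we’re going to wait for all of the initial DOM content to be loaded, then filter out any <script> tags. One caveat: classic inline scripts execute as the parser reaches them, so by the time DOMContentLoaded fires they have already run. Removing the tags cleans up the DOM, but it won’t undo anything those scripts did…

await page.addInitScript(() => {
  // fires once the HTML has been parsed, before the window's load event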
  document.addEventListener('DOMContentLoaded', () => {
    // find all script tags
    const scripts = document.querySelectorAll('script');
    // and delete them
    scripts.forEach((el) => el.remove());
  });
});

Block Inline Styles

We’ll block inline styles from loading in a similar manner…

await page.addInitScript(() => {
  document.addEventListener('DOMContentLoaded', () => {
    // find all style tags and delete them
    const styles = document.querySelectorAll('style');
    styles.forEach((el) => el.remove());
  });
});

Block Other Inline Assets

It’s worth noting that this trick can be used to eliminate huge swathes of the DOM before the browser goes through the trouble of loading it. For example, if you’re writing a crawler, but you only care about a small portion of the DOM, you may benefit from stripping out all supplemental content.

await page.addInitScript(() => {
  document.addEventListener('DOMContentLoaded', () => {
    // find the `<main>` element
    const main = document.querySelector('main');

    if (main) {
      // replace all other page content with `<main>`
      document.body.innerHTML = '';
      document.body.appendChild(main.cloneNode(true));
    }
  });
});

But let the buyer beware! This particular strategy may be a wonderful resource in some situations. But in others, it can actually slow down performance. If attempting to optimize via DOM manipulation, run some tests!

Block Scripts and Styles Added Later

If you’re concerned about additional script or style tags added to the page dynamically after load, you can use the MutationObserver API to watch for late changes:

await page.addInitScript(() => {
  const observer = new MutationObserver((mutations) => {
    mutations.forEach((mutation) => {
      // loop through added nodes
      mutation.addedNodes.forEach((node) => {
        // remove scripts
        if (node.tagName === 'SCRIPT') {
          node.remove();
        }

        // remove styles
        if (node.tagName === 'STYLE') {
          node.remove();
        }
      });
    });
  });

  // watch every DOM node for changes
  observer.observe(document.documentElement, {
    childList: true,
    subtree: true
  });
});

Blanket Performance Gains

To finish off, let’s zip through a few quick, powerful, and sometimes dangerous ways to optimize your Playwright automations. These techniques can save big on performance, but they also risk destabilizing your scripts, and in some cases, making them insecure.

Disable All JavaScript

If you know for sure your use case doesn’t require any on-page JavaScript, the easiest way forward is to simply toggle JavaScript off for the entire browser context.

const browser = await pw.chromium.launch();
const context = await browser.newContext({javaScriptEnabled: false});

Not only will this save you from the base cost of initializing JavaScript, but it may also prevent the site from:

Hydrating the React/Svelte/Vue app
Launching pop-ups
Loading third-party scripts
Replacing placeholder images with full-size images
Initializing auto-play videos
Triggering a waterfall of subsequent JavaScript requests
Destroying content by enabling a paywall
and on and on…

The flip side of this is that you may actually need some of these behaviors. For example, some sites require JavaScript to populate interactive charts and tables. Others may require JavaScript for handling form submissions and button clicks.

As a final word of caution, disabling JavaScript might trigger a website’s anti-bot defense system. After all, many less-sophisticated web crawlers don’t run JavaScript either.

Nevertheless, if your use case meets these criteria, by all means… shut everything down.

Browser CLI Flags

By default, Playwright tries to ensure its headless browsers behave as much like a user-driven browser as possible. It uses browsers’ available CLI args to accomplish this. (See the logic for Chromium, Firefox, and WebKit.)

However, Playwright doesn’t touch any performance-related flags, as these naturally conflict with aligning headed and headless browser behavior.

But what if your use-case doesn’t leverage every aspect of the browser’s capabilities? In that case, it may be worth your time to explore the available CLI args. This is especially true for Chromium, which provides dozens of options, including offering CLI access to feature flags (chrome://flags), giving you tremendous control over your browser processes.

Chromium-Based Browsers

There’s one sure-fire way to improve the performance of Chromium-based browsers, and that’s to disable security sandboxing. But be warned—doing so may expose your execution context (Docker container, Lambda function, etc.) to malicious code.

Even if you aren’t including user-generated data in your scripts, you may still be vulnerable to attack. After all, third-party dependencies may carry code designed to sniff for sensitive data and access. While this possibility is unlikely, it is important to be aware of the consequences.

const browser = await pw.chromium.launch({
  args: [
    '--no-sandbox' // Use with caution!
  ],
});

While no other flags come with the same security risk, selecting from the hundreds available is strongly dependent on your use-case. The following resources will help you get started:

Next Steps…

Learning is fun, but now it’s time to experiment!

I recommend starting way back at the beginning of this page. Apply each of the listed strategies to your existing Playwright scripts, and be sure to measure your gains. In my automation work, I’ve reduced scripts that take 10 minutes down to 45 seconds, largely by applying the ideas above.
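Besides timing your scripts, it helps to see which kinds of requests actually complete before and after you add blocking. Here’s a minimal sketch of a per-type counter. The makeRequestTally helper is a hypothetical name of my own; the only thing it expects from Playwright is a Request object with .resourceType():

```javascript
// Hypothetical helper: counts requests per resource type.
function makeRequestTally() {
  const counts = {};
  return {
    counts,
    // Meant to be passed as an event listener that receives a Request.
    onRequest(request) {
      const type = request.resourceType();
      counts[type] = (counts[type] ?? 0) + 1;
    },
  };
}
```

One way to use it: create a tally, attach tally.onRequest to the page’s 'requestfinished' event (aborted requests should surface as 'requestfailed' instead, so they won’t be counted), run your script, then log tally.counts. Compare the numbers with and without your route handlers in place.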

When you’re happy with the results, consider giving BrowserCat a try. We offer a fleet of headless browsers that scale with your needs for a very affordable price. They’re pre-optimized and fully compatible with Playwright, giving you full latitude in developing awesome functionality without any of the headaches involved in hosting your own browsers.

And we offer a great free plan! Give us a try!