Using HTTP proxy with Puppeteer

Gajus Kuizinas - Jan 30 '20 - Dev Community

I had a requirement to evaluate remote JavaScript using Headless Chrome, but requests had to be routed through an internal proxy, and different proxies had to be used for different URLs. A convoluted requirement perhaps, but the last bit describes an important feature that Puppeteer is lacking: switching the HTTP proxy for each Page/Request.

However, it turns out that even though the feature is lacking, it is easy to implement entirely custom HTTP request/response handling using Puppeteer. All you need to do is:

  1. Enable request/response interception using page.setRequestInterception(true).
  2. Intercept the request.
  3. Make the request using Node.js.
  4. Return the response to Chrome.

This way Chrome itself never makes an outgoing HTTP request and all requests can be handled using Node.js.

The basic functionality is simple to implement:

import puppeteer from 'puppeteer';
import got from 'got';
import HttpProxyAgent from 'http-proxy-agent';

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // 1. Enable request/response interception
  await page.setRequestInterception(true);

  // 2. Intercept request
  page.on('request', async (request) => {
    // 3. Make request using Node.js
    const response = await got(request.url(), {
      // HTTP proxy. Note: got v11+ expects an object here, e.g. {http: agent}.
      agent: new HttpProxyAgent('http://127.0.0.1:3000'),
      body: request.postData(),
      headers: request.headers(),
      method: request.method(),
      retry: 0,
      throwHttpErrors: false,
    });

    // 4. Return response to Chrome
    await request.respond({
      body: response.body,
      headers: response.headers,
      status: response.statusCode,
    });
  });

  await page.goto('http://gajus.com');
})();
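
The handler above does no error handling: if the Node.js request rejects (for example, because the proxy is unreachable), the intercepted request is never answered and the page can hang. Below is a minimal sketch of one way to guard against that. It assumes a hypothetical `fetchViaProxy` helper (not part of Puppeteer or got) that resolves to an object with `body`, `headers`, and `status`; only `request.respond()` and `request.abort()` are Puppeteer calls.

```javascript
// Sketch: wrap a request handler so that a failed upstream request aborts
// the browser request instead of leaving it pending.
// `fetchViaProxy` is a hypothetical helper supplied by the caller; it
// receives the intercepted request and resolves to {body, headers, status}.
const createSafeHandler = (fetchViaProxy) => {
  return async (request) => {
    let response;

    try {
      response = await fetchViaProxy(request);
    } catch (error) {
      // Tell Chrome the request failed so the page does not wait forever.
      await request.abort('failed');

      return;
    }

    // Hand the Node.js response back to Chrome.
    await request.respond({
      body: response.body,
      headers: response.headers,
      status: response.status,
    });
  };
};
```

The wrapped function could then be registered with `page.on('request', createSafeHandler(...))`, with the got call from the example above moved into the helper.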




It gets a bit trickier if you need to support HTTPS, error handling, and cookies. However, as of last night, there is a package for that: puppeteer-proxy.

puppeteer-proxy abstracts HTTP proxy handling for Puppeteer, including HTTPS support, error handling, and cookie handling. Using it is simple:

import puppeteer from 'puppeteer';
import {
  createPageProxy,
} from 'puppeteer-proxy';

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const pageProxy = createPageProxy({
    page,
  });

  await page.setRequestInterception(true);

  page.on('request', async (request) => {
    await pageProxy.proxyRequest({
      request,
      proxyUrl: 'http://127.0.0.1:3000',
    });
  });

  await page.goto('http://gajus.com');
})();
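
Because the proxy is passed per request, the original goal of using different proxies for different URLs comes down to choosing `proxyUrl` from the request URL. The following is a minimal sketch of such a lookup; the domains, proxy addresses, and the `selectProxyUrl` name are hypothetical placeholders, not part of puppeteer-proxy.

```javascript
// Sketch: pick a proxy URL based on the hostname of the intercepted request.
// All domains and proxy addresses below are hypothetical examples.
const proxyRules = [
  {domain: 'internal.example', proxyUrl: 'http://127.0.0.1:3001'},
  {domain: 'example.com', proxyUrl: 'http://127.0.0.1:3002'},
];

// Used when no rule matches.
const defaultProxyUrl = 'http://127.0.0.1:3000';

const selectProxyUrl = (requestUrl, rules = proxyRules) => {
  const {hostname} = new URL(requestUrl);

  // A rule matches the domain itself and any of its subdomains.
  const rule = rules.find(({domain}) => {
    return hostname === domain || hostname.endsWith('.' + domain);
  });

  return rule ? rule.proxyUrl : defaultProxyUrl;
};
```

In the request handler, `proxyUrl: selectProxyUrl(request.url())` would then take the place of the hard-coded address.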
