Bypassing CAPTCHAs With Headless Chrome Using Puppeteer


Answer:

Try generating a random user agent with the user-agents npm package. This usually gets past user-agent-based protection.

In Puppeteer, a page can override the browser's user agent with page.setUserAgent:

const UserAgent = require('user-agents'); ... await page.setUserAgent(new UserAgent().toString());
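A minimal sketch of how this might be wrapped in a helper, assuming the user-agents package is installed. The helper name `applyRandomUserAgent` and the choice to restrict results to desktop devices are illustrative assumptions, not part of the original answer.

```javascript
// Hypothetical helper: sets a randomly generated desktop user agent on a
// Puppeteer page and returns the string that was applied.
// `UserAgent` is the class exported by the `user-agents` package; it is
// passed in as an argument so the helper has no hard dependency.
async function applyRandomUserAgent(page, UserAgent) {
  // The deviceCategory filter keeps the user agent consistent with a
  // desktop-sized viewport (assumption: you are emulating a desktop).
  const userAgent = new UserAgent({ deviceCategory: 'desktop' }).toString();
  await page.setUserAgent(userAgent);
  return userAgent;
}

// Usage (assumes `page` comes from browser.newPage()):
//   const UserAgent = require('user-agents');
//   const applied = await applyRandomUserAgent(page, UserAgent);
```

Generating a fresh user agent per page, rather than once per process, keeps sessions from sharing one fingerprint.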

Additionally, you can add these two plugins for puppeteer-extra:

puppeteer-extra-plugin-recaptcha - Solves reCAPTCHAs automatically, using a single line of code: page.solveRecaptchas()

NOTE: puppeteer-extra-plugin-recaptcha uses a paid service 2captcha

puppeteer-extra-plugin-stealth - Applies various evasion techniques to make detection of headless puppeteer harder.
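As a rough sketch, the two plugins could be registered together like this. It assumes puppeteer-extra and both plugins are installed; the function names and the `TWOCAPTCHA_TOKEN` environment variable are hypothetical, chosen here only for illustration.

```javascript
// Builds the options object for puppeteer-extra-plugin-recaptcha.
// `token` is your 2captcha API key (a paid service).
function recaptchaOptions(token) {
  return {
    provider: { id: '2captcha', token },
    visualFeedback: true, // highlight detected/solved reCAPTCHAs on the page
  };
}

// Hypothetical end-to-end helper: opens `url`, attempts to solve any
// reCAPTCHAs found, and returns how many were solved.
async function solveOnPage(url) {
  // Required lazily so recaptchaOptions() can be used on its own.
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');

  puppeteer.use(StealthPlugin());
  // Assumption: the API key is supplied via an environment variable.
  puppeteer.use(RecaptchaPlugin(recaptchaOptions(process.env.TWOCAPTCHA_TOKEN)));

  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // Detects reCAPTCHAs on the page and submits them to the solver.
    const { solved } = await page.solveRecaptchas();
    return solved.length;
  } finally {
    await browser.close();
  }
}
```

Plugins are registered before `puppeteer.launch()` so that they apply to every page the browser opens.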


Here is a list of things I'm doing to bypass the captchas and similar blockings:

  • Enable stealth mode (via puppeteer-extra-plugin-stealth)
  • Randomize the user agent or set a valid one (via random-useragent)
  • Randomize Viewport size
  • Skip images/styles/fonts loading for better performance
  • Pass "WebDriver check"
  • Pass "Chrome check"
  • Pass "Notifications check"
  • Pass "Plugins check"
  • Pass "Languages check"

The code for these steps:

    const randomUseragent = require('random-useragent');

    // Enable stealth mode
    const puppeteer = require('puppeteer-extra');
    const StealthPlugin = require('puppeteer-extra-plugin-stealth');
    puppeteer.use(StealthPlugin());

    const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36';

    async function createPage(browser, url) {
        // Randomize the user agent, or fall back to a known valid one
        const userAgent = randomUseragent.getRandom();
        const UA = userAgent || USER_AGENT;
        const page = await browser.newPage();

        // Randomize viewport size
        await page.setViewport({
            width: 1920 + Math.floor(Math.random() * 100),
            height: 3000 + Math.floor(Math.random() * 100),
            deviceScaleFactor: 1,
            hasTouch: false,
            isLandscape: false,
            isMobile: false,
        });

        await page.setUserAgent(UA);
        await page.setJavaScriptEnabled(true);
        page.setDefaultNavigationTimeout(0); // 0 disables the navigation timeout

        // Skip images/styles/fonts loading for performance
        await page.setRequestInterception(true);
        page.on('request', (req) => {
            const type = req.resourceType();
            if (type === 'stylesheet' || type === 'font' || type === 'image') {
                req.abort();
            } else {
                req.continue();
            }
        });

        await page.evaluateOnNewDocument(() => {
            // Pass webdriver check
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false,
            });
        });

        await page.evaluateOnNewDocument(() => {
            // Pass chrome check
            window.chrome = {
                runtime: {},
                // etc.
            };
        });

        await page.evaluateOnNewDocument(() => {
            // Pass notifications check
            // Bind the original method so it keeps `permissions` as `this`
            const permissions = window.navigator.permissions;
            const originalQuery = permissions.query.bind(permissions);
            permissions.query = (parameters) => (
                parameters.name === 'notifications' ?
                    Promise.resolve({ state: Notification.permission }) :
                    originalQuery(parameters)
            );
        });

        await page.evaluateOnNewDocument(() => {
            // Overwrite the `plugins` property to use a custom getter.
            Object.defineProperty(navigator, 'plugins', {
                // This just needs to have `length > 0` for the current test,
                // but we could mock the plugins too if necessary.
                get: () => [1, 2, 3, 4, 5],
            });
        });

        await page.evaluateOnNewDocument(() => {
            // Overwrite the `languages` property to use a custom getter.
            Object.defineProperty(navigator, 'languages', {
                get: () => ['en-US', 'en'],
            });
        });

        await page.goto(url, { waitUntil: 'networkidle2', timeout: 0 });
        return page;
    }
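One way to check whether these overrides actually took effect is to read the same properties back from inside the page. The following is a sketch only: `readFingerprint` and `failedChecks` are hypothetical helper names, and the checks mirror the list above rather than any particular detection script.

```javascript
// Summarizes the properties the overrides target. The default parameters
// resolve to the real browser globals when Puppeteer runs this function
// inside the page via page.evaluate(readFingerprint).
function readFingerprint(nav = navigator, win = window) {
  return {
    webdriver: nav.webdriver,
    hasChrome: typeof win.chrome === 'object' && win.chrome !== null,
    pluginCount: nav.plugins.length,
    languages: Array.from(nav.languages),
    userAgent: nav.userAgent,
  };
}

// Returns the names of the checks that a simple detector would still fail.
function failedChecks(fingerprint) {
  const failed = [];
  if (fingerprint.webdriver) failed.push('webdriver');
  if (!fingerprint.hasChrome) failed.push('chrome');
  if (fingerprint.pluginCount === 0) failed.push('plugins');
  if (fingerprint.languages.length === 0) failed.push('languages');
  return failed;
}

// Usage with the createPage function above (the URL is a placeholder):
//   const browser = await puppeteer.launch({ headless: true });
//   const page = await createPage(browser, 'https://example.com');
//   const fingerprint = await page.evaluate(readFingerprint);
//   console.log(failedChecks(fingerprint)); // an empty array means all passed
//   await browser.close();
```

An empty result only means these particular properties look normal; it does not guarantee that a site's own detection will be satisfied.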


Have you tried setting the browser's user agent?

await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36');
