This tutorial covers the full reCAPTCHA v2 solving pipeline in Python — from extracting the site key to injecting the token and submitting the form. It applies to both reCAPTCHA v2 checkbox and reCAPTCHA v2 invisible.
For background on how reCAPTCHA v2 works, see the reCAPTCHA Guide.
What You Need
```bash
pip install requests playwright
playwright install chromium
```
You also need an API key from a CAPTCHA solver. All examples use CaptchaAI (https://ocr.captchaai.com) — a 2Captcha-compatible endpoint. Switching to 2Captcha is a one-line change.
Step 1 — Find the reCAPTCHA Site Key
The site key is a static value embedded in the page HTML. It looks like: 6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-
```python
import re
import requests


def extract_recaptcha_sitekey(page_url: str) -> str:
    """Extract reCAPTCHA v2 site key from a page at runtime."""
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"}
    html = requests.get(page_url, headers=headers, timeout=15).text
    patterns = [
        r'data-sitekey=["\']([0-9A-Za-z_\-]{40,})["\']',
        r'"sitekey"\s*:\s*"([0-9A-Za-z_\-]{40,})"',
        r'grecaptcha\.(?:execute|render)\s*\(\s*["\']([0-9A-Za-z_\-]{40,})["\']',
    ]
    for p in patterns:
        m = re.search(p, html)
        if m:
            return m.group(1)
    raise ValueError(f"reCAPTCHA site key not found on {page_url}")
```
Why extract at runtime? Site keys change during migrations and differ between staging/production. Hardcoding a site key is a common source of silent failures.
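To check the patterns without hitting a live site, the same three regular expressions can be run against an HTML string. The sketch below is a minimal offline variant; `find_sitekey` and `SAMPLE_HTML` are hypothetical names introduced for illustration, and the sample fragment reuses the example key shown above.

```python
import re
from typing import Optional

# Same three patterns as extract_recaptcha_sitekey, applied to a string
# instead of a fetched page so they can be exercised offline.
SITEKEY_PATTERNS = [
    r'data-sitekey=["\']([0-9A-Za-z_\-]{40,})["\']',
    r'"sitekey"\s*:\s*"([0-9A-Za-z_\-]{40,})"',
    r'grecaptcha\.(?:execute|render)\s*\(\s*["\']([0-9A-Za-z_\-]{40,})["\']',
]


def find_sitekey(html: str) -> Optional[str]:
    """Return the first site key matched by any pattern, or None."""
    for pattern in SITEKEY_PATTERNS:
        match = re.search(pattern, html)
        if match:
            return match.group(1)
    return None


# Hypothetical page fragment for illustration only.
SAMPLE_HTML = (
    '<div class="g-recaptcha" '
    'data-sitekey="6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-"></div>'
)
print(find_sitekey(SAMPLE_HTML))
```

Returning `None` instead of raising makes it easy to try several pages or fall back to the live DOM before giving up.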
Step 2 — Submit the Task to CaptchaAI
```python
import time


def submit_recaptcha_v2_task(
    api_key: str,
    page_url: str,
    site_key: str,
    invisible: bool = False,
) -> str:
    """Submit reCAPTCHA v2 task. Returns the task ID."""
    payload = {
        "key": api_key,
        "method": "userrecaptcha",
        "googlekey": site_key,
        "pageurl": page_url,
        "json": 1,
    }
    if invisible:
        payload["invisible"] = 1

    r = requests.post("https://ocr.captchaai.com/in.php", data=payload, timeout=30)
    r.raise_for_status()
    data = r.json()
    if data.get("status") != 1:
        raise RuntimeError(f"Task submission failed: {data}")
    return data["request"]  # Task ID
```
Step 3 — Poll for the Token
```python
def poll_recaptcha_result(api_key: str, task_id: str) -> str:
    """
    Poll until the token is ready. Returns the g-recaptcha-response token.
    Raises TimeoutError if not resolved within ~120 seconds.
    """
    time.sleep(5)  # reCAPTCHA v2 typically solves in 8–14s, so start polling at 5s
    for attempt in range(24):
        r = requests.get(
            "https://ocr.captchaai.com/res.php",
            params={"key": api_key, "action": "get", "id": task_id, "json": 1},
            timeout=30,
        )
        r.raise_for_status()
        data = r.json()
        if data.get("status") == 1:
            return data["request"]  # The g-recaptcha-response token
        if data.get("request") not in ("CAPCHA_NOT_READY", "CAPTCHA_NOT_READY"):
            raise RuntimeError(f"Unexpected response: {data}")
        time.sleep(5)
    raise TimeoutError("reCAPTCHA v2 solve timed out after ~120 seconds")


def solve_recaptcha_v2(
    api_key: str,
    page_url: str,
    site_key: str,
    invisible: bool = False,
) -> str:
    """One-call wrapper: submit + poll → token."""
    task_id = submit_recaptcha_v2_task(api_key, page_url, site_key, invisible)
    return poll_recaptcha_result(api_key, task_id)
```
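`poll_recaptcha_result` bounds its wait by attempt count. One possible variation, sketched below, bounds it by a wall-clock deadline and takes the HTTP call as a parameter so the loop can be tested without a network. The names `poll_with_deadline`, `fetch`, `sleep`, and `clock` are hypothetical and not part of any solver API; the response shape (`status`, `request`) is the one used in the functions above.

```python
import time
from typing import Callable, Dict


def poll_with_deadline(
    fetch: Callable[[], Dict],
    max_wait: float = 120.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch() until it reports status 1 or max_wait seconds would be exceeded."""
    deadline = clock() + max_wait
    while True:
        data = fetch()
        if data.get("status") == 1:
            return data["request"]
        if data.get("request") not in ("CAPCHA_NOT_READY", "CAPTCHA_NOT_READY"):
            raise RuntimeError(f"Unexpected response: {data}")
        # Stop before sleeping if the next poll would land past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"No token within {max_wait:.0f} seconds")
        sleep(interval)


# Demo with canned responses and a no-op sleep, so nothing actually waits.
_canned = iter([
    {"status": 0, "request": "CAPCHA_NOT_READY"},
    {"status": 1, "request": "demo-token"},
])
print(poll_with_deadline(lambda: next(_canned), sleep=lambda _: None))
```

In real use, `fetch` would wrap the `requests.get(...).json()` call against `res.php` from the function above.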
Step 4 — Inject the Token and Submit
With Playwright
```python
from playwright.sync_api import sync_playwright


def submit_form_with_playwright(
    page_url: str,
    api_key: str,
    submit_selector: str = 'button[type="submit"]',
) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(page_url, wait_until="networkidle")

            # Extract site key from live page
            site_key = page.get_attribute("[data-sitekey]", "data-sitekey")
            if not site_key:
                raise ValueError("Site key not found in live page")

            # Solve
            token = solve_recaptcha_v2(api_key, page_url, site_key)

            # Inject into g-recaptcha-response field(s). The token is passed as an
            # argument instead of being interpolated into the script, so it is
            # never parsed as JavaScript source.
            page.evaluate(
                """(token) => {
                    document.querySelectorAll('[name="g-recaptcha-response"]')
                        .forEach(el => {
                            el.value = token;
                            el.dispatchEvent(new Event('change', {bubbles: true}));
                        });
                }""",
                token,
            )

            page.click(submit_selector)
            page.wait_for_load_state("networkidle")
        finally:
            # Close the browser even if solving or submission fails
            browser.close()
```
With requests (API-only, no browser)
For forms that accept a simple POST (no JS validation required after token injection):
```python
def submit_form_via_requests(
    form_url: str,
    api_key: str,
    site_key: str,
    page_url: str,
    extra_fields: dict,
) -> requests.Response:
    token = solve_recaptcha_v2(api_key, page_url, site_key)
    payload = {**extra_fields, "g-recaptcha-response": token}
    r = requests.post(form_url, data=payload, timeout=30)
    return r
```
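Because a token is only accepted for about two minutes after solving (see Common Errors), it can help to record when the token was obtained and check its age just before submitting. The sketch below is one way to do that; `SolvedToken`, `TOKEN_TTL_SECONDS`, and the 15-second safety margin are assumptions made for illustration, not part of any solver API.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Assumed lifetime, taken from the "Token expired (> 2 min)" row in Common Errors.
TOKEN_TTL_SECONDS = 120.0


@dataclass
class SolvedToken:
    """A solved token plus the monotonic time at which it was received."""
    value: str
    solved_at: float = field(default_factory=time.monotonic)

    def is_fresh(self, safety_margin: float = 15.0, now: Optional[float] = None) -> bool:
        """True while the token is younger than the TTL minus a safety margin."""
        current = time.monotonic() if now is None else now
        return (current - self.solved_at) < (TOKEN_TTL_SECONDS - safety_margin)


# Usage: wrap the solver result, then re-solve if the token went stale
# while other steps (page loads, form filling) were running.
token = SolvedToken("demo-token")
print(token.is_fresh())
```

A stale token is cheaper to replace before submission than to diagnose after the target rejects it.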
Adding Retry Logic
Transient failures are normal. Build retry logic into your pipeline:
```python
def solve_with_retry(
    api_key: str,
    page_url: str,
    site_key: str,
    invisible: bool = False,
    max_retries: int = 3,
) -> str:
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return solve_recaptcha_v2(api_key, page_url, site_key, invisible)
        except (RuntimeError, TimeoutError) as e:
            last_error = e
            print(f"Attempt {attempt}/{max_retries} failed: {e}")
            if attempt < max_retries:
                time.sleep(3 * attempt)  # Linear backoff: 3s, 6s, ...
    raise RuntimeError(f"All {max_retries} attempts failed. Last error: {last_error}")
```
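The delay in `solve_with_retry` grows by 3 seconds per attempt (`3 * attempt`). If a doubling schedule is preferred, the delays could be computed as in the sketch below. `backoff_delays` and its `base`, `cap`, and `jitter` parameters are hypothetical names, and the default values are illustrative rather than recommended.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int,
    base: float = 3.0,
    cap: float = 30.0,
    jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.

    `jitter` adds up to that fraction of each delay on top, chosen at random,
    so parallel workers do not all retry at the same instant.
    """
    rng = rng or random.Random()
    delays = []
    for i in range(max_retries):
        delay = min(cap, base * (2 ** i))
        delay += rng.uniform(0, jitter * delay)
        delays.append(delay)
    return delays


print(backoff_delays(4))  # no jitter: 3.0, 6.0, 12.0, 24.0
```

Inside the retry loop, `time.sleep(3 * attempt)` would then become `time.sleep(delays[attempt - 1])`.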
Switching to 2Captcha
Replace the endpoint host. All parameters are identical:
```python
# CaptchaAI
BASE_URL = "https://ocr.captchaai.com"

# 2Captcha (uncomment to switch)
# BASE_URL = "https://2captcha.com"
```
Use BASE_URL as a constant in your code so switching is a one-line change.
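As a minimal sketch of that idea, both endpoint URLs can be derived from a single base value. The helper name `solver_endpoints` and the `SOLVER_BASE_URL` environment variable are assumptions for illustration; the `/in.php` and `/res.php` paths are the ones used in Steps 2 and 3.

```python
import os
from typing import Dict

DEFAULT_BASE_URL = "https://ocr.captchaai.com"


def solver_endpoints(base_url: str = "") -> Dict[str, str]:
    """Build the submit and result URLs from one base URL.

    Falls back to the (hypothetical) SOLVER_BASE_URL environment variable,
    then to the CaptchaAI host used throughout this tutorial.
    """
    base = base_url or os.environ.get("SOLVER_BASE_URL", DEFAULT_BASE_URL)
    base = base.rstrip("/")
    return {"submit": f"{base}/in.php", "result": f"{base}/res.php"}


print(solver_endpoints("https://2captcha.com"))
```

The hardcoded URLs in `submit_recaptcha_v2_task` and `poll_recaptcha_result` would then read from this mapping instead.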
Common Errors
| Error | Cause | Fix |
|---|---|---|
| ERROR_WRONG_GOOGLEKEY | Site key invalid | Re-extract from page HTML at runtime |
| ERROR_CAPTCHA_UNSOLVABLE | Solve failed | Retry; check IP reputation; switch solver |
| Token rejected ("invalid-input-response") | Token expired (> 2 min) | Submit within 2 minutes of solving |
| ERROR_ZERO_BALANCE | No credits | Top up account |
| StaleElementReferenceException (Selenium) | Page re-rendered | Re-query submit button after token injection |
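The Fix column above can be turned into a small lookup so the pipeline reacts to each solver error code consistently. The sketch below is one possible mapping. The action labels (`"retry"`, `"top_up"`, and so on) and the `"abort"` default for unknown codes are assumptions, not values defined by any solver.

```python
from typing import Dict

# Action per solver response code, derived from the Fix column of the table.
# The action names themselves are hypothetical labels for this sketch.
ERROR_ACTIONS: Dict[str, str] = {
    "CAPCHA_NOT_READY": "wait",
    "CAPTCHA_NOT_READY": "wait",
    "ERROR_CAPTCHA_UNSOLVABLE": "retry",
    "ERROR_WRONG_GOOGLEKEY": "re_extract_site_key",
    "ERROR_ZERO_BALANCE": "top_up",
}


def action_for_error(code: str) -> str:
    """Return the suggested action for a solver code; unknown codes abort."""
    return ERROR_ACTIONS.get(code, "abort")


print(action_for_error("ERROR_CAPTCHA_UNSOLVABLE"))
```

Only `"retry"` and `"wait"` justify another API call; retrying on a wrong site key or an empty balance repeats the same failure.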
Related Guides
- reCAPTCHA Guide — v2, v3, and Enterprise explained with solver comparison
- Best reCAPTCHA v2 Solver — ranked solver comparison with pricing
- How to Solve reCAPTCHA v3 in Python — scored token pattern
- CAPTCHA Solving in Python: Quick Start — 15-minute overview
Production Readiness Notes
Treat this tutorial as an implementation aid, not a one-time reference. The practical test of a reCAPTCHA v2 solving pipeline is whether it behaves reliably when traffic is messy: rotating sessions, expired tokens, changing widget parameters, intermittent solver delays, and target pages that refresh without warning. For automation developers, the safest rollout is to start with a narrow fixture, record every submitted task, and compare the solver response with the browser state that finally submits the form. That makes failures explainable instead of mysterious, especially when a target alternates between visible challenges, invisible checks, and server-side verification.
Evaluation Criteria
Exercise the pipeline against staging first, then promote it behind feature flags so failed solves can fall back without blocking the entire workflow. For reCAPTCHA work, the most useful scorecard combines technical acceptance with operational cost. A low nominal price is not enough if retries double the real cost per accepted token, and a fast median solve time is not enough if p95 latency stalls the queue. Track these criteria before you standardize the workflow:
- The challenge subtype, sitekey, action, rqdata, blob, captchaId, or page URL used for each task.
- Median and p95 solve time, separated by provider and target domain.
- Accepted-token rate on the target page, not just successful API responses.
- Retry count, timeout count, zero-balance incidents, and invalid-parameter errors.
- The exact browser, proxy region, and user-agent that submitted the solved token.
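The latency and cost criteria above can be computed from per-attempt records. The sketch below is a minimal version under stated assumptions: each record is a dict with hypothetical keys `solve_seconds` and `accepted`, every recorded attempt is assumed to have been billed at `price_per_solve`, and p95 uses the nearest-rank method.

```python
import math
import statistics
from typing import Dict, List


def nearest_rank_percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n) in sorted order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_attempts(attempts: List[Dict], price_per_solve: float) -> Dict[str, float]:
    """Median and p95 solve time, accepted-token rate, and cost per accepted token."""
    times = [a["solve_seconds"] for a in attempts]
    accepted = sum(1 for a in attempts if a["accepted"])
    return {
        "median_seconds": statistics.median(times),
        "p95_seconds": nearest_rank_percentile(times, 95),
        "accepted_rate": accepted / len(attempts),
        # Assumes every attempt was billed, including rejected tokens.
        "cost_per_accepted": (
            price_per_solve * len(attempts) / accepted if accepted else math.inf
        ),
    }


# Illustrative numbers only, not measurements.
sample = [{"solve_seconds": t, "accepted": t < 16} for t in (8, 9, 10, 11, 12, 13, 14, 15, 16, 40)]
print(summarize_attempts(sample, price_per_solve=0.001))
```

With these made-up numbers the median looks healthy while the single 40-second solve sets the p95, which is the gap the paragraph above warns about.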
Rollout Checklist
Before this pipeline moves into a production job, build a small acceptance suite around the pages that matter most. Run it with a fixed browser profile, then repeat with the proxy and concurrency settings you expect in production. Keep the first release conservative: bounded polling, clear timeout handling, and a fallback path when the solver cannot return a usable answer. For reCAPTCHA v2, watch hostname checks and token age; score thresholds and action names matter only if the target also runs reCAPTCHA v3. Judge the integration by end-to-end completion rather than by whether a code sample returned a string.
Monitoring Signals
Healthy CAPTCHA automation is observable. Log the task id, provider, challenge type, target host, queue time, solve time, final submit status, and normalized error code for every attempt. Review those logs in daily batches at first, then move to alerts once the baseline is stable. Sudden drops usually come from target-side changes: a new sitekey, a changed action name, a stricter hostname check, an added managed challenge, or a proxy pool that no longer matches the expected geography. When you can see those shifts quickly, provider switching becomes a controlled decision instead of a late-night rewrite.
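One way to capture the fields listed above is a single JSON line per attempt, which is easy to review in batches and later to alert on. The sketch below assumes a hypothetical `SolveAttempt` record; the field names mirror the list in the paragraph and are not defined by any solver.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SolveAttempt:
    """One solve attempt, with the fields named in the paragraph above."""
    task_id: str
    provider: str
    challenge_type: str
    target_host: str
    queue_seconds: float
    solve_seconds: float
    submit_status: str
    error_code: Optional[str] = None  # normalized error code, None on success


def to_log_line(attempt: SolveAttempt) -> str:
    """Serialize an attempt as one JSON line with stable key order."""
    return json.dumps(asdict(attempt), sort_keys=True)


# Illustrative values only.
print(to_log_line(SolveAttempt(
    task_id="12345",
    provider="captchaai",
    challenge_type="recaptcha_v2",
    target_host="example.com",
    queue_seconds=0.4,
    solve_seconds=11.2,
    submit_status="accepted",
)))
```

Sorted keys keep the lines diff-friendly, so a changed field stands out when comparing logs from before and after a target-side change.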
Maintenance Cadence
Revisit the setup whenever the target UI changes, when the solver provider changes task names or pricing, or when benchmark data shows a sustained latency or solve-rate shift. Keep one known-good fixture for each CAPTCHA subtype and rerun it after dependency upgrades, browser updates, and proxy changes. If you use this setup for vendor selection, repeat the same fixture across at least two providers before renewing a balance or migrating the whole pipeline. That habit keeps the pipeline aligned with real target behavior rather than with stale assumptions.