This workflow is a powerful web scraping and data extraction pipeline built with Selenium and OpenAI. It collects structured data from almost any website, public or behind a login, handles anti-bot protections, and analyzes the scraped pages with AI.
It supports:
- Running in a Selenium container with optional proxy configuration.
- Scraping with or without authentication (via session cookies).
- Automatic screenshot capture and AI-based content extraction.
- Handling of blocked pages, errors, and fallback logic.
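Scraping behind a login works by handing the browser pre-captured session cookies. A minimal sketch of the conversion step, assuming cookies arrive as a simple name-to-value mapping (the `build_selenium_cookies` helper and input format are illustrative, not part of the workflow itself):

```python
def build_selenium_cookies(raw_cookies, domain):
    """Convert a simple name->value mapping into the dict format
    Selenium's driver.add_cookie() expects (hypothetical helper)."""
    return [
        {"name": name, "value": value, "domain": domain, "path": "/"}
        for name, value in raw_cookies.items()
    ]

# Injecting them requires the browser to already be on the target domain:
# driver.get(f"https://{domain}")
# for cookie in build_selenium_cookies({"sessionid": "abc123"}, "example.com"):
#     driver.add_cookie(cookie)
# driver.refresh()  # reload so the site sees the authenticated session
```

Selenium rejects cookies for a domain the browser has not visited yet, which is why the initial `driver.get()` call matters.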
## 🚀 Features
- Webhook Trigger: Accepts JSON input with subject, domain, target URL, and data fields.
- Google Search + Smart URL Extraction: Finds the most relevant page on a given domain by combining a search query with AI filtering.
- Selenium Browser Control:
  - Launches and manages Chrome sessions inside a Dockerized Selenium container.
  - Supports proxy configuration for bypassing restrictions.
  - Can inject cookies for scraping logged-in pages.
- Anti-Bot Evasion: Modifies WebDriver fingerprints to avoid detection.
- Dynamic Page Handling: Resizes browser window, refreshes pages, and ensures page load stability.
- AI-Powered Data Extraction:
  - Uses OpenAI GPT-4o / GPT-4o-mini to analyze screenshots and extract structured data.
  - Extracts multiple attributes (up to 5 custom data points).
  - Handles cases where no relevant data is found.
- Error & Block Handling:
  - Returns clear JSON responses if the request is blocked, cookies don’t match, or pages crash.
  - Captures screenshots for debugging when issues occur.
- Proxy Debugging: Built-in flow to verify your scraping IP via ip-api.com.
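The webhook trigger above expects a JSON body carrying the subject, domain, target URL, and data fields. A sketch of how such a payload might be validated; the exact field names (`subject`, `domain`, `target_url`, `data_fields`) are assumptions based on the feature description:

```python
def validate_payload(payload: dict) -> dict:
    """Check a webhook payload for the fields the workflow needs.
    Field names are illustrative, not the workflow's actual schema."""
    required = ("subject", "domain")
    missing = [k for k in required if not payload.get(k)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    fields = payload.get("data_fields", [])
    if len(fields) > 5:  # the workflow extracts at most 5 custom data points
        raise ValueError("at most 5 data fields are supported")
    return {
        "subject": payload["subject"],
        "domain": payload["domain"],
        "target_url": payload.get("target_url"),  # optional: found via search if absent
        "data_fields": fields,
    }
```

Leaving `target_url` optional lets the search step fill it in when the caller only knows the domain.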
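The Smart URL Extraction step amounts to narrowing search results to the requested domain before the AI picks the most relevant candidate. A stdlib-only sketch of the domain-filtering half (the AI re-ranking step is omitted):

```python
from urllib.parse import urlparse

def filter_results_to_domain(result_urls, domain):
    """Keep only search-result URLs that belong to the target domain
    (or one of its subdomains); the AI then picks the best match."""
    matches = []
    for url in result_urls:
        host = urlparse(url).netloc.lower()
        if host == domain or host.endswith("." + domain):
            matches.append(url)
    return matches
```

Matching on the parsed hostname rather than a substring avoids false positives like `notexample.com`.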
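Anti-bot evasion usually means masking the `navigator.webdriver` fingerprint before any page script runs. A common Selenium pattern is shown below; `Page.addScriptToEvaluateOnNewDocument` is a real Chrome DevTools Protocol command, but the exact script the workflow injects is an assumption:

```python
STEALTH_JS = """
Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
window.chrome = window.chrome || {runtime: {}};
"""

def stealth_script() -> str:
    """Return the fingerprint-masking JavaScript that should run in
    every new document before the page's own scripts execute."""
    return STEALTH_JS

# With a live Chrome session, the script is registered via CDP:
# driver.execute_cdp_cmd(
#     "Page.addScriptToEvaluateOnNewDocument",
#     {"source": stealth_script()},
# )
```

Registering the script at the CDP level (rather than via `execute_script` after load) ensures detection code that runs on page load never sees the automation flag.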
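The block and error handling described above needs some way to decide which JSON response to return. One plausible approach is scanning the rendered page text for block indicators; the marker strings and status names here are assumptions, not the workflow's actual logic:

```python
BLOCK_MARKERS = ("access denied", "captcha", "unusual traffic", "are you a robot")

def classify_page(page_text: str) -> str:
    """Map rendered page text to a status the workflow can return as JSON:
    'blocked', 'login_required', or 'ok'. Marker strings are illustrative."""
    lowered = page_text.lower()
    if any(marker in lowered for marker in BLOCK_MARKERS):
        return "blocked"
    if "sign in" in lowered or "log in" in lowered:
        return "login_required"  # likely a cookie mismatch
    return "ok"
```

Anything other than `"ok"` would also trigger the debug screenshot capture mentioned above.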
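Screenshot analysis with GPT-4o works by sending the image as a base64 data URL inside a chat message. Building that request body is plain dictionary work; the model choice and prompt wording below are illustrative:

```python
import base64

def build_vision_request(screenshot_png: bytes, data_fields: list) -> dict:
    """Assemble a Chat Completions request asking the model to read the
    screenshot and return the requested attributes as JSON."""
    image_b64 = base64.b64encode(screenshot_png).decode("ascii")
    prompt = (
        "Extract the following fields from this page screenshot and "
        f"reply with JSON only: {', '.join(data_fields)}. "
        "Use null for anything you cannot find."
    )
    return {
        "model": "gpt-4o-mini",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    }
```

Instructing the model to return `null` for missing fields is what makes the "no relevant data found" case easy to handle downstream.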
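For the proxy debugging flow, ip-api.com answers with a small JSON document; `status`, `query`, `country`, and `isp` are the service's documented response keys. Parsing that response to confirm the egress IP might look like this:

```python
import json

def summarize_ip_check(raw: str) -> str:
    """Turn an ip-api.com JSON response into a one-line summary so the
    workflow can confirm which IP address the proxy actually exposes."""
    data = json.loads(raw)
    if data.get("status") != "success":
        return f"lookup failed: {data.get('message', 'unknown error')}"
    return f"{data['query']} ({data.get('country', '?')}, {data.get('isp', '?')})"

# Fetching the raw response could be as simple as:
# urllib.request.urlopen("http://ip-api.com/json").read().decode()
```

Running this once through the proxy and once without it is a quick way to verify the proxy configuration is actually in effect.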