How VisionScrape Works
VisionScrape is an AI-powered web scraper that uses computer vision and a large language model to interpret webpages from their screenshots and extract data from them.
System Architecture
User Goal
The entry point where users define their scraping objective
Browser Runner
Orchestrates browser automation and session management
Agent Runner
Session Manager
Playwright
Browser Engine
Screenshot
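As a rough illustration of the Browser Runner stage, the sketch below wraps a browser page in a small session object and captures a full-page screenshot. The names SessionManager and run_with_playwright are hypothetical and are not taken from VisionScrape; only the Playwright calls (sync_playwright, chromium.launch, new_page, goto, screenshot) come from Playwright's Python sync API.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class SessionManager:
    """Hypothetical wrapper that tracks one browser page and the URLs it visited."""

    page: Any  # any object exposing goto(url) and screenshot(full_page=...)
    history: List[str] = field(default_factory=list)

    def open(self, url: str) -> None:
        # Navigate and remember where the session has been.
        self.page.goto(url)
        self.history.append(url)

    def screenshot(self) -> bytes:
        # Playwright's Page.screenshot returns the image as bytes (PNG by default).
        return self.page.screenshot(full_page=True)


def run_with_playwright(url: str) -> bytes:
    """Open `url` in headless Chromium and return a full-page screenshot."""
    # Imported lazily so the session logic can be exercised without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            session = SessionManager(page=browser.new_page())
            session.open(url)
            return session.screenshot()
        finally:
            browser.close()
```

Keeping the page behind a small wrapper means the later layers only ever see screenshot bytes, so the browser engine could be swapped without touching them.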
Perception Layer
Computer vision analysis of webpage screenshots
YOLO Perception
YOLOv8 Model
Tesseract OCR
Text Extraction
UI Elements
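One plausible way the Perception Layer could turn its two outputs into UI elements is to attach each OCR word to the detected box that contains it. The sketch below assumes the detector and the OCR engine have already produced pixel bounding boxes; the Detection, OcrWord, and UIElement types and the 0.5 confidence threshold are illustrative assumptions, not VisionScrape's actual data model.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    label: str         # e.g. "button", as predicted by the object detector
    box: Box
    confidence: float


@dataclass
class OcrWord:
    text: str          # one word recognised by the OCR engine
    box: Box


@dataclass
class UIElement:
    label: str
    box: Box
    text: str


def center(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def contains(box: Box, point: Tuple[float, float]) -> bool:
    x1, y1, x2, y2 = box
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2


def build_ui_elements(
    detections: List[Detection],
    words: List[OcrWord],
    min_confidence: float = 0.5,  # assumed threshold
) -> List[UIElement]:
    """Attach OCR words to the detection whose box contains the word's center."""
    elements = []
    for det in detections:
        if det.confidence < min_confidence:
            continue  # drop weak detections
        inside = [w for w in words if contains(det.box, center(w.box))]
        # Simplified reading order: sort by top edge, then by left edge.
        inside.sort(key=lambda w: (w.box[1], w.box[0]))
        text = " ".join(w.text for w in inside)
        elements.append(UIElement(det.label, det.box, text))
    return elements
```

Sorting by the top edge is a simplification: words on the same visual line whose boxes differ by a pixel or two vertically may need line grouping before sorting.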
Reasoning Layer
LLM-powered decision making and action planning
Prompt Builder
Azure OpenAI
GPT-4
Action Plan (JSON)
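The Reasoning Layer can be pictured as two small functions around the model call: one that builds a prompt from the user goal and the detected elements, and one that validates the JSON action plan that comes back. The sketch below is a guess at that shape. The action names, the JSON schema, and the element dictionaries are assumptions, and the actual request to Azure OpenAI is deliberately left out.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical action vocabulary; the real set is not given in the source.
ALLOWED_ACTIONS = {"click", "type", "scroll", "extract", "done"}


@dataclass
class Action:
    action: str
    target: Optional[int] = None  # index into the element list sent in the prompt
    value: Optional[str] = None   # e.g. text to type


def build_prompt(goal: str, elements: List[dict]) -> str:
    """Render the goal and the numbered UI elements as a plain-text prompt."""
    lines = [f"Goal: {goal}", "Visible UI elements:"]
    for i, el in enumerate(elements):
        lines.append(f"[{i}] {el['label']}: {el['text']!r}")
    lines.append("Allowed actions: " + ", ".join(sorted(ALLOWED_ACTIONS)))
    lines.append(
        'Reply with JSON only: {"actions": [{"action": "...", "target": 0, "value": "..."}]}'
    )
    return "\n".join(lines)


def parse_action_plan(raw: str, element_count: int) -> List[Action]:
    """Parse the model's JSON reply and reject anything outside the schema."""
    data = json.loads(raw)
    plan = []
    for item in data["actions"]:
        name = item["action"]
        if name not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {name}")
        target = item.get("target")
        if target is not None:
            if not isinstance(target, int) or not 0 <= target < element_count:
                raise ValueError(f"invalid target: {target!r}")
        plan.append(Action(name, target, item.get("value")))
    return plan
```

Validating the plan before execution matters because a language model's reply is untrusted input: an unknown action or an out-of-range target is rejected here instead of reaching the browser.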
Execution Layer
Executes planned actions on the browser
Action Executor
Browser Control
After each action, control loops back to the Browser Runner, which captures a fresh screenshot for the next cycle
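Put together, the four stages form a capture, perceive, plan, execute cycle. The sketch below shows that control flow with each stage passed in as a plain callable, so it runs without a browser or a model. The "done" action and the max_steps cap are assumptions added so the loop terminates; the source does not say how VisionScrape decides to stop.

```python
from typing import Callable, List, Tuple


def run_agent(
    capture: Callable[[], bytes],                 # Browser Runner: take a screenshot
    perceive: Callable[[bytes], List[dict]],      # Perception Layer: find UI elements
    plan: Callable[[List[dict]], List[dict]],     # Reasoning Layer: produce actions
    execute: Callable[[dict], None],              # Execution Layer: act on the browser
    max_steps: int = 10,                          # assumed safety cap
) -> Tuple[int, bool]:
    """Run the loop; return (steps taken, whether a "done" action was seen)."""
    for step in range(1, max_steps + 1):
        screenshot = capture()
        elements = perceive(screenshot)
        for action in plan(elements):
            if action["action"] == "done":
                return step, True
            execute(action)
        # The page may have changed, so the next iteration starts from a new screenshot.
    return max_steps, False
```

Re-capturing the screenshot on every iteration is what lets the planner react to what the previous action actually did to the page, instead of trusting a stale view.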