How VisionScrape Works

VisionScrape is an AI-powered web scraper that uses computer vision and large language models to understand webpages visually, from screenshots, and extract data from them.

System Architecture

User Goal

The entry point where users define their scraping objective

Browser Runner

Orchestrates browser automation and session management

Agent Runner
Session Manager
Playwright (browser engine)

Screenshot
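
As a rough illustration of the Browser Runner handing a screenshot to the next layer, the sketch below captures a full-page image with Playwright's synchronous Python API. The function names `capture_screenshot` and `run_once`, the headless Chromium choice, and the one-session-per-call structure are assumptions made for the example, not details confirmed by the project.

```python
from typing import Protocol


class PageLike(Protocol):
    """The two Playwright Page methods this sketch relies on."""

    def goto(self, url: str): ...

    def screenshot(self, *, full_page: bool = False) -> bytes: ...


def capture_screenshot(page: PageLike, url: str) -> bytes:
    """Navigate to `url` and return a full-page screenshot as PNG bytes."""
    page.goto(url)
    return page.screenshot(full_page=True)


def run_once(url: str) -> bytes:
    """Hypothetical entry point: open one browser session, take one screenshot."""
    # Imported lazily so this module still loads where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            return capture_screenshot(page, url)
        finally:
            browser.close()
```

Keeping `capture_screenshot` separate from session setup means it can be exercised with a stand-in page object, without launching a real browser.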

Perception Layer

Computer vision analysis of webpage screenshots

YOLO Perception (YOLOv8 model)

Tesseract OCR (text extraction)

UI Elements
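
One plausible way to turn the two perception outputs into UI elements is to attach each OCR word to the detected box that contains it. The sketch below assumes that approach; the `Detection`, `Word`, and `UIElement` types, the center-point containment rule, and the 0.5 confidence threshold are all hypothetical, and the model and OCR calls themselves are left out.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    """One box from the object detector (hypothetical shape)."""
    label: str
    box: Box
    confidence: float


@dataclass
class Word:
    """One word from OCR with its bounding box (hypothetical shape)."""
    text: str
    box: Box


@dataclass
class UIElement:
    label: str
    box: Box
    confidence: float
    text: str = ""


def _center(box: Box) -> tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def _contains(box: Box, point: tuple[float, float]) -> bool:
    x1, y1, x2, y2 = box
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2


def build_ui_elements(detections, words, min_confidence=0.5):
    """Drop weak detections and attach the OCR words that fall inside each box."""
    elements = []
    for det in detections:
        if det.confidence < min_confidence:
            continue
        inside = [w for w in words if _contains(det.box, _center(w.box))]
        # Simplified reading order: sort by top edge, then left edge.
        inside.sort(key=lambda w: (w.box[1], w.box[0]))
        text = " ".join(w.text for w in inside)
        elements.append(UIElement(det.label, det.box, det.confidence, text))
    return elements
```

A word is assigned by its center point rather than by full containment, so text that slightly overflows a detected box is still picked up.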

Reasoning Layer

LLM-powered decision making and action planning

Prompt Builder
Azure OpenAI (GPT-4)

Action Plan (JSON)
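
The page does not show the prompt or the JSON schema, so the following is only a sketch of what a prompt builder and an action-plan validator might look like. The action names in `ALLOWED_ACTIONS`, the `{"actions": [...]}` layout, and the `target` index convention are assumptions; the call to the model is omitted.

```python
import json

# Hypothetical action vocabulary; the real plan schema is not documented here.
ALLOWED_ACTIONS = {"click", "type", "scroll", "extract", "done"}


def build_prompt(goal: str, elements: list[dict]) -> str:
    """Describe the goal and the visible elements, and ask for a JSON plan."""
    lines = [f"Goal: {goal}", "Visible UI elements:"]
    for i, el in enumerate(elements):
        lines.append(f"[{i}] {el['label']} text={el.get('text', '')!r}")
    lines.append(
        'Reply with JSON only: {"actions": [{"type": ..., "target": <index>, "value": ...}]}'
    )
    return "\n".join(lines)


def parse_action_plan(reply: str, num_elements: int) -> list[dict]:
    """Parse the model reply and reject anything outside the assumed schema."""
    plan = json.loads(reply)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(plan, dict) or not isinstance(plan.get("actions"), list):
        raise ValueError("plan must be an object with an 'actions' list")
    for action in plan["actions"]:
        if not isinstance(action, dict):
            raise ValueError("each action must be an object")
        if action.get("type") not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action type: {action.get('type')!r}")
        target = action.get("target")
        if target is not None:
            if type(target) is not int or not 0 <= target < num_elements:
                raise ValueError(f"target out of range: {target!r}")
    return plan["actions"]
```

Validating the reply before execution keeps a malformed or hallucinated plan from reaching the browser.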

Execution Layer

Executes planned actions on the browser

Action Executor (browser control)

After the planned actions run, control loops back to the Browser Runner so the cycle can repeat.
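
The execute-and-repeat cycle could be sketched roughly as follows. The `Browser` interface, the action dictionary keys, the `done` action, and the `max_steps` cap are assumptions for illustration; with Playwright, the three methods would most likely map onto `page.mouse.click`, `page.keyboard.type`, and `page.mouse.wheel`.

```python
from typing import Callable, Protocol


class Browser(Protocol):
    """Hypothetical minimal control surface for the executor."""

    def click(self, x: float, y: float) -> None: ...

    def type_text(self, text: str) -> None: ...

    def scroll(self, dy: float) -> None: ...


def execute_action(browser: Browser, action: dict, elements: list[dict]) -> bool:
    """Run one action. Return False when the plan signals completion."""
    kind = action["type"]
    if kind == "done":
        return False
    if kind == "scroll":
        browser.scroll(float(action.get("value", 600)))
        return True
    if kind in ("click", "type"):
        # Click the center of the targeted element's bounding box.
        x1, y1, x2, y2 = elements[action["target"]]["box"]
        browser.click((x1 + x2) / 2, (y1 + y2) / 2)
        if kind == "type":
            browser.type_text(str(action.get("value", "")))
        return True
    raise ValueError(f"unsupported action type: {kind!r}")


def run_loop(
    browser: Browser,
    perceive: Callable[[Browser], list[dict]],
    plan: Callable[[list[dict]], list[dict]],
    max_steps: int = 10,
) -> int:
    """Perceive, plan, and execute until 'done' or max_steps. Returns steps used."""
    for step in range(1, max_steps + 1):
        elements = perceive(browser)
        for action in plan(elements):
            if not execute_action(browser, action, elements):
                return step
    return max_steps
```

Passing `perceive` and `plan` in as callables keeps the loop independent of any particular vision model or LLM client, and the step cap guards against a plan that never reports completion.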

Contributors

Souvik Nayak

Frontend Developer

Gourab Sen

Backend Developer