What is Web Scraping? Definition & Examples

What is Web Scraping?

Web scraping is the automated extraction of data from web pages. It is the engine underneath most competitor price monitoring tools. A bot visits a competitor's product page, pulls out the price, the stock status, and any other data the merchant cares about, and feeds it into a database where it can be matched, compared, and acted on.

How it works

At the simplest level, a scraper sends an HTTP request to a URL, parses the returned HTML, and extracts specific fields. In practice, modern e-commerce sites make this much harder than it sounds. JavaScript-rendered pages, anti-bot measures, rate limits, region-locked content, and structural changes to the site can each break a naive scraper.
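
The request, parse, and extract steps can be sketched with Python's standard library alone. This is a minimal illustration, not a production scraper: the class names "price" and "stock", the User-Agent string, and the sample HTML are all assumptions made for the example, and a real page would need selectors found by inspecting its markup.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ProductParser(HTMLParser):
    """Collects the text of elements whose class attribute names a wanted field."""

    # Hypothetical class names, chosen only for this example.
    FIELDS = {"price", "stock"}

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field whose text we expect next

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.FIELDS:
                self._current = name

    def handle_data(self, data):
        # Store the first non-blank text that follows a matching start tag.
        if self._current and data.strip():
            self.record[self._current] = data.strip()
            self._current = None


def fetch(url):
    """Step 1: send an HTTP request and return the response body as text."""
    # The User-Agent value is a made-up identifier for this sketch.
    req = Request(url, headers={"User-Agent": "example-price-bot/0.1"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract(html):
    """Steps 2 and 3: parse the HTML and pull out the wanted fields."""
    parser = ProductParser()
    parser.feed(html)
    return parser.record


# Sample markup standing in for a fetched product page.
SAMPLE_HTML = """
<div class="product">
  <span class="price">$19.99</span>
  <span class="stock">In stock</span>
</div>
"""

print(extract(SAMPLE_HTML))
```

Against a live page the call would be extract(fetch(url)). A page that renders its price with JavaScript would return HTML without the price element, which is one reason this naive approach breaks on many modern sites.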

Production-grade scrapers handle:

Rendering dynamic content that loads after the initial page
Rotating IPs and headers to avoid being identified as a bot
Detecting structural changes when a competitor redesigns their page
Region awareness so a US-based scraper does not pull EU pricing by accident
Polite scraping that does not hammer competitor servers
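
As one illustration of the list above, detecting a structural change can start with a sanity check on every scraped record. The sketch below assumes a record is a dictionary with hypothetical "price" and "stock" keys, and it uses a deliberately loose price pattern. It shows the idea only and does not describe how any particular tool does it.

```python
import re

# Hypothetical field names; a real scraper would list its own schema here.
REQUIRED_FIELDS = ("price", "stock")

# Loose pattern: up to three non-digit characters (a currency symbol or code),
# then a digit, then any mix of digits, dots, and commas.
PRICE_PATTERN = re.compile(r"^\D{0,3}\d[\d.,]*$")


def structure_problems(record):
    """Return a list of reasons the record looks wrong; an empty list means OK."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing field: {field}")
    price = record.get("price")
    if price and not PRICE_PATTERN.match(price.strip()):
        problems.append(f"unexpected price format: {price!r}")
    return problems


print(structure_problems({"price": "$19.99", "stock": "In stock"}))
print(structure_problems({"price": "Add to cart", "stock": "In stock"}))
```

A sudden rise in records with problems for one competitor is a reasonable signal that the page layout changed and the selectors need updating.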

The legal landscape

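Scraping legality varies by jurisdiction and by what you scrape. In the US, scraping public product pages that require no authentication is generally considered lower risk: in hiQ Labs v. LinkedIn, the Ninth Circuit held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act, although breach-of-contract claims under a site's terms of service can still apply. Rules differ in the EU (where GDPR applies whenever personal data is collected), in the UK, and under specific platform terms of service. Reading and respecting robots.txt, not bypassing access controls, and not redistributing data verbatim are the baseline good-faith practices.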
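
Respecting robots.txt can be automated with Python's standard urllib.robotparser module. The sketch below parses a sample robots.txt from a string so that it runs without network access. The rules, the shop.example URLs, and the user agent name are invented for illustration, and passing this check is not by itself evidence that a scrape is lawful.

```python
from urllib.robotparser import RobotFileParser

# Made-up user agent name for this sketch.
USER_AGENT = "example-price-bot"

# Sample rules standing in for a site's real /robots.txt file.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 5
"""


def build_rules(robots_txt):
    """Parse robots.txt content that has already been downloaded as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_rules(SAMPLE_ROBOTS)

# Check each URL before requesting it.
print(rules.can_fetch(USER_AGENT, "https://shop.example/products/123"))
print(rules.can_fetch(USER_AGENT, "https://shop.example/checkout/cart"))

# Seconds to wait between requests if the site declares a delay, else None.
print(rules.crawl_delay(USER_AGENT))
```

Against a live site, RobotFileParser can also load the file itself through set_url() followed by read().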

Why it matters for e-commerce

The quality of competitor monitoring is the quality of the scraping underneath it. A tool that sounds great in a sales demo is useless if its scrapers get blocked half the time, miss intraday price changes, or break every time a competitor updates their site. Reliability and breadth of coverage are what separate enterprise-grade tools from spreadsheet alternatives.

Example: A merchant tracks 14 competitors across 800 products. After three months, they audit the data and find one competitor's prices are stale 60% of the time because anti-bot measures have started blocking scrapes. The merchant did not notice because the dashboard showed last-known prices, not freshness. Good scraping infrastructure surfaces these gaps automatically.
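
The freshness gap in this example can be measured directly if the system stores the time of the last successful scrape for each product. The sketch below is one hedged way to compute it. The function name, the data layout, the 24-hour threshold, and the sample timestamps are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_share(last_success_by_product, now, max_age):
    """Fraction of products whose last successful scrape is older than max_age."""
    if not last_success_by_product:
        return 0.0
    stale = sum(
        1 for ts in last_success_by_product.values() if now - ts > max_age
    )
    return stale / len(last_success_by_product)


# Hypothetical data for one competitor: product ID -> last successful scrape.
NOW = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
LAST_SUCCESS = {
    "sku-1": NOW - timedelta(hours=2),
    "sku-2": NOW - timedelta(hours=5),
    "sku-3": NOW - timedelta(days=3),
    "sku-4": NOW - timedelta(days=9),
    "sku-5": NOW - timedelta(days=30),
}

share = stale_share(LAST_SUCCESS, NOW, max_age=timedelta(hours=24))
print(f"{share:.0%} of tracked prices are stale")
```

Showing this figure per competitor next to the last-known prices, or alerting when it crosses a threshold, is one way a dashboard could surface blocked scrapes instead of hiding them.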