
The plight of modern screen scrapers

1 min read · Apr 23, 2021

The bane of automating single-page apps is, ironically, good frontend error handling. Our CodeceptJS-based screen scraper has no way of knowing when the frontend has caught and handled a REST error. The frontend team never thought to add a common class to button elements with text like “An error occurred. Please try again.”
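Without a shared class, a scraper is left matching on visible text. Here is a minimal sketch of that fallback, assuming the scraper has already collected the visible text of candidate elements. The function name and the patterns are hypothetical and are not part of CodeceptJS:

```typescript
// Hypothetical patterns; every app words its error messages differently,
// which is exactly why this approach is fragile.
const ERROR_PATTERNS: RegExp[] = [
  /an error occurred/i,
  /please try again/i,
  /something went wrong/i,
];

// Returns the first text that looks like a handled error, or undefined
// if none of the texts match a known pattern.
function findErrorText(visibleTexts: string[]): string | undefined {
  return visibleTexts.find((text) =>
    ERROR_PATTERNS.some((pattern) => pattern.test(text)),
  );
}
```

A single stable class or data attribute on error elements would replace the whole pattern list with one selector.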

Even worse: auto retry. The scraper cannot distinguish “no data available” from a failure. Scraping jobs that run for hours need to pause only as long as necessary. Is an element missing because there is no data, because an error occurred, or because the scraper did not wait long enough? It’s impossible to know.
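The ambiguity can be made concrete with a small sketch. It assumes the scraper takes periodic snapshots of the page; the `Snapshot` fields, including the explicit "no results" marker, are hypothetical and would have to be provided by the frontend:

```typescript
type Outcome = "data" | "error" | "empty" | "pending" | "unknown";

// Hypothetical view of the page at one moment in time.
interface Snapshot {
  rowCount: number;            // data rows currently rendered
  errorVisible: boolean;       // a handled-error message is on screen
  emptyMarkerVisible: boolean; // an explicit "no results" marker is on screen
  elapsedMs: number;           // time since the scraper started waiting
}

// Classifies a snapshot. Only explicit signals give a definite answer;
// a bare timeout yields "unknown".
function classify(s: Snapshot, timeoutMs: number): Outcome {
  if (s.rowCount > 0) return "data";
  if (s.errorVisible) return "error";
  if (s.emptyMarkerVisible) return "empty";
  if (s.elapsedMs < timeoutMs) return "pending";
  return "unknown";
}
```

The "unknown" branch is the case described above: when the frontend exposes no marker for errors or for an empty result, a timeout says nothing about why the element is missing.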

As with everything, you don’t learn this until you try it yourself. If you’re embarking on a new frontend project, I hope you will consider the plight of those who write scrapers and integration tests.

And then there’s infinite scrolling.

Terris Linenbach

Written by Terris Linenbach

Coder since 1980. Always seeking the Best Way. CV: https://terris.com/cv
