Book review: “Web Scraping with Python”, Ryan Mitchell
Recently I faced the challenge of designing a web crawling and scraping system. To build context, I started by reading Web Scraping with Python: Data Extraction for the Modern Web by Ryan Mitchell (3rd edition, revised in 2024).
The book turned out to be a very pleasant read. The author’s approach is well structured, with chapters progressing from simple practical tasks and a legal overview to advanced considerations such as Natural Language Processing and race conditions in distributed scraping systems. Code snippets are concise and useful; I can imagine them being used in a small-scale production system. In two days, I managed to build a solid understanding of common approaches, architectures, problems, and solutions in this field.
That said, the content is not without its flaws. The section on JavaScript and SSR is so outdated it is almost hilarious. Mentions of Dynamic HTML, jQuery, and AJAX calls are appropriate for a book written around 2010, but not for a revised version from 2024. Nonetheless, even this section is useful at a conceptual level: modern SPAs achieve the same goals as early-2000s web applications that generated dynamic HTML server-side and sent it to browsers.
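To illustrate why JavaScript-rendered pages matter to a scraper regardless of the era, here is a minimal sketch of my own (not taken from the book). It assumes a hypothetical page whose list is filled in by a script after load; the markup, the `loadItems` call, and the `/api/items` path are all made up for illustration. A parser that only sees the raw HTML finds the script but none of the content:

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical page: the <ul> is empty in the raw HTML and is meant to be
# populated in the browser by a (made-up) loadItems() JavaScript function.
PAGE = """
<html><body>
  <h1>Products</h1>
  <ul id="items"></ul>
  <script>loadItems("/api/items");</script>
</body></html>
"""


class TagCounter(HTMLParser):
    """Counts opening tags found in the raw, unrendered HTML."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags(html: str) -> dict:
    """Return how many <li> and <script> tags a static parser sees."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return {"li": parser.counts["li"], "script": parser.counts["script"]}


if __name__ == "__main__":
    print(count_tags(PAGE))
```

The static parse reports one script and zero list items, because the items only exist after a browser executes the script. In practice this usually means either driving a real browser or calling the underlying data endpoint directly.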
The ease with which I read this book was strongly influenced by my existing knowledge of the web. Over the years, I have built a solid foundation in HTML and CSS, JavaScript and Python, APIs, application architecture, and networking — all of which helped me clearly see the connections between the author’s ideas. However, the book should still be accessible to any technical reader, thanks to its clear explanations and practical code examples.
What it covers:
- Principles of web technologies
- Legal and ethical considerations
- Common scraping use cases
- Building web crawlers and scrapers
- Crawling strategies
- Transformation and validation of collected data
- Parsing text and image documents
- Scraping traps
- Distributed scraping systems
Verdict: 4.5 / 5 — a go-to practical guide for those planning to build their own scraping system.