<?xml version="1.0" encoding="utf-8"?> 
<rss version="2.0"
  xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
  xmlns:atom="http://www.w3.org/2005/Atom">

<channel>

<title>Blog — George Mishurovsky: posts tagged book review</title>
<link>https://mishurovsky.com/blog/?go=tags/book-review/</link>
<description>A blog by George Mishurovsky — a senior software engineer with a medical degree. Drawing from both engineering and scientific thinking, he explores software, architecture, design, psychology, and product thinking.</description>
<author></author>
<language>en</language>
<generator>Aegea 11.3 (v4134e)</generator>

<itunes:owner>
<itunes:name></itunes:name>
<itunes:email>george@mishurovsky.com</itunes:email>
</itunes:owner>
<itunes:subtitle>A blog by George Mishurovsky — a senior software engineer with a medical degree. Drawing from both engineering and scientific thinking, he explores software, architecture, design, psychology, and product thinking.</itunes:subtitle>
<itunes:image href="https://mishurovsky.com/blog/pictures/userpic/userpic-square@2x.jpg?1753619610" />
<itunes:explicit>no</itunes:explicit>

<item>
<title>Book review: “Data Pipelines Pocket Reference”, James Densmore</title>
<guid isPermaLink="false">35</guid>
<link>https://mishurovsky.com/blog/?go=all/book-review-data-pipelines-pocket-reference-james-densmore/</link>
<pubDate>Wed, 28 Jan 2026 12:21:58 +0200</pubDate>
<author></author>
<comments>https://mishurovsky.com/blog/?go=all/book-review-data-pipelines-pocket-reference-james-densmore/</comments>
<description>
&lt;div class="e2-text-picture"&gt;
&lt;a href="https://www.oreilly.com/library/view/data-pipelines-pocket/9781492087823/" class="e2-text-picture-link"&gt;
&lt;img src="https://mishurovsky.com/blog/pictures/data-pipelines-pocket-reference@2x.png" width="542" height="891" alt="" /&gt;
&lt;/a&gt;&lt;/div&gt;
&lt;p&gt;Software engineering is all about manipulating data. Much of a software engineer’s attention goes to collecting data from users and presenting it back to them in a useful form. However, there is another kind of data: the kind that software users only consume rather than produce. Here, the goals are a single source of truth, data validity and availability, and performant processing (for analysis or presentation).&lt;/p&gt;
&lt;p&gt;To get a better grasp of the tooling for working with such data, this week I read &lt;i&gt;Data Pipelines Pocket Reference&lt;/i&gt; by James Densmore. The book focuses on the modern ELT (Extract-Load-Transform) approach, as well as EtLT (where the lowercase ‘t’ stands for a generic, non-business-related data transformation step).&lt;/p&gt;
&lt;p&gt;It turned out to be a very practical pocket guide indeed. Each of the book’s sections on data extraction, loading, and transformation comes with clear Python code snippets. The snippets demonstrate how to connect to essential services such as databases, AWS S3, Amazon Redshift, Snowflake, and Apache Airflow, as well as the basics of data manipulation.&lt;/p&gt;
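&lt;p&gt;Not an excerpt from the book, but a rough sketch of the extract-load-transform flow those snippets follow, using only Python’s standard library, with SQLite standing in for both the source system and the warehouse (all table and column names here are made up):&lt;/p&gt;

```python
import csv
import os
import sqlite3
import tempfile

# Source system stand-in: an operational database with raw orders.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL, country TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, 10.0, "US"), (2, 25.5, "DE"), (3, 7.25, "US")])

# Extract: dump the source table to a flat CSV file.
fd, csv_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    writer = csv.writer(f)
    for row in source.execute("SELECT id, amount, country FROM orders"):
        writer.writerow(row)

# Load: ingest the CSV into the warehouse as-is, no business logic yet.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders_raw (id INTEGER, amount REAL, country TEXT)")
with open(csv_path, newline="") as f:
    rows = [(int(i), float(a), c) for i, a, c in csv.reader(f)]
warehouse.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", rows)

# Transform: build the reporting model inside the warehouse with SQL.
warehouse.execute(
    "CREATE TABLE revenue_by_country AS "
    "SELECT country, SUM(amount) AS revenue "
    "FROM orders_raw GROUP BY country"
)
result = dict(warehouse.execute("SELECT country, revenue FROM revenue_by_country"))
os.remove(csv_path)
```

&lt;p&gt;A real pipeline would extract into S3 rather than a temp file and load with the warehouse’s bulk-ingest command, but the shape of the steps stays the same.&lt;/p&gt;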
&lt;p&gt;I liked two things. First, the snippets feel production-ready. They include no robust error handling, of course, but they are sufficient to start moving data around, running validations, and applying transformations. Second, the author focuses not only on interaction with services but also shares techniques for data processing and validation. In particular, there is a neat data testing framework based on a separate Python script for each check, which can be integrated into Airflow workflows. The approach, while quite lean, requires a certain mindset to arrive at, so this bit of knowledge alone saves time and lays a scalable foundation for data processing.&lt;/p&gt;
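&lt;p&gt;A sketch of that one-script-per-check idea, simplified from memory rather than copied from the book: a check compares the results of two scalar SQL queries, and a real check script would exit nonzero on failure so that the orchestrator task (for example, an Airflow BashOperator running the script) fails the run. Table names below are hypothetical:&lt;/p&gt;

```python
import sqlite3

def run_check(conn, query1, query2, operator):
    """Run two scalar SQL queries and compare the results.

    Returns True when the check passes. In the one-script-per-check
    pattern, each check is a standalone script that signals failure
    with a nonzero exit code (sys.exit(1)), which the orchestrator
    turns into a failed task.
    """
    v1 = conn.execute(query1).fetchone()[0]
    v2 = conn.execute(query2).fetchone()[0]
    if operator == "equals":
        return v1 == v2
    if operator == "greater_equals":
        return v1 >= v2
    raise ValueError("unknown operator: " + operator)

# Demo against an in-memory database; a real check would connect to
# the warehouse. Rule: staging must not have fewer rows than raw.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_raw (id INTEGER)")
conn.execute("CREATE TABLE orders_staging (id INTEGER)")
conn.executemany("INSERT INTO orders_raw VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO orders_staging VALUES (?)", [(1,), (2,), (3,)])
passed = run_check(
    conn,
    "SELECT COUNT(*) FROM orders_staging",
    "SELECT COUNT(*) FROM orders_raw",
    "greater_equals",
)
```

&lt;p&gt;Because each check is its own script, adding a new validation means adding one file and one task, which is what makes the framework scale.&lt;/p&gt;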
&lt;p&gt;That said, I think the book lacks examples that are closer to real practice. It would benefit from a companion GitHub repository with a substantial dataset to run ELT against, in addition to the book’s primitive data samples, which take up no more than 10 rows and 5 columns in a single SQL table. The book also avoids any in-depth discussion, which makes it a pocket reference indeed.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;What it covers:&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data roles: data engineering, data analytics, and data science&lt;/li&gt;
&lt;li&gt;Types of pipelines: ETL vs. ELT vs. EtLT&lt;/li&gt;
&lt;li&gt;Overview of tools for each ELT step and their orchestration&lt;/li&gt;
&lt;li&gt;Minimal instructions for setting up data ingestion and transformation&lt;/li&gt;
&lt;li&gt;Approaches to pipeline orchestration&lt;/li&gt;
&lt;li&gt;A framework for data validation&lt;/li&gt;
&lt;li&gt;Building pipelines with monitoring and maintenance in mind&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;b&gt;Verdict: 4 / 5 — a good reference to start building simple ELT pipelines in a day, which is likely exactly what a general software engineer would want if data engineering is not their primary area of specialization&lt;/b&gt;&lt;/p&gt;
</description>
</item>

<item>
<title>Book review: “Web Scraping with Python”, Ryan Mitchell</title>
<guid isPermaLink="false">32</guid>
<link>https://mishurovsky.com/blog/?go=all/book-review-web-scraping-with-python-ryan-mitchell/</link>
<pubDate>Thu, 22 Jan 2026 16:56:17 +0200</pubDate>
<author></author>
<comments>https://mishurovsky.com/blog/?go=all/book-review-web-scraping-with-python-ryan-mitchell/</comments>
<description>
&lt;div class="e2-text-picture"&gt;
&lt;a href="https://www.oreilly.com/library/view/web-scraping-with/9781098145347/" class="e2-text-picture-link"&gt;
&lt;img src="https://mishurovsky.com/blog/pictures/web-scraping-with-python@2x.jpg" width="381" height="500" alt="" /&gt;
&lt;/a&gt;&lt;/div&gt;
&lt;p&gt;Recently I faced a challenge of designing a web crawling and scraping system. To build context, I started my work by reading &lt;i&gt;Web Scraping with Python: Data Extraction for the Modern Web&lt;/i&gt; by Ryan Mitchell (3rd edition, revised in 2024).&lt;/p&gt;
&lt;p&gt;The book turned out to be a very pleasant read. The author’s approach is well structured, with chapters progressing from simple practical tasks and a legal overview to advanced considerations such as Natural Language Processing and race conditions in distributed scraping systems. Code snippets are concise and useful — I can imagine them being used in a small-scale production system. In two days, I managed to build a solid understanding of common approaches, architectures, problems, and solutions in this field.&lt;/p&gt;
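&lt;p&gt;Not from the book, but a sketch of the kind of crawling strategy it covers: a breadth-first frontier with a visited set and URL normalization (relative links resolved, fragments dropped, external domains skipped), using only the standard library. The link-fetching function is injected so the sketch runs without network access:&lt;/p&gt;

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse

def crawl_order(seed, get_links, limit=100):
    """Breadth-first crawl plan: returns URLs in the order they would
    be fetched. get_links(url) must return the raw hrefs found on a
    page; injecting it keeps the sketch free of network access.
    """
    seed_host = urlparse(seed).netloc
    frontier = deque([seed])
    visited = set()
    order = []
    while frontier:
        if len(order) >= limit:
            break
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        for href in get_links(url):
            # Resolve relative links and drop fragments so different
            # spellings of a page map to one canonical URL.
            absolute = urldefrag(urljoin(url, href))[0]
            # Stay on the seed's domain; external links are skipped.
            if urlparse(absolute).netloc == seed_host and absolute not in visited:
                frontier.append(absolute)
    return order

# Fake site: a mapping from URL to the hrefs found on that page.
site = {
    "https://example.com/": ["/a", "/b", "https://other.com/x"],
    "https://example.com/a": ["/b#section", "/"],
    "https://example.com/b": [],
}
plan = crawl_order("https://example.com/", lambda u: site.get(u, []))
```

&lt;p&gt;In a real crawler, get_links would fetch the page (for example, with urllib.request) and parse the hrefs out of the HTML.&lt;/p&gt;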
&lt;p&gt;That said, the content is not without its flaws. The section on JavaScript and SSR is so outdated it is almost hilarious. Mentions of Dynamic HTML, jQuery, and AJAX calls would suit a book written around 2010, but not a revision from 2024. Nonetheless, even this section is useful at a conceptual level: modern SPAs achieve the same goals as early-2000s web applications that generated dynamic HTML server-side and sent it to browsers.&lt;/p&gt;
&lt;p&gt;The ease with which I read this book was strongly influenced by my existing knowledge of the web. Over the years, I have built a solid foundation in HTML and CSS, JavaScript and Python, APIs, application architecture, and networking — all of which helped me clearly see the connections between the author’s ideas. However, the book should still be accessible to any technical reader, thanks to its clear explanations and practical code examples.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;What it covers:&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Principles of web technologies&lt;/li&gt;
&lt;li&gt;Legal and ethical considerations&lt;/li&gt;
&lt;li&gt;Common scraping use cases&lt;/li&gt;
&lt;li&gt;Building web crawlers and scrapers&lt;/li&gt;
&lt;li&gt;Crawling strategies&lt;/li&gt;
&lt;li&gt;Transformation and validation of collected data&lt;/li&gt;
&lt;li&gt;Parsing text and image documents&lt;/li&gt;
&lt;li&gt;Scraping traps&lt;/li&gt;
&lt;li&gt;Distributed scraping systems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;b&gt;Verdict: 4.5 / 5 — a go-to practical guide for those planning to build their own scraping system.&lt;/b&gt;&lt;/p&gt;
</description>
</item>


</channel>
</rss>