Are We Overusing Python? How One Line of SQL Beat a 13-Hour Data Load
In tech, we often rely on Python to solve problems, especially when it comes to data. It’s a versatile tool, but sometimes we overuse it and miss more efficient solutions that are already built into our databases. With 20+ years in cloud and database operations, I’ve learned that knowing when to lean on the database instead of Python can make all the difference. Recently, I ran into a perfect example of this lesson: a 13-hour data load that could’ve finished in seconds with a single SQL command.

Here’s the story of how I recognized the inefficiency, what I changed, and what it says about how we approach problems in tech.

The Problem: A Slow Python-Based Data Load

We were working with an RDS MySQL instance, loading anywhere between 100k and 250k rows of data. Initially, a Python script handled the insertion, using executemany() to batch the inserts. While this approach was faster than inserting each row individually, it was still incredi...
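For readers who haven’t used it, here is a minimal sketch of the kind of executemany() batching loop described above. It is not the author’s script. It assumes Python’s built-in sqlite3 module as a stand-in for the MySQL driver, since both follow the Python DB-API (PEP 249) and expose the same executemany() call. The `events` table, its columns, the `batched_insert` helper, and the batch size of 1,000 are all hypothetical.

```python
import sqlite3
from typing import Sequence

# Assumption: sqlite3 stands in for a MySQL driver; both implement the DB-API
# executemany() method. The "events" table and its columns are hypothetical.
def batched_insert(conn: sqlite3.Connection,
                   rows: Sequence[tuple[int, str]],
                   batch_size: int = 1000) -> int:
    """Insert rows in chunks with executemany(), committing once at the end."""
    cur = conn.cursor()
    inserted = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # sqlite3 uses "?" placeholders; MySQL drivers such as PyMySQL use "%s".
        cur.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", batch)
        inserted += len(batch)
    conn.commit()
    return inserted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    rows = [(i, f"row-{i}") for i in range(100_000)]
    print(batched_insert(conn, rows))  # → 100000
    print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # → 100000
```

Even with batching, every row is still built in the Python process and handed to the database through the driver, one batch at a time.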