Python: PostgreSQL to Parquet
Jun 24, 2023
Continuing the series on converting data to Parquet, today I will cover PostgreSQL to Parquet. There are many Python libraries for this conversion; after the list below, a short sketch shows how each one selects a compression codec.
- pyarrow: provides a Python API for the functionality of the Arrow C++ libraries, along with tools for Arrow integration and interoperability with pandas, NumPy, and other software in the Python ecosystem. PyArrow uses Snappy compression by default; Brotli, Gzip, ZSTD, LZ4, and uncompressed are also supported.
- fastparquet: a Python implementation of the Parquet format that aims to integrate into Python-based big data workflows. It is used implicitly by Dask, pandas, and intake-parquet. Compression codecs available by default: gzip, snappy, brotli, lz4, and zstandard; lzo is optionally supported.
- pandas: a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. Its to_parquet method accepts ‘snappy’ (the default), ‘gzip’, ‘brotli’, or None for compression.
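Here is a minimal sketch of how the compression codec is passed in each library. The DataFrame contents and file names are placeholders of my own, not from the original post:
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import fastparquet as fp
# A tiny placeholder DataFrame so the sketch is runnable
df = pd.DataFrame({'id': [1, 2], 'name': ['alice', 'bob']})
# pyarrow: convert to an Arrow table, then write; Snappy is the default codec
table = pa.Table.from_pandas(df)
pq.write_table(table, 'example_pyarrow.parquet', compression='zstd')
# fastparquet: write the DataFrame directly, choosing gzip here
fp.write('example_fastparquet.parquet', df, compression='GZIP')
# pandas: to_parquet defaults to snappy; brotli is passed explicitly here
df.to_parquet('example_pandas.parquet', compression='brotli')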
Source code
import pyarrow as pa
import pyarrow.parquet as pq
import fastparquet as fp
import pandas as pd
from sqlalchemy import create_engine
# Define PostgreSQL connection parameters
db_username = ''
db_password = ''
db_hostname = ''
db_port = ''
db_name = ''
# Connect to PostgreSQL database
engine = create_engine(
    f'postgresql://{db_username}:{db_password}@{db_hostname}:{db_port}/{db_name}'
)
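The original snippet is cut off at this point, so the continuation below is my own sketch, not the author's code: it reads a source table (the name employees is a hypothetical stand-in) into a DataFrame and writes it out with each of the three libraries.
# Read the source table into a pandas DataFrame (table name is hypothetical)
df = pd.read_sql('SELECT * FROM employees', engine)
# Option 1: pyarrow -- convert to an Arrow table, then write (Snappy by default)
table = pa.Table.from_pandas(df)
pq.write_table(table, 'employees_pyarrow.parquet')
# Option 2: fastparquet -- write the DataFrame directly
fp.write('employees_fastparquet.parquet', df)
# Option 3: pandas -- to_parquet delegates to pyarrow or fastparquet
df.to_parquet('employees_pandas.parquet', engine='pyarrow')
All three outputs contain the same data; the choice mostly comes down to which dependencies are already in the project and which compression codecs are needed.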