
Python: PostgreSQL to Parquet

Park Sehun
2 min read · Jun 24, 2023

Continuing the series on converting data to Parquet, today I will cover converting a PostgreSQL table to Parquet.

There are several Python libraries for writing Parquet files, each with its own set of supported compression codecs (illustrated in the sketch after this list):

  • pyarrow: This library provides a Python API for the functionality of the Arrow C++ libraries, along with tools for Arrow integration and interoperability with pandas, NumPy, and other software in the Python ecosystem. PyArrow writes Parquet with Snappy compression by default, but Brotli, Gzip, ZSTD, LZ4, and uncompressed are also supported.
  • fastparquet: fastparquet is a Python implementation of the Parquet format that aims to integrate into Python-based big data workflows. It is used implicitly by projects such as Dask, pandas, and intake-parquet. Compression codecs available by default: gzip, snappy, brotli, lz4, and zstandard, with lzo optionally supported.
  • pandas: pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. Its to_parquet method accepts the compression options ‘snappy’ (the default), ‘gzip’, ‘brotli’, and None.
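
As a minimal sketch of how a compression codec is selected in each library (the sample DataFrame, the output file names, and the codec choices here are my own illustration, not from the original article):

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import fastparquet as fp

# A tiny sample DataFrame to write out
df = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})

# pyarrow: Snappy is the default; pass compression= to override
pq.write_table(pa.Table.from_pandas(df), 'example_pyarrow.parquet', compression='zstd')

# fastparquet: pass compression= to pick a codec
fp.write('example_fastparquet.parquet', df, compression='gzip')

# pandas: to_parquet delegates to pyarrow or fastparquet via engine=
df.to_parquet('example_pandas.parquet', engine='pyarrow', compression='brotli')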

Source code

import pyarrow as pa
import pyarrow.parquet as pq
import fastparquet as fp
import pandas as pd
from sqlalchemy import create_engine

# Define PostgreSQL connection parameters
db_username = ''
db_password = ''
db_hostname = ''
db_port = ''
db_name = ''

# Connect to PostgreSQL database using a standard SQLAlchemy URL
engine = create_engine(
    f'postgresql://{db_username}:{db_password}@{db_hostname}:{db_port}/{db_name}'
)
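
The rest of the source code sits behind the paywall, so what follows is a minimal sketch of how the conversion might continue using the three libraries introduced above. The table name my_table and the output file names are my own placeholders, not taken from the original article:

# Read the source table into a pandas DataFrame
# (my_table is a placeholder; substitute your own table name)
df = pd.read_sql('SELECT * FROM my_table', engine)

# Option 1: pyarrow - convert to an Arrow table, then write it out
table = pa.Table.from_pandas(df)
pq.write_table(table, 'my_table_pyarrow.parquet')

# Option 2: fastparquet - write the DataFrame directly
fp.write('my_table_fastparquet.parquet', df)

# Option 3: pandas - to_parquet delegates to pyarrow or fastparquet
df.to_parquet('my_table_pandas.parquet', engine='pyarrow')

Any of the three output files can be read back with pd.read_parquet to verify the round trip.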
