
Several ways to convert Relational Databases to Parquet

Park Sehun
2 min read · Apr 30, 2023

This post covers how to convert an Oracle database to Parquet, but the same approaches apply to most relational databases, such as PostgreSQL and MS SQL Server.

Export data to another format, then convert to Parquet

Many database engines can export their tables to CSV or JSON, but usually not to Parquet. In that case, you can export to CSV first and then use another tool to convert the files to Parquet.

  • Export data to CSV or JSON and convert to Parquet: One approach is to export the data from the Oracle database to CSV or JSON and then convert it to Parquet with a tool like Apache Spark or Apache Hive, as sketched below. This approach is useful if you need to transform the data or perform other operations on it before converting to Parquet.
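
As a rough sketch of that flow, assuming the table has already been exported to a CSV file (the input and output paths below are placeholders), Spark can convert it to Parquet in a few lines:

import org.apache.spark.sql.SparkSession

object CsvToParquetConverter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("CsvToParquetConverter")
      .getOrCreate()

    // Read the CSV export; whether a header row exists depends on how the file was produced
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/path/to/export.csv")

    // Write the same rows back out in Parquet format
    df.write.parquet("/path/to/parquet")

    spark.stop()
  }
}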

If you are using Scala with Spark, you can even skip the intermediate files and read directly from Oracle over JDBC. A sketch (with placeholder connection details and output path) looks like this:

import java.util.Properties

import org.apache.spark.sql.{DataFrame, SparkSession}

object OracleToParquetConverter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("OracleToParquetConverter")
      .getOrCreate()

    // Set the Oracle database connection properties
    val jdbcUrl = "jdbc:oracle:thin:@//hostname:port/service_name"
    val jdbcUsername = "username"
    val jdbcPassword = "password"
    val jdbcDriver = "oracle.jdbc.driver.OracleDriver"

    val connectionProperties = new Properties()
    connectionProperties.put("user", jdbcUsername)
    connectionProperties.put("password", jdbcPassword)
    connectionProperties.put("driver", jdbcDriver)

    // Set the query to extract data from Oracle
    val query = "SELECT * FROM mytable"

    // Read data from Oracle into a DataFrame; the query is wrapped as a
    // derived table because the JDBC source expects a table expression
    val df: DataFrame = spark.read
      .jdbc(jdbcUrl, s"($query) tmp", connectionProperties)

    // Write the DataFrame out as Parquet ("/path/to/parquet" is a placeholder)
    df.write.parquet("/path/to/parquet")

    spark.stop()
  }
}
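
Note that the Oracle JDBC driver jar (for example ojdbc8.jar) must be on Spark's classpath for this to run, e.g. passed via the --jars option of spark-submit.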
