Structured data lake folders in Synapse + SQL pool

3 min read · Feb 18, 2023

Let me start with why data transformation is needed.

When you set up a data lake and a data warehouse, the data passes through multiple stages. In other words, you will not directly consume the raw data, or the data as it was first stored in the data warehouse.

The most common concept in the data world is the bronze/silver/gold schema. Bronze is nearly raw ingested data, silver is the first filtered and cleaned data, and gold is business-level aggregated data that is production-ready for visualization in BI tools.
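As a rough illustration of the three layers, here is a minimal sketch in plain Python. The record fields (`order_id`, `region`, `amount`) and the helper names `to_silver` and `to_gold` are hypothetical and only serve to show the idea; a real pipeline would run in Spark or SQL over files in the lake.

```python
from collections import defaultdict

# Bronze: nearly raw ingestion. Hypothetical records, field names are illustrative only.
bronze = [
    {"order_id": "1", "region": "EU", "amount": "10.5"},
    {"order_id": "2", "region": "EU", "amount": None},   # missing value
    {"order_id": "3", "region": "US", "amount": "4.0"},
    {"order_id": "3", "region": "US", "amount": "4.0"},  # duplicate row
]


def to_silver(rows):
    """Silver: drop rows with a missing amount, remove duplicates, cast types."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        cleaned.append(
            {
                "order_id": int(row["order_id"]),
                "region": row["region"],
                "amount": float(row["amount"]),
            }
        )
    return cleaned


def to_gold(rows):
    """Gold: business-level aggregate, here the total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

The point is only that each layer is derived from the previous one, so every layer needs its own place in the lake.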

In Synapse with a Spark pool, you can build this structure directly. It is more difficult with a SQL pool, because you can't create a file structure sliced by date/time. This means your raw data may sit in a date-sliced directory tree, but after the bronze data is transformed into silver data, you can't reproduce a similar structure because of this SQL pool limitation.
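To make "date/time sliced" concrete, the sketch below builds folder paths of the form layer/dataset/year/month/day. The layout, the `sales` dataset name, and the `sliced_path` helper are assumptions for illustration, not the author's actual directory tree.

```python
from datetime import date
from pathlib import PurePosixPath


def sliced_path(layer, dataset, day, level="day"):
    """Build a date-sliced folder path (hypothetical layout: layer/dataset/yyyy/mm[/dd])."""
    parts = [layer, dataset, f"{day.year:04d}", f"{day.month:02d}"]
    if level == "day":
        parts.append(f"{day.day:02d}")
    return str(PurePosixPath(*parts))


# Raw (bronze) data sliced down to the day, silver sliced at year/month level.
raw_path = sliced_path("bronze", "sales", date(2023, 2, 18))
silver_path = sliced_path("silver", "sales", date(2023, 2, 18), level="month")
print(raw_path)     # bronze/sales/2023/02/18
print(silver_path)  # silver/sales/2023/02
```

Computing the target folder per slice like this is one way to drive a write per year/month when the engine itself cannot partition the output for you.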

Therefore, I will introduce some tricks to build the silver/gold data lake with year/month-level folders. The sample code is not runnable as-is, so you need to understand the…

Written by Park Sehun

https://www.linkedin.com/in/park-sehun-1097b140
