Structured data lake folders in Synapse + SQL pool

Park Sehun
Feb 18, 2023

Let me start with why we need data transformation at all.

When you set up a data lake and a data warehouse, your data passes through multiple stages. In other words, you are not going to consume the raw data, or the data as it was first loaded into the warehouse, directly.

The most common concept in the data world is the bronze/silver/gold layering, often called the medallion architecture. Bronze is near-raw ingested data, silver is the first filtered and cleaned version, and gold is business-level aggregated data that is production-ready for visualization in BI tools.
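To make the three layers concrete, here is a minimal sketch in plain Python. The records, field names, and the helper functions `to_silver` and `to_gold` are all made up for illustration; a real pipeline would run in Spark or SQL rather than on in-memory lists.

```python
from collections import defaultdict

# Bronze: near-raw ingestion. Everything is a string, with duplicates and gaps.
# These records and field names are hypothetical.
bronze = [
    {"order_id": "1", "region": "EU", "amount": "10.5"},
    {"order_id": "2", "region": "eu ", "amount": "4.5"},
    {"order_id": "3", "region": "US", "amount": None},    # incomplete row
    {"order_id": "1", "region": "EU", "amount": "10.5"},  # duplicate row
]


def to_silver(rows):
    """Filter and clean: drop incomplete or duplicate rows, normalise types."""
    seen, cleaned = set(), []
    for row in rows:
        if row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        cleaned.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().upper(),
            "amount": float(row["amount"]),
        })
    return cleaned


def to_gold(rows):
    """Aggregate to a business-level view: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Each layer is derived only from the one before it, so any silver or gold output can be rebuilt from bronze at any time.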

In Synapse with a Spark pool, you can build this structure directly. It is harder with a SQL pool, because you can't create a file structure sliced by date/time. Your raw data may sit in a date-sliced directory tree, but once the bronze data has been transformed into silver, the SQL pool's limitation prevents you from writing it out in a similar structure.
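As an illustration of what a date-sliced layout can look like, the sketch below builds folder paths from a date. The layer and dataset names and the `partition_path` helper are assumptions for the example, not the article's actual directory tree.

```python
from datetime import date
from pathlib import PurePosixPath


def partition_path(layer, dataset, day, by_day=True):
    """Build a date-sliced folder path such as bronze/sales/2023/02/18.

    `layer` and `dataset` are hypothetical names used only for illustration.
    """
    parts = [layer, dataset, f"{day.year:04d}", f"{day.month:02d}"]
    if by_day:
        parts.append(f"{day.day:02d}")
    return str(PurePosixPath(*parts))


# Raw data sliced down to the day, processed data sliced at year/month level.
raw_folder = partition_path("bronze", "sales", date(2023, 2, 18))
silver_folder = partition_path("silver", "sales", date(2023, 2, 18), by_day=False)
```

Zero-padded month and day folders sort correctly as text, which keeps directory listings in chronological order.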

Therefore, I will introduce some tricks for building a silver/gold data lake partitioned at the year/month level. The sample code is not runnable as-is, so you need to understand the…
