Structured data lake folders in Synapse + SQL pool
I'll start with why we need data transformation at all.
When you set up a data lake and a data warehouse, your data goes through multiple stages. In other words, you are not going to use the raw data, or the data as it was first stored in the data warehouse, directly.

The most common concept in the data world is the bronze/silver/gold schema. Bronze is nearly raw ingested data, silver is the first filtered and cleaned version, and gold is the business-level aggregated data: production-ready data to be visualized in BI tools.
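To make the three layers concrete, here is a minimal sketch in plain Python. The records, field names, and the `to_silver` / `to_gold` helpers are all hypothetical and only illustrate the idea; a real pipeline would run this kind of logic in Spark or SQL rather than on in-memory lists.

```python
from collections import defaultdict

# Hypothetical bronze records: kept as ingested, including bad rows.
bronze = [
    {"order_id": "1", "region": "EU", "amount": "10.5"},
    {"order_id": "2", "region": "eu ", "amount": "4.5"},
    {"order_id": "3", "region": "US", "amount": None},    # missing amount
    {"order_id": "2", "region": "eu ", "amount": "4.5"},  # duplicate row
]


def to_silver(rows):
    """Filter and clean: drop rows without an amount, dedupe, fix types."""
    seen, cleaned = set(), []
    for row in rows:
        if row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        cleaned.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().upper(),
            "amount": float(row["amount"]),
        })
    return cleaned


def to_gold(rows):
    """Aggregate to a business-level view: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'EU': 15.0}
```

Bronze keeps everything, silver holds only the two valid orders, and gold reduces them to one number per region that a BI tool could chart directly.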
In Synapse with a Spark pool, you can build this structure directly. With a SQL pool it is a bit harder, because you can't write output into a folder structure sliced by date/time. Your raw data may sit in a date-sliced directory tree (for example, year/month/day folders), but once the bronze data is transformed into silver, the SQL pool's limitation prevents you from producing a similar structure.
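As a rough illustration of what "date/time sliced" means here, the sketch below builds the folder path for one slice. The `sliced_folder` helper, the `layer/dataset/yyyy/MM/dd` layout, and the dataset name `sales` are assumptions made for this example, not a convention required by Synapse.

```python
from datetime import date
from pathlib import PurePosixPath


def sliced_folder(layer: str, dataset: str, day: date, level: str = "day") -> str:
    """Return a hypothetical date-sliced folder path such as bronze/sales/2023/04/15."""
    if level not in ("year", "month", "day"):
        raise ValueError(f"unsupported level: {level}")
    parts = [f"{day.year:04d}"]
    if level in ("month", "day"):
        parts.append(f"{day.month:02d}")
    if level == "day":
        parts.append(f"{day.day:02d}")
    # PurePosixPath joins with forward slashes, which is what lake paths use.
    return str(PurePosixPath(layer, dataset, *parts))


print(sliced_folder("bronze", "sales", date(2023, 4, 15)))           # bronze/sales/2023/04/15
print(sliced_folder("silver", "sales", date(2023, 4, 15), "month"))  # silver/sales/2023/04
```

The goal for the silver and gold layers is a path like the second one: the same tree as bronze, but sliced only down to the month.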
Therefore, I will introduce some tricks for building the silver/gold data lake at the year/month level. The sample code is not runnable as-is, so you need to understand the…