Azure Synapse Integrate dataflow with Serverless SQL as source

Move and process data from server less SQL and Save as delta

requirements

  • Azure Account
  • Azure Storage
  • Azure Synapse Analytics workspace
  • Load the NYC taxi data

Steps

  • Log into Azure synapse analytics workspace
  • Go to Develop on the left menu
  • create a new data flow

Data flow

  • Overall data flow
  • First connect to serverless sql using Azure sql database connector
  • Get the ondemand sql name from azure synapse workspace overview page
  • I am using sql authentication to test Managed identity authentication for serverless sql to storage
  • You can also use passthrough authentication, but make sure proper permission is set in the storage account to read the data
  • Select the table name as vwYellowTaxi, in my case it’s a view to call
  • Click browse and see if you can see the data
  • Next drag and drop select
  • Select all columns
  • Next we are going to create parition columns as year and month
  • Drag and drop derived column1 task
  • create new column for year and month
  • Next is save the file as delta into Synapse workspace storage
  • drag sink and select inline
  • Select delta and see the config below
  • Specify the storage folder to store the file
  • Now set the partition columns
  • Commit and save the data flow

Pipeline to run the dataflow

  • Now lets create a pipeline
  • Go to Pipeline and create new pipeline
  • Then drag and drop dataflow activity and select the above dataflow created
  • Set the compute options to 32 + 16 cores
  • Select memory optimized
  • Commit and publish the pipeline
  • Click Add Triger and select Run now
  • Wait for the pipeline to run and complete
  • Click details

--

--

https://balakreshnan.github.io/

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store