Azure Data Factory Data Flow vs SQL Stored Procedure Comparison

Recently I wrote about using Azure Data Factory (ADF) pipelines to migrate test data into a target Dynamics 365 Customer Engagement environment (e.g. here). In that post I used a SQL Stored Procedure to transform the data. Someone recently asked why I did not consider the no-code ADF Data Flow instead. This post explains why.

Let us assume we would like to migrate 5 test Dynamics 365 Customer Engagement (D365) cases (incident entity) from a Source SQL database table.

These correspond to the following attributes in a D365 Case record

Similar to some of my earlier posts on ADF (e.g. here), in order for us to migrate these records, we need to resolve the reference values:

  • For lookups, we need the underlying D365 record unique identifiers (GUIDs)
  • For optionsets, we need the underlying optionset integer values from D365
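In code terms, the resolution step is just a lookup from source labels to the identifiers D365 expects. A minimal Python sketch (the reference values and column names here are hypothetical; in the pipeline they come from the D365 reference tables stored in SQL):

```python
# Hypothetical reference data, as it would be retrieved from D365
# and stored in SQL reference tables before the migration runs.
contact_guids = {
    "Alice Smith": "00000000-0000-0000-0000-000000000001",  # D365 record GUID
}
case_origin_values = {
    "Phone": 1, "Email": 2, "Web": 3,  # optionset integer values
}

def resolve_case(row):
    """Swap the source labels for the D365 reference values."""
    return {
        "title": row["title"],
        "customerid": contact_guids[row["customer"]],          # lookup -> GUID
        "caseorigincode": case_origin_values[row["origin"]],   # optionset -> int
    }

case = resolve_case({"title": "Printer fault", "customer": "Alice Smith", "origin": "Email"})
print(case["caseorigincode"])  # → 2
```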

Once these reference values are retrieved and stored in the SQL tables, we can then build a new staging table for Cases by left joining the source table with these reference tables. The result will look like this:

Note that I introduced an additional column named “CustomerEntityReferenceType” to hold the entity type for the Customer multi-entity lookup. Once you have a staging table in this format, the upsert into D365 Cases using the ADF “Copy Activity” becomes straightforward, see here for details.
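As a concrete sketch of that staging build, here is the same left-join pattern run against an in-memory SQLite database (the schema, names and sample values are hypothetical; the real pipeline runs this against the Azure SQL database):

```python
import sqlite3

# Stand-ins for the source table and the D365 reference tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SourceCase  (Title TEXT, CustomerName TEXT, OriginLabel TEXT);
CREATE TABLE RefCustomer (Name TEXT, CustomerGuid TEXT, EntityType TEXT);
CREATE TABLE RefOrigin   (Label TEXT, OptionValue INTEGER);

INSERT INTO SourceCase  VALUES ('Printer fault', 'Alice Smith', 'Email');
INSERT INTO RefCustomer VALUES ('Alice Smith',
                               '00000000-0000-0000-0000-000000000001',
                               'contact');
INSERT INTO RefOrigin   VALUES ('Email', 2);

-- Left join the source table to each reference table; the extra
-- CustomerEntityReferenceType column carries the lookup's entity type.
CREATE TABLE StagingCase AS
SELECT s.Title,
       c.CustomerGuid AS CustomerId,
       c.EntityType   AS CustomerEntityReferenceType,
       o.OptionValue  AS CaseOriginCode
FROM SourceCase s
LEFT JOIN RefCustomer c ON c.Name  = s.CustomerName
LEFT JOIN RefOrigin   o ON o.Label = s.OriginLabel;
""")
row = con.execute("SELECT * FROM StagingCase").fetchone()
print(row)
```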

When the pipeline is executed, the 5 test cases will be migrated to D365.

But how did I generate the staging table in the first place? We need to left join the source table to several D365 reference tables to produce the staging table above. This can be done in at least two ways:

  • Write a SQL Stored Procedure then call the Stored Procedure using a pipeline activity, or,
  • Use ADF Data Flow

Generate the Staging Table using SQL Stored Procedure

If you’re using a SQL Stored Procedure, you simply create the Stored Procedure in SQL first, then call it in ADF using the “Stored Procedure Activity” in the ADF pipeline. This populates the staging table in the SQL database, before the staging table is used as a new source to migrate the data into D365 in another “Copy Activity”.

Generate the Staging Table using ADF DataFlow

Another way to generate the staging table is via the ADF data flow. This is the no-code option: you perform each of the left joins from the source table to the D365 reference tables as a series of transformation tasks, until you are ready to sink the transformed dataset into the SQL staging table.

In each dataflow task you specify the left stream (literally the name of the task to the LEFT 😊), right stream (the dataset you wish to join with), Join Type (for this example it’s the left join), and Join condition.

Once all the left joins are done, I added a new task to create the additional column “CustomerEntityReferenceType”.

The “Select” task allows us to pick only those columns which we need to populate the staging table.

Finally, we load the transformed dataset to the SQL Staging table in the “Sink” task of the data flow.

In the “Settings” tab we truncate the staging SQL table and allow insert only.

In the “Mapping” tab we specify how the dataset is to be mapped to the staging table columns.

Once you have created and published the data flow in ADF, you can execute it from an ADF pipeline via the “Data Flow” activity.

Compare SQL Stored Procedures with ADF Data Flow

Both methods mentioned above populate the SQL staging table via ADF pipeline activities. However, the crucial difference is the execution time.

In ADF you can view previous executions of the pipelines and the length of time taken. In this test scenario, the pipeline using the SQL Stored Procedure took 22 seconds to complete (including the load to D365), while the pipeline using the ADF data flow took almost 6 minutes to complete.

Going into each execution and looking at the Gantt chart of the pipeline activities, you can see that by comparison the Stored Procedure activity took 2 seconds to complete, while the data flow activity took 5 minutes 36 seconds.

This has a direct impact on the cost of running the ADF pipelines. Using the pricing available on the Microsoft website here, at the time of writing (July 2020) the cost of the two pipelines (one with the SQL Stored Procedure, and one with the ADF Data Flow) can be estimated as follows, with these assumptions:

  • Execution of both pipelines only require Azure Integration Runtime.
  • Each operation uses the default of 4 Data Integration Units (DIU) in pipeline activities
  • Data Flow execution is assumed to be on General Purpose 16 vCores, with debugging of the data flow costing the same as the real execution
| | ADF Pipeline with SQL Stored Procedure | ADF Pipeline with ADF Data Flow |
| --- | --- | --- |
| Create Linked Services | 3 read/write entities (SQL, CDS, Key Vault) | 3 read/write entities (SQL, CDS, Key Vault) |
| Create Datasets | 18 read/write entities (9 datasets, 9 linked services) | 20 read/write entities (10 datasets, 10 linked services) |
| Create Pipelines | 9 read/write entities (1 pipeline creation, 8 dataset references) | 9 read/write entities (1 pipeline creation, 8 dataset references) |
| Get Pipelines | 1 read/write entity | 1 read/write entity |
| Run Pipelines | 5 activity runs | 5 activity runs |
| Copy Data Activities | (27/60) min execution time * 4 DIU | (30/60) min execution time * 4 DIU |
| Stored Procedure Activities | (2/60) min execution time | N/A |
| Data Flow Activities | N/A | (336/60) min * 16 vCores * 2 (debug and execution) |
| | ADF Pipeline with SQL Stored Procedure – Cost | ADF Pipeline with ADF Data Flow – Cost |
| --- | --- | --- |
| Data Factory Operations | | |
| Read/Write Ops | 31 * £0.00000746 (£0.373 per 50,000 modified/referenced entities) | 33 * £0.00000746 (£0.373 per 50,000 modified/referenced entities) |
| Pipeline Orchestration & Execution | | |
| Activity Runs | 5 * £0.000746 (£0.746 per 1,000 runs) | 5 * £0.000746 (£0.746 per 1,000 runs) |
| Data Movement Activities | (27/60) min * 4 DIU * £0.003117 per DIU-minute (£0.187 per DIU-hour on the Azure Integration Runtime) | (30/60) min * 4 DIU * £0.003117 per DIU-minute (£0.187 per DIU-hour on the Azure Integration Runtime) |
| Pipeline Activity | (2/60) min * £0.00007 per minute (£0.004 per hour) | N/A |
| Data Flow Activity | N/A | (336/60) min * 2 * 16 vCores * £0.003583 per vCore-minute (£0.215 per vCore-hour) |
| Total price to migrate 5 test records to D365 | £0.009574 | £0.6523 |
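The totals above can be reproduced with a few lines of arithmetic, using the per-minute unit prices quoted in the cost breakdown:

```python
# Sanity-check of the cost estimates (GBP, July 2020 rates as quoted above).
rw        = 0.00000746   # per read/write entity (£0.373 per 50,000)
act_run   = 0.000746     # per activity run (£0.746 per 1,000)
diu_min   = 0.003117     # data movement, per DIU-minute (£0.187 per DIU-hour)
pipe_min  = 0.00007      # pipeline activity, per minute (£0.004 per hour)
vcore_min = 0.003583     # data flow, per vCore-minute (£0.215 per vCore-hour)

sp_total = 31 * rw + 5 * act_run + (27 / 60) * 4 * diu_min + (2 / 60) * pipe_min
df_total = 33 * rw + 5 * act_run + (30 / 60) * 4 * diu_min \
           + (336 / 60) * 2 * 16 * vcore_min

print(f"£{sp_total:.6f}")  # → £0.009574
print(f"£{df_total:.4f}")  # → £0.6523
```

At roughly 68 times the cost per run, the data flow's execution time dominates: the read/write and orchestration charges are near-identical, so almost the entire difference is the vCore-minutes consumed by the data flow activity.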

In summary, whilst ADF Data Flow is a no-code option that allows you to transform your dataset before loading it into a sink ADF dataset, you need to factor in the execution time as well as the cost. In a data migration we often move tens of thousands or even millions of records from source to D365. The execution time and cost of ADF Data Flow, in my mind, make this no-code option currently out of reach.
