Recently I wrote about using Azure Data Factory (ADF) pipelines to migrate test data into a target Dynamics 365 Customer Engagement environment (e.g. here). In that post I used a SQL Stored Procedure to transform the data. Someone asked me recently why I did not consider the no-code ADF Data Flow instead. This post explains why….
Let us assume we would like to migrate 5 test Dynamics 365 Customer Engagement (D365) cases (incident entity) from a Source SQL database table.
These correspond to the following attributes in a D365 Case record:
As in some of my earlier posts on ADF (e.g. here), before we can migrate these records we need to resolve the reference values:
- For lookups, we need the underlying D365 record unique identifiers (D365 GUIDs)
- For optionsets, we need the underlying optionset integer values from D365
Once these reference values are retrieved and stored in the SQL tables, we can then build a new staging table for Cases by left joining the source table with these reference tables. The result will look like this:
Note that I introduced an additional column named “CustomerEntityReferenceType” to hold the entity type for the Customer multi-entity lookup. Once you have a staging table in this format, the upsert into D365 Cases using the ADF “Copy Activity” becomes straightforward, see here for details.
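To see why that extra column matters: the Customer lookup on a D365 Case can point to either an account or a contact, so a resolved GUID alone is ambiguous. Here is a purely illustrative Python sketch (the row shape and names are invented, not the real staging schema):

```python
# Purely illustrative: the "customer" lookup on a D365 Case can target either
# an account or a contact, so a resolved GUID alone is ambiguous. The extra
# staging column records which entity each GUID belongs to. The row shape and
# names here are invented for the example.

def resolve_customer(row: dict) -> tuple:
    """Return the (entity type, GUID) pair the load step needs."""
    entity = row["CustomerEntityReferenceType"]
    if entity not in ("account", "contact"):
        raise ValueError(f"unexpected customer entity type: {entity}")
    return (entity, row["CustomerId"])

row = {"CustomerId": "11111111-aaaa", "CustomerEntityReferenceType": "contact"}
print(resolve_customer(row))
# → ('contact', '11111111-aaaa')
```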
When the pipeline is executed, the 5 test cases will be migrated to D365.
But how did I generate the staging table in the first place? We need to left join the source table to several D365 reference tables to produce the staging table above. This can be done in at least two ways:
- Write a SQL Stored Procedure and call it from a pipeline activity, or,
- Use ADF Data Flow
Generate the Staging Table using SQL Stored Procedure
If you’re using a SQL Stored Procedure, you first create the Stored Procedure in SQL. You then call it in ADF using the “Stored Procedure Activity” in the ADF pipeline. This populates the staging table in the SQL database, before a subsequent “Copy Activity” uses the staging table as the source to migrate the data into D365.
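The shape of that Stored Procedure is easy to sketch. The following is a minimal, hypothetical illustration using Python’s sqlite3 in place of the real SQL database — every table and column name is made up, but the pattern (left join the source table to the resolved reference tables, then insert into the staging table) is the same:

```python
import sqlite3

# Minimal sketch of the staging-table build, using an in-memory SQLite
# database in place of the real SQL database. All table and column names
# are illustrative, not the ones from the actual migration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE SourceCase (Title TEXT, CustomerName TEXT, Priority TEXT);
    CREATE TABLE RefCustomer (Name TEXT, D365Guid TEXT, EntityType TEXT);
    CREATE TABLE RefPriority (Label TEXT, OptionValue INTEGER);
    CREATE TABLE StagingCase (
        Title TEXT, CustomerId TEXT,
        CustomerEntityReferenceType TEXT, PriorityCode INTEGER
    );

    INSERT INTO SourceCase VALUES ('Broken widget', 'Contoso', 'High');
    INSERT INTO RefCustomer VALUES ('Contoso', '11111111-aaaa', 'account');
    INSERT INTO RefPriority VALUES ('High', 1);

    -- The body a Stored Procedure would run: left join the source table
    -- to each reference table, then populate the staging table.
    INSERT INTO StagingCase
    SELECT s.Title, c.D365Guid, c.EntityType, p.OptionValue
    FROM SourceCase s
    LEFT JOIN RefCustomer c ON c.Name = s.CustomerName
    LEFT JOIN RefPriority p ON p.Label = s.Priority;
""")
print(con.execute("SELECT * FROM StagingCase").fetchall())
# → [('Broken widget', '11111111-aaaa', 'account', 1)]
```

The left joins keep every source row even when a reference value fails to resolve, which is exactly what you want during a test migration: unresolved rows surface as NULLs in staging rather than silently disappearing.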
Generate the Staging Table using ADF DataFlow
Another way to generate the staging table is via an ADF data flow. This is the no-code option: you perform each of the left joins from the source table to the D365 reference tables as a series of transformation tasks, until you are ready to sink the transformed dataset into the SQL staging table.
In each data flow join task you specify the left stream (literally the name of the task to the LEFT 😊), the right stream (the dataset you wish to join with), the join type (for this example, a left join), and the join condition.
Once all the left joins are done, I added a new task to create the additional column “CustomerEntityReferenceType”.
The “select” task allows me to pick only those columns which we need to populate the staging table.
Finally, we load the transformed dataset to the SQL Staging table in the “Sink” task of the data flow.
In the “Settings” tab we truncate the staging SQL table and allow inserts only.
In the “Mapping” tab we specify how the dataset maps to the staging table columns.
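For readers who think in code, the data flow above is doing the moral equivalent of the following rough Python sketch, where each function corresponds to one data flow task (all column names are invented for illustration):

```python
# Rough sketch of what the data flow tasks do, step by step, using plain
# Python dicts as rows. Column names are invented for illustration; the
# Sink task would write the final `staged` rows to the SQL staging table.

source = [{"Title": "Broken widget", "CustomerName": "Contoso"}]
ref_customer = {"Contoso": {"D365Guid": "11111111-aaaa", "EntityType": "account"}}

# Join task: left-join the left stream to the reference dataset.
def left_join(rows, ref, key):
    return [{**r, **ref.get(r[key], {})} for r in rows]

# Derived-column task: add CustomerEntityReferenceType.
def derive_entity_type(rows):
    return [{**r, "CustomerEntityReferenceType": r.get("EntityType")} for r in rows]

# Select task: keep only the columns the staging table needs.
def select(rows, cols):
    return [{c: r.get(c) for c in cols} for r in rows]

staged = select(
    derive_entity_type(left_join(source, ref_customer, "CustomerName")),
    ["Title", "D365Guid", "CustomerEntityReferenceType"],
)
print(staged)
# → [{'Title': 'Broken widget', 'D365Guid': '11111111-aaaa',
#     'CustomerEntityReferenceType': 'account'}]
```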
Once you have created and published the data flow in ADF, you can execute it from an ADF pipeline using the “Data Flow” activity.
Comparing SQL Stored Procedures with ADF Data Flow
Both methods mentioned above populate the SQL staging table via ADF pipeline activities. However, the crucial difference is execution time.
In ADF you can view previous executions of the pipelines and the time each one took. In this test scenario, the pipeline using the SQL Stored Procedure took 22 seconds to complete (including the load to D365), while the pipeline using the ADF data flow took almost 6 minutes.
Drilling into each execution and looking at the Gantt chart of the pipeline activities, you can see that the Stored Procedure activity itself took 2 seconds, whereas the data flow activity took 5 minutes 36 seconds.
This has a direct impact on the cost of running the ADF pipelines. Using the pricing available on the Microsoft website here at the time of writing (July 2020), the two pipelines (one with the SQL Stored Procedure, and one with the ADF Data Flow) can be estimated as follows, with these assumptions:
- Execution of both pipelines requires only the Azure Integration Runtime.
- Each copy operation uses the default 4 Data Integration Units (DIU) in pipeline activities.
- The data flow executes on General Purpose compute with 16 vCores, and debugging the data flow costs the same as the real execution.
| | ADF Pipeline with SQL Stored Procedure | ADF Pipeline with ADF Data Flow |
|---|---|---|
| Create Linked Services | 3 read/write entities (SQL, CDS, Key Vault) | 3 read/write entities (SQL, CDS, Key Vault) |
| Create Datasets | 18 read/write entities (9 datasets, 9 linked services) | 20 read/write entities (10 datasets, 10 linked services) |
| Create Pipelines | 9 read/write entities (1 pipeline creation, 8 dataset references) | 9 read/write entities (1 pipeline creation, 8 dataset references) |
| Get Pipelines | 1 read/write entity | 1 read/write entity |
| Run Pipelines | 5 activity runs | 5 activity runs |
| Copy Data Activities | (27/60) min execution time × 4 DIU | (30/60) min execution time × 4 DIU |
| Stored Procedure Activities | (2/60) min execution time | N/A |
| Data Flow Activities | N/A | (336/60) min × 16 vCores × 2 (debug and execution) |
| | ADF Pipeline with SQL Stored Procedure – Cost | ADF Pipeline with ADF Data Flow – Cost |
|---|---|---|
| **Data Factory Operations** | | |
| Read/Write Ops | 31 × £0.00000746 per entity (£0.373 per 50,000 modified/referenced entities) | 33 × £0.00000746 per entity (£0.373 per 50,000 modified/referenced entities) |
| **Pipeline Orchestration & Execution** | | |
| Activity Runs | 5 × £0.000746 per run (£0.746 per 1,000 runs) | 5 × £0.000746 per run (£0.746 per 1,000 runs) |
| Data Movement Activities | 27/60 min × 4 DIU × £0.003117 per DIU-minute (£0.187 per DIU-hour on the Azure Integration Runtime) | 30/60 min × 4 DIU × £0.003117 per DIU-minute (£0.187 per DIU-hour on the Azure Integration Runtime) |
| Pipeline Activity | 2/60 min × £0.00007 per minute (£0.004 per hour) | N/A |
| Data Flow Activity | N/A | 336/60 min × 2 (debug and execution) × 16 vCores × £0.003583 per vCore-minute (£0.215 per vCore-hour) |
| **Total price to migrate 5 test records to D365** | **£0.009574** | **£0.6523** |
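The arithmetic behind those totals can be checked with a few lines of Python, using the per-minute rates quoted in the table (July 2020 prices, GBP):

```python
# Reproduce the cost estimates above using the July 2020 rates (GBP).
READ_WRITE = 0.00000746    # per modified/referenced entity (£0.373 per 50,000)
ACTIVITY_RUN = 0.000746    # per activity run (£0.746 per 1,000)
DIU_MINUTE = 0.003117      # data movement, per DIU-minute (£0.187 per DIU-hour)
PIPELINE_MINUTE = 0.00007  # pipeline activity, per minute (£0.004 per hour)
VCORE_MINUTE = 0.003583    # data flow, per vCore-minute (£0.215 per vCore-hour)

# Pipeline with SQL Stored Procedure
sp = (31 * READ_WRITE
      + 5 * ACTIVITY_RUN
      + (27 / 60) * 4 * DIU_MINUTE      # Copy Activity: 27 s at 4 DIU
      + (2 / 60) * PIPELINE_MINUTE)     # Stored Procedure: 2 s

# Pipeline with ADF Data Flow
df = (33 * READ_WRITE
      + 5 * ACTIVITY_RUN
      + (30 / 60) * 4 * DIU_MINUTE      # Copy Activity: 30 s at 4 DIU
      + (336 / 60) * 2 * 16 * VCORE_MINUTE)  # 5 min 36 s, debug + run, 16 vCores

print(f"Stored Procedure pipeline: £{sp:.6f}")  # → £0.009574
print(f"Data Flow pipeline:        £{df:.4f}")  # → £0.6523
```

Note that almost the entire data flow cost comes from the vCore-minutes term, so it scales with execution time and cluster size rather than data volume.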
In summary, whilst ADF Data Flow is a no-code option that allows you to transform your dataset before loading it into a sink dataset, you will need to factor in execution time as well as cost. In a data migration we often move tens of thousands or even millions of records from source to D365. The execution time and cost of ADF Data Flow, in my mind, currently put this no-code option out of reach.