Version:

ETL Process

The deployment utility runs an SSIS package, BBDW_ETL.dtsx. As a part of the logic of BBDW_ETL.dtsx, other SSIS packages for facts and dimensions run. The flow for BBDW_ETL.dtsx is broadly divided into ETL Pre-Processing, Sub-Package Processing, Sub-Package Processing Extend, and ETL Post-Processing.

The deployment utility runs an SSIS package, BBDW_ETL.dtsx. As a part of the logic of BBDW_ETL.dtsx, other SSIS packages for facts and dimensions run. The flow for BBDW_ETL.dtsx is broadly divided into ETL Pre-Processing, Sub-Package Pre-Processing, Sub-Package Processing, Sub-Package Processing Extend, a cube process, and ETL Post-Processing.

Note: This process and how to extend it is also discussed in BBDW ETL Extensions.

With version 3.0, you can specify to process up to four packages in parallel. The sub-package processing sequence in BBDW_ETL.dtsx has been updated to include four lanes. You enter how many lanes (1-4) to use from the Edit Advanced Data Mart Settings screen for the deployment utility. The field is Max ETL Parallelism under Deployment Options. This information is stored in the transactional database in a table called dbo.[OLAPDATASOURCE].

Figure: ETL Master Package (version 2.94 and before)

Figure: ETL Master Package with Four Lanes for Parallel Package Processing (version 3.0)

To support parallel sub-package processing, the Sub-Package Processing sequence has been split into sequences for Sub-Package Pre-Processing and Sub-Package Processing. The sequence that runs extensions packages (Sub-Package Processing Extend) has been updated, but the logic remains mostly the same. Extensions packages are run as before. BBDW_ETL.dtsx only runs out-of-the-box packages in parallel.

The lanes information for a given package is placed in a table in the Blackbaud Data Warehouse database called BBDW.[PACKAGELIST].

The package list file called BBDW_PackageList.txt is still read by the master package. But now there is an extra "DEPENDENCY" value for each line.

"PACKAGE","DEPENDENCY","ENABLED"

For example, for BBDW_FACT_INSTALLMENT, there are now four lines:

"BBDW_FACT_INSTALLMENT.dtsx","BBDW_DIM_CURRENCY.dtsx","1"
"BBDW_FACT_INSTALLMENT.dtsx","BBDW_DIM_REVENUESCHEDULE.dtsx","1"
"BBDW_FACT_INSTALLMENT.dtsx","BBDW_DIM_INSTALLMENT.dtsx","1"
"BBDW_FACT_INSTALLMENT.dtsx","BBDW_FACT_FINANCIALTRANSACTIONLINEITEM.dtsx","1"

This dependency information is placed in a table in the Blackbaud Data Warehouse database called BBDW.[PACKAGEDEPENDENCIES]. Lines in BBDW_PackageList.txt correspond to rows in this table.

The Drop ColumnStore Indexes task in Sub-Package Pre-Processing is a part of a set of changes to support using Columnstore indexes with Blackbaud Data Warehouse installations on SQL Server 2012. For more information about Columnstore indexes, see the MSDN article at Columnstore Indexes.

The cube-processing task was moved from ETL Post-Processing sequence into a new sequence called Process Cube. This sequence only runs if OLAP is configured.

With version 2.94 and before, the Sub-Package Processing Sequence reads a file identified by a connection manager, BBETL_File_PackageList. BBETL_File_PackageList manages a connection to a flat file that maintains a list of SSIS Packages for the ETL process. The information contained in the file is comma-separated lines that list SSIS Package files and whether the SSIS package is enabled.

Warning: You should not alter the BBETL_File_PackageList file. Rather, you can extend the ETL process. For more information, see BBDW ETL Extensions.

From BBDW_ETL.dtsx in the Sub-Package Processing Sequence, you can drill from the Control Flow view through Read Dimension Package List File to its Data Flow. To view the Data Flow, double-click Read Dimension Package List File.

From BBDW_ETL.dtsx in the Sub-Package Pre-Processing Sequence, you can drill from the Control Flow view through Get package details to its Data Flow. To view the Data Flow, double-click Get package details.

To open the Flat Source File Editor, you can double-click the Package File List data flow component. The Connection Manager page of the editor displays the name of the connection manager. For more information about Flat File Sources, see Microsoft's MSDN article at Flat File Source.

To open the Flat Source File Editor, you can double-click the Package File List data flow component. The Connection Manager page of the editor displays the name of the connection manager. For more information about Flat File Sources, see Microsoft's MSDN article at Flat File Source.

The Columns page of the Flat Source File Editor confirms the items that populate the list.

From the Data Flow, through a related tab called Connection Managers, you view a list of Connection Managers.

To find the name of the file that BBETL_File_PackageList connects to, select General and check the File name field.

Having obtained a list of SSIS packages with Read Dimension Package List File, the Sub-Package Processing Sequence proceeds to a ForEach loop that loops through each package in the list. The Sequence loop logs the start of the Package through a task called Log ETL Child Start. The loop executes each package through a task and a connection called BBETL_Package. Finally, the loop logs the end of the Package.

Having obtained a list of SSIS packages, the sequence proceeds to initialize a queue of package. Depending on the parallelism setting, up to four subpackages run at a time. Each parallel lane logs the start of the Package through a task called Log ETL Child Start. The lane executes each package through a task and a connection called BBETL_Package. Finally, the lane logs the end of the Package. This continues until the queue is exhausted.

One of the packages executed by Sub-Package Processing sequence is BBDW_DELETEDIDS.dtsx.

BBDW_DELETEDIDS.dtsx establishes connections to the BBETL_DeletedIDTables and BBETL_DeletedIDTables_EXT text files. For each Deleted ID table, the package truncates the table, gets Deleted IDs from the OLTP database audit tables, and deletes rows from the data warehouse database tables.