SSIS data flow vs. insert-select
To transform data within a single SQL Server, with source and target data in the same database, it is probably faster to use an
INSERT statement than a SSIS data flow task.
I compared performance of both on a table with about 23 million rows, taking about 3 GB. On a small-to-medium size development server, an
INSERT statement took about 30 seconds,
and a comparable SSIS data flow task took over two minutes, roughly a 4x difference. The data flow task did no transformation – only a source, directly connected to target. I was using the
SQL Server native client, and was running
dtexec on the same server as the database so the cost of pulling rows over the network should not have been a factor. I was also turning off all progress logging from
dtexec to avoid inflating the runtime from overhead of writing to the console. (Just to be clear, I was not running the package from Visual Studio.)
To maximize performance I was using the
TABLOCK hint with the
INSERT statement (
insert into table with (tablock)) and have my database in simple recovery mode, to limit logging overhead. For
this kind of batch operation I don’t care about point-in-time recovery; if the process fails I would just start it over. This is similar to what I would have done in Oracle with
insert /*+ append */.
It seems like the main reason to use a data flow task is to load and transform data from different locations.