note- Nov 29, 2015 – test scripts at https://github.com/bwunder/data-load-methods…
In part 3, test metrics were compared across a range of import sizes where data is persisted to an un-indexed heap table. The un-indexed heap was ideal for isolating the import operation – getting data into the database just far enough that it is safe – from the ‘normalization’ activity needed to render the data useful.
An import to an un-indexed heap may be a useful step in persisting application data, but it leaves the story unfinished. The data must ultimately be normalized – or de-normalized, ab-normalized, ‘flattened’, aggregated, pivoted, etc., as the case may be. What matters here is that the term normalization is used to describe everything that must happen just after the import so that the data is ready for use and indexed such that the application can retrieve it when needed. Obviously, the actual resource cost of ‘normalization’ will vary from application to application and installation to installation. All we can be certain of is that each application will be different. By how much is unknown until someone with adequate knowledge of the data flow and the database engine digs in to get that answer.
That variability is why it is so helpful to have the un-indexed heap metrics described in part 2 available as a baseline reference.
This series of posts hopes to offer a generalized test scenario intended as a starting place for digging in. If this information stimulates readers to identify and characterize the patterns currently in use by an application and to see how those patterns might generally compare with the other persistence methods and models, then the purpose will be achieved. It is a fairly complex topic. Breaking the topic into a few posts allows me to better focus on different aspects in greater detail. This post is specific to the DML operation in SQL Server that completes the final or ‘normalization’ step.
Hopefully this is not an uninteresting detour out of the data design lab and into the database engine. The old school upsert was augmented by the OUTPUT clause of the INSERT, UPDATE and DELETE T-SQL statements in SQL Server 2005 and then – apparently – complemented by the MERGE statement in SQL Server 2008. MERGE also supports the OUTPUT clause. There remains a lot of code that uses the old school upsert. The OUTPUT clause is so effective that it is quickly coming into common usage. Applications not making use of the OUTPUT clause may be missing the set-based boat. (Applications that tramp surrogate candidate keys from the database to the application code that pushed the insert, then back to the database as foreign key references, surely did miss the boat.)
Continuing to work with the test methodology already established to measure, chart and compare the “Baseline Load Times per Row to Un-Indexed Heap” from part 2, a meaningful comparison of upsert and MERGE is possible. The test results are even a bit interesting, yet far from compelling one way or the other. It should be no big surprise that which is better suited to a specific purpose will invariably depend on the details of that purpose.
Ultimately, folks that find something below of interest will want to conduct similar testing using source and target data structures representative of the local application.
The chart below depicts the general range of expected throughput for the tested data flow scenario. This is a range chart where each of the wide lines represents the likely variability of the load. Sample populations include tests with and without placing a table lock on each target table.
From 1 to about 500 rows, the normalized row-at-a-time operation is faster than all TVP methods and even faster than the heap. The row-at-a-time MERGE appears to have a slight advantage over the row-at-a-time upsert up through 500 rows. Then at 500 rows the row-at-a-time MERGE and upsert bifurcate along the upper and lower limits of the very wide row-at-a-time heap range! Only the row-at-a-time upsert seems to demonstrate flat scalability across the tested row range. To move data row-at-a-time, it is probably better to stick with upserts – and to expect loads to take a long time when batches are large.
Conversely, both methods of normalizing from a TVP show improved throughput as the set size increases. At around 50 rows the set-based throughput surpasses the throughput of either row-at-a-time method. The difference between the set-based MERGE and the set-based upsert is hardly significant. The narrower width of the MERGE range suggests it is more stable than the upsert – at least in these solo process test scenarios. For sets, all other things being equal, it is a toss-up whether MERGE or upsert is the better choice.
To conduct these tests comparing MERGE and upsert, two base stored procedures – one used for row-at-a-time upserts and one used for TVP-based upserts – were modified to replace the UPDATE + INSERT with a MERGE statement.
Row-at-a-time upsert tests use the following pattern:
-- Row-at-a-time upsert pattern. Table and column names (dbo.tParent, ParentId,
-- NaturalKey, SomeValue, ChildValue) are illustrative stand-ins for the
-- placeholders lost from the original template.
DECLARE @UpsertMapper TABLE ( ParentId INT );
BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE p
       SET SomeValue = @SomeValue
    OUTPUT inserted.ParentId INTO @UpsertMapper
      FROM dbo.tParent AS p
     WHERE p.NaturalKey = @NaturalKey;
    IF @@ROWCOUNT = 0
        INSERT dbo.tParent ( NaturalKey, SomeValue )
        OUTPUT inserted.ParentId INTO @UpsertMapper
        VALUES ( @NaturalKey, @SomeValue );
    INSERT dbo.tChild ( ParentId, ChildValue )
    SELECT mapper.ParentId, @ChildValue
      FROM @UpsertMapper AS mapper;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;
END CATCH
In the pattern above, the upsert brings the parent identity along whether an update or an insert is required. No need to create, populate, or evaluate any local variables. Genuine old school upserts might have to use an additional SELECT to get the parent surrogate key in the UPDATE case and rely upon the SCOPE_IDENTITY() system function – or perhaps the riskier @@IDENTITY global variable – to provide the identity to the child row at INSERT. The OUTPUT clause reduces the required IOs and eliminates some in-line processing requirements. It is much cleaner. Genuine old school ADO.NET upserts – with the identity marshaled between the entities by the application – require additional inter-process communication (IPC) serialization and some usually fascinating data gymnastics in the code. It should be obvious that doing the work in one place – in this case at the database server – will always be faster than distributed transactional processing.
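For contrast, here is a minimal sketch of that old school round trip, with the surrogate key recovered in application code. SQLite stands in for SQL Server, `cursor.lastrowid` plays the role of SCOPE_IDENTITY(), and all table and column names are illustrative rather than taken from the test scripts:

```python
import sqlite3

# Old school upsert with the parent identity handled in application code.
# SQLite stands in for SQL Server; tParent/tChild and the column names are
# illustrative. cursor.lastrowid is the SCOPE_IDENTITY() analogue here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tParent (ParentId INTEGER PRIMARY KEY, "
             "NaturalKey TEXT UNIQUE, Payload TEXT)")
conn.execute("CREATE TABLE tChild (ChildId INTEGER PRIMARY KEY, "
             "ParentId INTEGER, Detail TEXT)")

def old_school_upsert(natural_key, payload, detail):
    cur = conn.cursor()
    # try the UPDATE first ...
    cur.execute("UPDATE tParent SET Payload = ? WHERE NaturalKey = ?",
                (payload, natural_key))
    if cur.rowcount == 0:
        # ... INSERT when no row matched; read the fresh identity back
        cur.execute("INSERT INTO tParent (NaturalKey, Payload) VALUES (?, ?)",
                    (natural_key, payload))
        parent_id = cur.lastrowid
    else:
        # the UPDATE path costs an extra SELECT to recover the surrogate key
        cur.execute("SELECT ParentId FROM tParent WHERE NaturalKey = ?",
                    (natural_key,))
        parent_id = cur.fetchone()[0]
    cur.execute("INSERT INTO tChild (ParentId, Detail) VALUES (?, ?)",
                (parent_id, detail))
    conn.commit()
    return parent_id

first = old_school_upsert("A", "v1", "c1")   # INSERT path
second = old_school_upsert("A", "v2", "c2")  # UPDATE path, same parent
```

Every row pays at least one extra statement plus the client round trips; the OUTPUT-based pattern above folds all of that into a single server-side batch.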
For comparison, below is the MERGE pattern for the row-at-a-time operation:
-- Row-at-a-time MERGE pattern. Table and column names are illustrative
-- stand-ins for the placeholders lost from the original template.
DECLARE @MergeMapper TABLE ( ParentId INT );
BEGIN TRY
    BEGIN TRANSACTION;
    MERGE dbo.tParent AS p
    USING ( SELECT @NaturalKey AS NaturalKey, @SomeValue AS SomeValue ) AS derived
       ON p.NaturalKey = derived.NaturalKey
     WHEN MATCHED THEN
        UPDATE SET SomeValue = derived.SomeValue
     WHEN NOT MATCHED THEN
        INSERT ( NaturalKey, SomeValue )
        VALUES ( derived.NaturalKey, derived.SomeValue )
    OUTPUT inserted.ParentId INTO @MergeMapper;
    INSERT dbo.tChild ( ParentId, ChildValue )
    SELECT mapper.ParentId, @ChildValue
      FROM @MergeMapper AS mapper;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;
END CATCH
One interesting difference is that the MERGE requires one less write operation to the @MergeMapper table variable. As with the OUTPUT clause, I would have to agree that MERGE is cleaner. One might therefore expect MERGE to require less IO than the UPDATE + INSERT pattern. If anything, the test results indicate the opposite.
Table-valued parameter (TVP) based upserts are easily adapted from the row-at-a-time patterns. All that is absolutely necessary is to change the references to parameters and local variables to reference the passed TVP.
Here is the TVP upsert pattern:
-- TVP upsert pattern. Table and column names are illustrative stand-ins for
-- the placeholders lost from the original template.
CREATE TABLE #UpsertMapper ( ParentId INT, NaturalKey NVARCHAR(128), UNIQUE ( NaturalKey ) );
CREATE TABLE #TempTable ( NaturalKey NVARCHAR(128), SomeValue INT, ChildValue INT, PRIMARY KEY ( NaturalKey ) );
-- de-duplication before the upsert - not required, but seems to help the test data
INSERT #TempTable ( NaturalKey, SomeValue, ChildValue )
SELECT NaturalKey, MIN( SomeValue ), MIN( ChildValue )
  FROM @TVP
 GROUP BY NaturalKey;
BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE p
       SET SomeValue = t.SomeValue
    OUTPUT inserted.ParentId, inserted.NaturalKey INTO #UpsertMapper
      FROM dbo.tParent AS p
      JOIN #TempTable AS t
        ON p.NaturalKey = t.NaturalKey;
    INSERT dbo.tParent ( NaturalKey, SomeValue )
    OUTPUT inserted.ParentId, inserted.NaturalKey INTO #UpsertMapper
    SELECT tmp.NaturalKey, tmp.SomeValue
      FROM #TempTable AS tmp
      LEFT JOIN dbo.tParent AS p
        ON p.NaturalKey = tmp.NaturalKey
     WHERE p.NaturalKey IS NULL;
    INSERT dbo.tChild ( ParentId, ChildValue )
    SELECT map.ParentId, tvp.ChildValue
      FROM #UpsertMapper AS map
      JOIN @TVP AS tvp
        ON map.NaturalKey = tvp.NaturalKey;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;
END CATCH
And the TVP MERGE pattern is:
-- TVP MERGE pattern. Table and column names are illustrative stand-ins for
-- the placeholders lost from the original template.
CREATE TABLE #MergeMapper ( ParentId INT PRIMARY KEY, NaturalKey NVARCHAR(128), UNIQUE ( NaturalKey ) );
CREATE TABLE #TempTable ( NaturalKey NVARCHAR(128), SomeValue INT, ChildValue INT, PRIMARY KEY ( NaturalKey ) );
INSERT #TempTable ( NaturalKey, SomeValue, ChildValue )
SELECT NaturalKey, MIN( SomeValue ), MIN( ChildValue )
  FROM @TVP
 GROUP BY NaturalKey;
BEGIN TRY
    BEGIN TRANSACTION;
    MERGE dbo.tParent AS p
    USING ( SELECT tmp.NaturalKey, tmp.SomeValue FROM #TempTable AS tmp ) AS derived
       ON p.NaturalKey = derived.NaturalKey
     WHEN MATCHED THEN
        UPDATE SET SomeValue = derived.SomeValue
     WHEN NOT MATCHED THEN
        INSERT ( NaturalKey, SomeValue )
        VALUES ( derived.NaturalKey, derived.SomeValue )
    OUTPUT inserted.ParentId, inserted.NaturalKey INTO #MergeMapper;
    INSERT dbo.tChild ( ParentId, ChildValue )
    SELECT map.ParentId, tvp.ChildValue
      FROM #MergeMapper AS map
      JOIN @TVP AS tvp
        ON map.NaturalKey = tvp.NaturalKey;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;
END CATCH
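The set-based flow shared by both TVP patterns – stage, de-duplicate, update the matches, insert the misses, then join the identities back for the child rows – can be sketched outside SQL Server as well. A rough approximation with SQLite standing in (a temp table plays the part of the TVP, there is no OUTPUT clause, and all names are illustrative):

```python
import sqlite3

# Set-based stage -> de-dup -> upsert -> child insert, approximating the TVP
# patterns. SQLite stands in for SQL Server; tParent/tChild and the column
# names are illustrative, and a temp table stands in for the TVP.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tParent (ParentId INTEGER PRIMARY KEY,
                      NaturalKey TEXT UNIQUE, Payload TEXT);
CREATE TABLE tChild  (ChildId INTEGER PRIMARY KEY,
                      ParentId INTEGER, Detail TEXT);
CREATE TEMP TABLE tvp (NaturalKey TEXT, Payload TEXT, Detail TEXT);
""")

incoming = [("A", "v1", "c1"), ("A", "v1", "c2"), ("B", "v1", "c3")]
conn.executemany("INSERT INTO tvp VALUES (?, ?, ?)", incoming)

with conn:  # one transaction for the whole set, as in the patterns above
    # update parents that already exist
    conn.execute("""
        UPDATE tParent
           SET Payload = (SELECT MIN(Payload) FROM tvp
                           WHERE tvp.NaturalKey = tParent.NaturalKey)
         WHERE NaturalKey IN (SELECT NaturalKey FROM tvp)""")
    # insert the de-duplicated parents that do not exist yet
    conn.execute("""
        INSERT INTO tParent (NaturalKey, Payload)
        SELECT NaturalKey, MIN(Payload) FROM tvp
         WHERE NaturalKey NOT IN (SELECT NaturalKey FROM tParent)
         GROUP BY NaturalKey""")
    # child rows pick up the parent identity by joining on the natural key
    conn.execute("""
        INSERT INTO tChild (ParentId, Detail)
        SELECT p.ParentId, t.Detail
          FROM tvp AS t JOIN tParent AS p ON p.NaturalKey = t.NaturalKey""")
```

The join back to the parent in the last statement is the extra work that OUTPUT into a mapper table avoids in the real T-SQL patterns.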
The first chart below shows row-at-a-time loads that transform the source data record into an update-else-insert to the parent followed by an insert to the child. Whether the parent is inserted or updated, the identity, along with the natural key columns from the incoming data, is made available to the child insert without the need to join to the parent during the child insert. Tests are standardized to load identical batches of rows into the same target hierarchy, and the parent identity column is included in all child rows. Batch sizes are fixed at 1, 5, 10, 50, 100, 500, 1,000, 5,000, 10,000, 50,000 and 100,000 rows. Each batch is wrapped in a transaction. Several steps were taken to assure meaningful test results: multiple executions per batch size within each test run, and multiple test runs for both load methods.
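That batching discipline can be sketched as a small harness that times a row-at-a-time loop against a single set-based statement per batch. This is only a shape-of-the-curve sketch – SQLite in memory stands in for SQL Server, and the real harness is in the test scripts linked in the note at the top:

```python
import sqlite3
import time

# Time one batch loaded row-at-a-time versus as one set-based statement,
# returning seconds per row. SQLite in memory stands in for SQL Server and
# the table is a stand-in for the un-indexed heap target.
def time_load(batch_size, set_based):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE heap (id INTEGER, payload TEXT)")
    rows = [(i, "x" * 20) for i in range(batch_size)]
    start = time.perf_counter()
    with conn:  # each batch wrapped in a transaction, as in the tests
        if set_based:
            conn.executemany("INSERT INTO heap VALUES (?, ?)", rows)
        else:
            for row in rows:
                conn.execute("INSERT INTO heap VALUES (?, ?)", row)
    return (time.perf_counter() - start) / batch_size

# multiple executions per batch size, keeping the best of several runs
per_row = {n: (min(time_load(n, False) for _ in range(3)),
               min(time_load(n, True) for _ in range(3)))
           for n in (1, 10, 100, 1000)}
```

The seconds-per-row numbers a harness like this produces across the batch sizes are what the range charts plot, though the absolute values here belong to a different engine.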
Looking first at the row-at-a-time series (i.e., RowMerge, RowUpdate) note these observations:
On the left side of the chart – where the batches are smaller – it is clear that the row-at-a-time operations can provide more throughput than set-based operations at low rows per batch. The advantage decreases as the batch size increases. Then, at the point where neither the upsert nor the MERGE shows an advantage – just beyond 1000 rows per batch – the row-at-a-time MERGE begins to degrade exponentially while the upsert’s scalability remains relatively flat up through the largest batches tested.
As demonstrated in parts 2 and 3, if the transaction wrapping each batch of the MERGE is removed, a slight degradation is noticeable for small batches. Larger batches do as well as the upsert.
Bear in mind that the insert operation is an upsert or MERGE of a parent table followed by an insert to a child table. The transaction will not automatically serialize access to the table. Only when a table lock is explicitly placed on the table, or enough data is changed that row or page locks escalate to HoBT locks, would concurrency be degraded due to serialization. With that said, using an explicit table lock will deliver the best performance if there is no concurrency concern. The scenarios where multiple concurrent loads are required will be examined in a future post.