Some observations from the chart:
- SQL Server compression is a no-brainer; it should be the default. I suspect that internally much can be done to reduce the CPU overhead of compression and so alleviate the primary concern: that it will create CPU-bound SQL Servers. All data I have worked with gets more storage savings and better performance from ROW compression than from PAGE compression. YMMV
- Transactions are the better-faster-smarter way to load many rows into the database. It is important to distinguish between using a table lock and using a transaction. You must use a transaction to hold a table lock, but it is not required to exclusively lock the entire table when inserting or updating: it is just faster. For sets it is hard to avoid transactions altogether: by default there is an implicit transaction around each T-SQL statement, whether one row is affected or one million. That implicit transaction is inadequate to prevent corruption for upserts that touch two or more tables, even when only one row is involved. A transaction without a table lock is every bit as safe as one with the lock, and almost as effective in terms of increased throughput, yet it allows others to concurrently access other data in the table. Note that without the table lock, DML operations are fully logged, increasing the likelihood of an IO bottleneck under load.
- In a loaded OLTP or web application, it is sometimes possible to place a lock on the target table. Typically this requires that the persistence operation is always quick. Placing the table lock on the heap is necessary for the fastest possible single-threaded throughput and to assure minimal logging. In other situations it is more desirable not to lock the table, instead locking only the rows being added or updated, and allow other users to access the database objects targeted by the load. Whether to lock the table is determined by the application's concurrency requirements. Both options should be available and exposed to the application or object-relational mapper.
- Sets are remarkably faster and use far fewer resources than row-at-a-time batches when 100 or more rows are loaded into a SQL Server database. Period.
- Un-indexed heaps make an excellent target when bringing rows into SQL Server. Un-indexed heaps create less resistance than indexed tables because they do not carry the costs associated with the indexes and constraints required to support the application. Sooner or later the data must endure indexing and constraint validation overhead to become useful to the application.
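Pulling those observations together, a minimal sketch of the loading pattern — the table name, columns, and source query here are illustrative, not the actual test harness:

```sql
-- Illustrative un-indexed heap with ROW compression enabled.
CREATE TABLE dbo.LoadStage
(
    Id      INT           NOT NULL,
    Payload NVARCHAR(400) NULL
)
WITH (DATA_COMPRESSION = ROW);
GO

-- Load inside an explicit transaction. With the TABLOCK hint the insert
-- into the heap qualifies for minimal logging; remove the hint and rows
-- are fully logged, but other sessions can work in the table concurrently.
BEGIN TRANSACTION;

    INSERT INTO dbo.LoadStage WITH (TABLOCK) (Id, Payload)
    SELECT SourceId, SourcePayload
    FROM   dbo.SomeSource;   -- hypothetical source of the set

COMMIT TRANSACTION;
```

Dropping the `WITH (TABLOCK)` hint is the only change needed to switch between the two concurrency postures discussed above.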
Now, the analysis turns slightly to take a look at granularity when packaging data before moving it into an OLTP or Web application’s SQL Server database: rows, table-valued parameters (TVPs), and data files.
For this test cycle the data load will continue to target an un-indexed heap. Row compression will be enabled and explicit transactions will be used for all tests. Each granularity of transfer package will be tested with and without placing a table lock on the target heap. The test results with and without the table lock for each granularity are then combined in a Reporting Services range chart showing the typical min and max values for that granularity at the tested batch/set sizes. The results are for comparative purposes only; exact ms/row metrics are hardware dependent. The table lock should be used when practical. If the table lock causes unacceptable blocking, either delay the insert until off hours or forgo the table lock.
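The ROW-over-PAGE preference noted earlier can be sanity-checked per table before committing to a compression setting; `sp_estimate_data_compression_savings` ships with SQL Server 2008 and later (the schema and table names below are illustrative):

```sql
-- Estimate storage savings for ROW compression on the target heap.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'LoadStage',
     @index_id         = 0,        -- 0 = the heap itself
     @partition_number = NULL,     -- NULL = all partitions
     @data_compression = 'ROW';

-- Repeat for PAGE and compare the estimated sizes returned.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'LoadStage',
     @index_id         = 0,
     @partition_number = NULL,
     @data_compression = 'PAGE';
```

Each call returns current and estimated sizes for a sample of the data, so the comparison costs little even on large tables.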
Looks very ‘expressionistic’ indeed! The range chart shows that the TVP can keep up with the fastest row-at-a-time load method when the Rows per batch/set is < 100 and can also keep up with BCP when rows per batch/set is > 5000. Between 100 rows and 5000 rows per batch/set, the way they do things in the data warehouse and the way .NET DataSets and the Entity Framework persist data don’t hold a candle to TVPs.
I will consider the load into the heap as the baseline load times for the data tested. No reason to expect that indexes and constraints will make the load anything but slower. With this baseline information at hand it is possible to add the indexes and constraints and repeat the test cycle and load the test results to a comparative line chart. The chart makes the difference clear without requiring a mental slide-rule.
There are still at least a few ways to exploit the un-indexed heap during the data loading process into tables that are usable by the application. (more on that later)
The TVP, like the SqlBulkCopy set, can be passed from an ADO.NET application as a DataTable or an IDataReader. In this un-indexed heap scenario everything happens within SQL Server, so the TVP is passed only between stored procedures. That is kinda cool, but the real power of TVPs shows when they eliminate the extra IO required when the application first persists data to a file that is then loaded into SQL Server.
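A minimal sketch of that procedure-to-procedure TVP hand-off — the type, procedure, and table names are illustrative:

```sql
-- Table type shared by the calling and receiving procedures.
CREATE TYPE dbo.LoadRowType AS TABLE
(
    Id      INT           NOT NULL,
    Payload NVARCHAR(400) NULL
);
GO

-- Receiver: TVP parameters must be declared READONLY.
CREATE PROCEDURE dbo.InsertLoadRows
    @Rows dbo.LoadRowType READONLY
AS
BEGIN
    INSERT INTO dbo.LoadStage WITH (TABLOCK) (Id, Payload)
    SELECT Id, Payload
    FROM   @Rows;
END;
GO

-- Caller: fill the TVP and pass it along -- no file, no extra round trip.
CREATE PROCEDURE dbo.StageAndForward
AS
BEGIN
    DECLARE @Rows dbo.LoadRowType;

    INSERT INTO @Rows (Id, Payload)
    VALUES (1, N'first row'), (2, N'second row');

    EXEC dbo.InsertLoadRows @Rows = @Rows;
END;
```

The same `dbo.LoadRowType` can be bound to a `SqlParameter` of `SqlDbType.Structured` from ADO.NET, which is how the application-side variant of this test works.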
Bulk copy has been the unquestioned performance leader for moving lots-o-rows into SQL Server for quite a while. In order to compare them side by side, one test cycle based on BCP.exe and another based on the BULK INSERT T-SQL statement were defined. The test points compare what happens when the bulk copy process runs inside SQL Server's memory space with what happens when the load is marshaled by a process external to SQL Server. Even though a local file is used here, the BCP test results should be representative of how the ADO.NET SqlBulkCopy fits into the comparison matrix. For more information about Microsoft's recommendations see the SQL Server Books Online topic "Basic Guidelines for Bulk Importing Data".
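For reference, the two bulk copy shapes side by side; the file path, batch size, and table name below are placeholders, not the actual test harness:

```sql
-- Inside SQL Server memory space: BULK INSERT reads the file directly.
BULK INSERT dbo.LoadStage
FROM 'C:\load\loadstage.dat'       -- path as seen by the SQL Server service
WITH
(
    DATAFILETYPE = 'widenative',   -- wide-native, matching bcp's -N switch
    TABLOCK,                       -- table lock to qualify for minimal logging
    BATCHSIZE    = 5000
);

-- External to SQL Server: the roughly equivalent BCP.exe command line,
-- shown here as a comment (run from a command prompt, not from T-SQL):
-- bcp MyDatabase.dbo.LoadStage in C:\load\loadstage.dat -N -h "TABLOCK" -b 5000 -S myServer -T
```

The two paths accept the same file, which is what makes the internal-versus-external comparison in this test cycle meaningful.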
With SqlBulkCopy you do not need to use a file. SqlBulkCopy has similar requirements to the table-valued parameter and may indeed be easier to implement, especially when the data in the file maps directly onto the target heap. However, testing from SSIS shows that the TVP out-performs SqlBulkCopy. (more on that later)
All bulk copy DATAFILETYPE settings were tested: native (n), character (c), wide (w) and wide-native (N). Based on the results, and the reality that the types in the data will be the primary driver of what types are in the file, subsequent tests are standardized on wide-native (N). The comparative test result between DATAFILETYPEs is not interesting from a performance perspective: it doesn't matter much to SQL Server which DATAFILETYPE option is used. In the tested scenario, the data contains XML and NVARCHAR columns, so double-byte storage (i.e. wide or wide-native) is required.
There is a difference in the size of the file where the data is stored. Here is what that looks like for the same 10,000 rows:
Extrapolating from the file sizes, it is apparent that DATAFILETYPE could also affect network IO.
So that's about it. The un-indexed heap has been successfully beaten to death. Next up: moving data into a two-table hierarchy where one table must know the identity of the other. Lots of application code loose in the wild already tangles with that problem. We will test and compare two ways to do this without marshaling IDENTITY values back and forth between the database server and the application: one using old-school upserts, the other using MERGE.