Persisting Data – part 2: The Un-Indexed Heap

Note (Nov 29, 2015) – finally got a copy of the test scripts on GitHub. See https://github.com/bwunder/data-load-methods for a working version of all the TSQL shown in this post…
A heap is a table with no clustered index. The performance and throughput advantages of targeting an un-indexed heap structure when loading multiple rows of data into SQL Server can be fantastic.
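For the purposes of these tests that just means the target table is created with no clustered index – a minimal sketch, using a hypothetical target table (the column names are placeholders, not the schema from the actual tests):

    -- an un-indexed heap: a table with no clustered index
    -- (for these tests there are no nonclustered indexes either)
    CREATE TABLE dbo.LoadTarget
        ( RowId   int          NOT NULL
        , Payload varchar(100) NOT NULL );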

That does not shed much light on how to use an un-indexed heap in a line-of-business application – let alone to what advantage. It is entirely possible that your application will never be able to use an un-indexed heap in any way similar to those tested below. Nonetheless, taking a few moments to understand these results will help you build better line-of-business applications: you will have a better sense of what is possible, and the information below also reveals the effects of a few important database configuration variables on INSERT and UPDATE activity.

I have identified a few requirements that should be met in all scenarios to realize optimal data loading to the heap.

  • Requirements for minimal logging must be met. Sunil Agarwal has several posts with internals-level detail on minimal logging, and Books Online is pretty good too. (A quick way to verify minimal logging is sketched just after this list.)
  • An appropriate transactional granularity. Decorating the DataAdapter with a BatchSize is a fundamental recognition that interacting with the database a row at a time is slow, but does it really do so much to solve the problem?
  • A test facility. Until a problematic scenario is understood well enough to reproduce it in the lab, any fix may be based on incorrect assumptions.

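As a rough illustration of the kind of check the test facility can make, the sketch below compares log record counts before and after a load to confirm minimal logging is actually happening. It assumes a hypothetical test database named LoadTest in SIMPLE recovery, the hypothetical dbo.LoadTarget heap from above, and a staged source table dbo.SourceRows; fn_dblog is undocumented, so treat this as a lab-only check, not production code.

    -- minimal logging needs SIMPLE or BULK_LOGGED recovery plus a TABLOCK on the heap
    ALTER DATABASE LoadTest SET RECOVERY SIMPLE;

    CHECKPOINT;  -- let the log clear so the before/after record counts are comparable
    DECLARE @before int = (SELECT COUNT(*) FROM fn_dblog(NULL, NULL));

    INSERT dbo.LoadTarget WITH (TABLOCK) (RowId, Payload)
    SELECT RowId, Payload
    FROM dbo.SourceRows;  -- hypothetical staged source data

    -- a fully logged insert writes roughly one log record per row;
    -- a minimally logged insert writes far fewer (mostly allocation records)
    SELECT COUNT(*) - @before AS log_records_added FROM fn_dblog(NULL, NULL);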

The chart shown below is built from data collected via a test harness designed to allow standardized, isolated comparisons of different data-loading settings.

  • comparative test results for processing identical data into a shared reference target hierarchy are shown
  • the same source data and target hierarchy are used throughout the tests
  • one and only one configuration item is changed between the test labeled “RowHeap” and each of the other methods.
  • the hardware and all other configuration options are unchanged between tests.

Each change results in the following picture of throughput capability:

[Chart: Loading Rows into a Heap]

The actual values shown are specific to the platform and schema. The information shown here should be used – and should be useful – for comparison purposes. If you’d like to repeat the test on your hardware and with your schema I’m happy to share the scripts – I have them syncing to my SkyDrive with Mesh. Send me an email and I’ll give you access.

Description of the 4 test series shown:

RowHeap – This represents the best-practice process as established by previous testing (a sketch of the pattern follows the list).

  • The heap’s DATA_COMPRESSION = ROW
  • target heap WITH(TABLOCK) hint added to insert statement
  • batches of rows are wrapped in database transactions
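Roughly, the RowHeap pattern looks like the sketch below. Table and column names are hypothetical carry-overs from the earlier sketch, and the real harness parameterizes the number of rows per batch; this is only the shape of the pattern, not the harness code itself.

    -- the same hypothetical heap, this time created with ROW compression
    CREATE TABLE dbo.LoadTarget
        ( RowId   int          NOT NULL
        , Payload varchar(100) NOT NULL )
        WITH (DATA_COMPRESSION = ROW);

    -- one batch: a single transaction around the batch, TABLOCK on the target heap
    BEGIN TRANSACTION;

        INSERT dbo.LoadTarget WITH (TABLOCK) (RowId, Payload)
        SELECT RowId, Payload
        FROM dbo.BatchStaging;  -- hypothetical staging source holding one batch of rows

    COMMIT TRANSACTION;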

RowHeap NOCOMP

  • DATA_COMPRESSION = NONE (see the rebuild sketch after this list)
  • target heap WITH(TABLOCK) hint added to insert statement
  • batches of rows are wrapped in database transactions
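The only difference from RowHeap is the compression setting. Switching an existing heap between the two settings means rebuilding it – something like this, against the same hypothetical table:

    -- RowHeap NOCOMP: same heap, row compression removed
    ALTER TABLE dbo.LoadTarget REBUILD WITH (DATA_COMPRESSION = NONE);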

RowHeap NoTABLOCK

  • DATA_COMPRESSION = ROW
  • target heap WITH(TABLOCK) hint NOT added to insert statement
  • batches of rows are wrapped in database transactions

RowHeapNOXACT

  • DATA_COMPRESSION = ROW
  • target heap WITH(TABLOCK) hint added to insert statement
  • batches of rows are NOT wrapped in database transactions

Everything about each sample is the same except the one changed variable noted above and the number of rows per batch.

Note that the test cycles toggle a transaction wrapping each batch of inserts. Each insert within the batch runs inside a transaction either way – its own autocommit transaction when the batch transaction is disabled, or the surrounding batch transaction when it is enabled. Therefore the data inserted is transactionally consistent in every test series, regardless of whether the additional transaction is wrapped around each batch. The batch transaction is used for optimization only and serves no real purpose in protecting the data – but as you can see, it works!
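Stripped down to the two modes, the contrast looks like the sketch below (same hypothetical heap; single-row VALUES inserts stand in for whatever each batch actually does):

    -- RowHeapNOXACT style: no batch transaction, so each INSERT commits on its
    -- own (autocommit) and forces its own log flush
    INSERT dbo.LoadTarget WITH (TABLOCK) (RowId, Payload) VALUES (1, 'a');
    INSERT dbo.LoadTarget WITH (TABLOCK) (RowId, Payload) VALUES (2, 'b');

    -- RowHeap style: one explicit transaction around the batch, so the whole
    -- batch shares a single commit and a single log flush
    BEGIN TRANSACTION;
        INSERT dbo.LoadTarget WITH (TABLOCK) (RowId, Payload) VALUES (1, 'a');
        INSERT dbo.LoadTarget WITH (TABLOCK) (RowId, Payload) VALUES (2, 'b');
    COMMIT TRANSACTION;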

No method tested truly stands out when a small number of rows per batch are sent to the database. If anything, the otherwise worst-case series (RowHeapNOXACT) is actually twice as fast – or should I say less than half as slow? – when only 1 row is sent to the heap at a time. Perhaps this is a clue to help explain the myth about transactions being slower.

The data collected from testing shows that the non-transactional strategy stays at about 10 ms/row across the range of batch sizes tested: from 1 row to 100,000 rows. The added fixed overhead of wrapping each batch in a transaction results in a load time that is 2-3 times as slow at 1 row per batch and becomes about equivalent at 5 rows per batch. For batches of 10 rows or more, the transaction-wrapped strategies increasingly outperform the non-transactional approach until, by 500 rows per batch, each row is moving into the database 20 times faster (roughly 0.5 ms/row versus 10 ms/row) when the batch has a transaction.

Now I ask you: which would you rather be supporting? A process that can load its rows in 10 milliseconds each, or a process that can load its rows in 1/3 of a millisecond (.33 ms) each?

Of course the question is facetious….

It really should be, “Which would you rather be supporting? A process that can load its rows in 10 ms each, or a process that can load its rows in 4/100ths of a millisecond (.04 ms) each?”

[Chart: More Heap Options]

TVP = table-valued parameter. More on that later…
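For now, here is a minimal sketch of the server-side pieces a TVP load needs – a user-defined table type and a procedure that takes it as a READONLY parameter. The type, procedure, and table names are hypothetical; the actual TVP method used in the chart is covered in the later post.

    -- user-defined table type: the client fills one of these and sends the
    -- whole batch as a single parameter
    CREATE TYPE dbo.LoadTargetRows AS TABLE
        ( RowId   int          NOT NULL
        , Payload varchar(100) NOT NULL );
    GO

    -- the batch arrives in @Rows; one INSERT...SELECT moves it into the heap
    CREATE PROCEDURE dbo.LoadTargetInsert
        @Rows dbo.LoadTargetRows READONLY
    AS
    INSERT dbo.LoadTarget WITH (TABLOCK) (RowId, Payload)
    SELECT RowId, Payload
    FROM @Rows;
    GO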
