I’ll start with a disclaimer: I’m going to tell you about something that happened in one specific system environment. There’s no general advice you can build on this particular scenario. I’m posting it only because I was surprised myself by the order of magnitude of the speed-up I got on a specific query, just by removing a small part of the work in the execution plan.
The other day I helped troubleshoot a database system. In a table with some 400 million records, a subset (50-60 million records) was to be deleted. The application kept timing out on this delete operation, so I advised the developer to split the delete into smaller chunks. I even helped write a T-SQL script to perform the delete in chunks of one million rows. The script was pretty basic: a WHILE loop that checked whether any rows matching the WHERE condition of the delete were left in the table, and inside the loop a DELETE TOP (1000000) followed by an explicit CHECKPOINT.
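A minimal sketch of a script along those lines is shown below. The post doesn’t show the real table or WHERE condition, so dbo.BigTable, the CreatedDate column and the cutoff date are hypothetical placeholders.

```sql
-- Hypothetical placeholders: dbo.BigTable, CreatedDate and the cutoff date
-- stand in for the real table and WHERE condition.
DECLARE @Cutoff    datetime2 = '2015-01-01';
DECLARE @BatchSize int       = 1000000;   -- one million rows per chunk

-- Keep looping while any rows matching the delete condition are left.
WHILE EXISTS (SELECT 1 FROM dbo.BigTable WHERE CreatedDate < @Cutoff)
BEGIN
    -- Each DELETE commits on its own (autocommit), so no single
    -- transaction has to cover all 50-60 million rows.
    DELETE TOP (@BatchSize)
    FROM dbo.BigTable
    WHERE CreatedDate < @Cutoff;

    -- Explicit checkpoint after each chunk. Under the SIMPLE recovery
    -- model this lets the inactive part of the log be reused.
    CHECKPOINT;
END;
```

A common variation drops the separate EXISTS check and instead breaks out of the loop when @@ROWCOUNT after the DELETE is 0, so the WHERE condition is evaluated once per chunk instead of twice. Under the FULL (or BULK_LOGGED) recovery model it is log backups, not checkpoints, that allow log space to be reused.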
I was sure that this script would do the trick, but I found that deleting just one million rows was a 20 minute job. 20 minutes for one million rows, with 50-60 million rows to delete. That’s 50-60 chunks at 20 minutes each, well over 15 hours. No wonder the application kept timing out… Looking at the query plan for a delete of one million rows, I saw that a Sort operator and the nonclustered index delete accounted for 70% of the plan’s cost, and the Sort operator carried a warning about a spill to tempdb. Knowing something about the disk latency on tempdb, I could also tell that this part of the execution plan was in fact taking more than 70% of the time.
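If you want to capture the actual plan for one chunk from a script rather than from the graphical plan in Management Studio, one option is SET STATISTICS XML, which returns the actual execution plan as an extra result set for each statement. In this sketch the test delete runs inside a transaction that is rolled back, so no rows are removed (the rollback itself does take time, though). dbo.BigTable and CreatedDate are hypothetical placeholders.

```sql
-- Hypothetical placeholders: dbo.BigTable and CreatedDate.
DECLARE @BatchSize int = 1000000;   -- same chunk size as the real script

-- Return the actual execution plan (as XML) along with each statement's results.
SET STATISTICS XML ON;

BEGIN TRANSACTION;

DELETE TOP (@BatchSize)
FROM dbo.BigTable
WHERE CreatedDate < '2015-01-01';

-- Undo the test delete; only the plan is of interest here.
ROLLBACK TRANSACTION;

SET STATISTICS XML OFF;
```

In the returned plan, the Sort operator is where a spill warning would show up, and the per-operator costs show how much of the work goes to maintaining the nonclustered indexes rather than the table itself.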
So I started advising on partitioning the data, so that the rows to be deleted could be switched out of their partition into a staging table and truncated there. Though this would be the “ultimate” solution in terms of performance, it should also be said that partitioning is a rather advanced topic and that it would require more maintenance going forward than leaving the data unpartitioned.
I had the Sort and nonclustered index delete operators in the back of my head, and decided to try disabling the nonclustered index (NCI) before performing the delete. I tested how long an NCI rebuild would take, and to my surprise it was only an eight minute job. Eight minutes to rebuild an NCI on 400 million rows, compared to more than 15 hours for deleting 50-60 million rows. So I built another T-SQL script, which disabled the NCIs on the table, performed the delete operation and then rebuilt the NCIs. I tested only with deleting one million rows. The result: zero seconds for disabling the NCIs. Nine seconds for the delete (!!!). Eight minutes for rebuilding the index. If I could delete one million rows in nine seconds, I was looking at something like 7.5 to 9 minutes for deleting all 50-60 million rows. That’s about 0.8% (9 seconds instead of 20 minutes per million rows) of the time it would have taken to just perform the delete. If I add the time it takes to rebuild the index, I’m up to a job of about 15 to 17 minutes, compared to more than 15 hours before disabling the indexes.
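A sketch of that disable/delete/rebuild sequence might look like this. It assumes a hypothetical table dbo.BigTable with a hypothetical CreatedDate column in the WHERE condition, keeps the one-million-row chunks, and disables the nonclustered indexes one by one with dynamic SQL built from sys.indexes.

```sql
-- Hypothetical placeholders: dbo.BigTable, CreatedDate and the cutoff date.
DECLARE @Cutoff    datetime2 = '2015-01-01';
DECLARE @IndexName sysname;
DECLARE @sql       nvarchar(max);
DECLARE @Indexes   TABLE (IndexName sysname NOT NULL);

-- Remember which nonclustered indexes are enabled right now,
-- so exactly those can be rebuilt afterwards.
INSERT INTO @Indexes (IndexName)
SELECT i.name
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.BigTable')
  AND i.type = 2            -- 2 = nonclustered rowstore index
  AND i.is_disabled = 0;

-- Step 1: disable each nonclustered index. ALTER INDEX ALL ... DISABLE is
-- avoided on purpose: it would disable the clustered index as well and
-- make the table inaccessible.
DECLARE disable_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT IndexName FROM @Indexes;
OPEN disable_cur;
FETCH NEXT FROM disable_cur INTO @IndexName;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'ALTER INDEX ' + QUOTENAME(@IndexName)
             + N' ON dbo.BigTable DISABLE;';
    EXEC sys.sp_executesql @sql;
    FETCH NEXT FROM disable_cur INTO @IndexName;
END;
CLOSE disable_cur;
DEALLOCATE disable_cur;

-- Step 2: delete in chunks; only the clustered index (or heap) is maintained now.
WHILE 1 = 1
BEGIN
    DELETE TOP (1000000)
    FROM dbo.BigTable
    WHERE CreatedDate < @Cutoff;

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to delete
    CHECKPOINT;
END;

-- Step 3: rebuilding a disabled index re-enables it.
DECLARE rebuild_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT IndexName FROM @Indexes;
OPEN rebuild_cur;
FETCH NEXT FROM rebuild_cur INTO @IndexName;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'ALTER INDEX ' + QUOTENAME(@IndexName)
             + N' ON dbo.BigTable REBUILD;';
    EXEC sys.sp_executesql @sql;
    FETCH NEXT FROM rebuild_cur INTO @IndexName;
END;
CLOSE rebuild_cur;
DEALLOCATE rebuild_cur;
```

If the script stops between the steps, the indexes stay disabled until someone rebuilds them, and a disabled unique index does not enforce uniqueness in the meantime, so it’s worth checking is_disabled in sys.indexes afterwards.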
BUT, someone might argue, if I disable and then rebuild the indexes, query performance on the table would suffer for the first few minutes while the delete runs, and in the worst case the table would be offline for the following eight minutes during the index rebuild (if it’s not done online). But imagine the query performance on a table in which a delete operation goes on for 15 hours or more.
The disk latency on tempdb, and actually also on the transaction log, is pretty high on this machine, and that slows down performance. But even with an OK or even good disk configuration, I would say that disabling the NCIs before the DELETE would save us a lot of time.
I’m not going to argue that you should ALWAYS do this or that, but if you find yourself struggling with updates, inserts or deletes that go on forever, check your execution plan to see how much of the work really has to do with the DML operation itself and how much has to do with updating, inserting into or deleting from nonclustered indexes. Before going to your manager and asking for better hardware, look at what you can do to optimize large workloads with a mix of DDL and DML operations. In my case, we had a spill to tempdb already at one million rows. If I had tried to fight that with more memory on the server, I would have had to ask for 150GB more memory in a box with 3GB of available memory. Hardware is sometimes a real bottleneck, but I would argue that you should fix your code before adjusting the hardware.
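Besides the execution plan, it can be useful to list the indexes that DML against the table has to maintain, along with their size. A sketch using the sys.indexes catalog view and the sys.dm_db_partition_stats DMV, with dbo.BigTable as a hypothetical placeholder:

```sql
-- Hypothetical placeholder: dbo.BigTable.
-- One row per index: index_id 0 = heap, 1 = clustered, >1 = nonclustered.
-- LEFT JOIN keeps indexes that have no partition stats rows (row_count/used_mb NULL).
SELECT  i.index_id,
        i.name                              AS index_name,
        i.type_desc,
        i.is_disabled,
        SUM(ps.row_count)                   AS row_count,
        SUM(ps.used_page_count) * 8 / 1024  AS used_mb   -- pages are 8 KB
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_partition_stats AS ps
       ON ps.object_id = i.object_id
      AND ps.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.BigTable')
GROUP BY i.index_id, i.name, i.type_desc, i.is_disabled
ORDER BY i.index_id;
```

Inserts and deletes have to maintain every enabled index in the list (filtered indexes only for rows that match the filter), while updates only touch the indexes that contain the modified columns, unless the clustered index key itself changes.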
Partitioning the table and/or implementing compression on the table would probably help even more. But these are both Enterprise features, which come at a rather high cost, and not every organisation can afford that. If you don’t have the knowledge to troubleshoot really problematic queries yourself, it’s probably worth investing in a few days of expert help before wasting money on new hardware or more expensive licenses.