This week I learned something new about the value of indexing database tables.
The background: I’m working on a data warehouse project for my employer. The database is Microsoft SQL Server.
The problem: Update table ‘A’ (containing around 600K address records) with the contents of table B (containing the results of running the 600K addresses through address cleaning software). Both tables are temporary (not in the SQL Server sense, just in the I’ll-drop-them-when-I’m-done-with-them sense) tables created by importing .csv files into SQL Server through DTS. Both tables contain a two-field key that can be used to join the matching records.
Since the tables were only going to be used once, by me, I didn’t think to create indexes on them. I just ran the update statement… and waited for 20 minutes, during which time the tempdb system database grew to around 800M (what SQL server was putting in there, I don’t know). Finally, I got impatient and cancelled the transaction.
Then, out of curiosity, I created indexes on both tables (which took less than a minute) and ran the update statement again. It finished in about 3 minutes!
The moral of the story: Even if you’re creating a table that will only ever have a single query run against it, it may still be worth your time to create some indexes. I will remember this in the future. Still, I have to wonder: Shouldn’t the database be smart enough to create a temporary index in situations like this?