Articles   Home

Database Management: Data Loading and Indexes

Database management that includes data loading and index administration is the key to quickly loading tables or running batch jobs that load a set of tables. For example, brain dead Enterprise Resource Planning (ERP) products seem to delete all rows one by one from a set of tables then reload the tables with indexes still in place. Every major ERP product does this as every one of them seems brain dead. The result is batch jobs take 40 hours and are scheduled to run "overnight."

Most DBAs and developers that know their RDBMS will truncate or drop and recreate tables that are to be completely refreshed. This gets rid of the lengthy delete phase. But don't forget to handle the indexes. The simple point of this article is that you load data or run batch jobs that load tables with the indexes dropped. Re-create the index at the end of the process.

Why is loading with indexes slow? The index is building at the same time as the tables are loading. But the index is an ordered set of data and takes longer to create than adding data to the table. It has been shown that loading presorted data in an index order is usually faster than unordered data. But with several indexes on a table the chances of one or many being "unsorted" is high which will slow the process.

How fast is loading tables without indexes compared to loading with indexes updating continuously? I have done loads that vary a factor of two to a hundred times faster depending on how many tables, the size of data set and index configuration. Try it yourself and time it. Of course to do this you must be able to access the DDL (Data Definition Language) to get the index definitions. With OTS (off the shelf) products that are patched "a lot" like ERPs administrators must track DDL changes to make sure the index that is built is the index that matches the current OTS software configuration.

With this administration overhead you can see that you don't want to waste this effort optimizing a half hour data load. Save your effort for the long batch jobs.