Checking For Existence - By Joe Geller

It is often necessary for a program to see if there are any rows meeting a set of criteria. It doesn't matter whether there is only 1 row or more than 1. There are several ways to do this. Unfortunately, programmers often pick the least efficient (but simplest) way, and inefficient in this case can be very inefficient.

One way is to ask:
SELECT COUNT(*) FROM table WHERE criteria;

Why is this bad? Even though only one row is returned (with the count), the dbms must find every row meeting the criteria. Depending on the criteria and the indexes, this could mean scanning through a large part of the table.

The second method is to do a regular SELECT (not with a cursor):
SELECT '1' FROM table WHERE criteria;

If there are no rows, you get back a -100 SQLCODE (in DB2). If there is 1 row, you get back that row. If there are more than 1 row, you get back an error, (-811 in DB2), since a singleton select can only be used to return 1 row. The idea, is that you don't really care what the row has, just if there is 1 or more. This method is okay if you have indexes on the right columns, but it can sometimes be as bad as the first method. The dbms does not know that there is a second row unless it looks for one. If the criteria can not be checked with the high order parts of an index, this too can result in large scans.

The third method requires a tiny bit more coding, but will generally be the best performing. It uses a cursor, with 1 FETCH. Only 1 row has to be looked for by the dbms. Even if there are more rows, there is no need to look for them. There is a little extra overhead in that there are 3 calls (OPEN, FETCH, & CLOSE), but this overhead is fairly small compared to the potential scanning of the table.

Summary


Often the simplest coding is the best, but not always. Using COUNT(*) just to check for existence is a poor technique and can result in excessive processing time.

Database Design Tips

Back to DB2 Main Page.
Designing Start & Stop Dates.
DB2 Version 4 Top Ten List.
Avoiding Lock Contention in a Client/Server Environment.
Data Compression - I/Os Go Down, CPU Time Goes Down??.
To Case or Not to Case.