PostgreSQL performance of "where exists"

Today I was looking into the performance of a PostgreSQL query with a "... where exists (select 1 from ...)" subquery:

select foo_id from foo

where exists (select 1 from bar where bar.foo_id = foo.foo_id)

I was surprised to find out that this query actually ran faster when I restructured it with a SELECT DISTINCT and a JOIN:

select distinct(foo_id) from bar

join foo on bar.foo_id=foo.foo_id

Some references on the web I've found suggest that EXISTS is the preferred way to write the above query in general. Because it's a boolean condition, in theory the database needs to scroll fewer rows because it can stop as soon as the first match is found; and the DISTINCT can be expensive if the results from the join version would not have been unique.

An ancient PostgreSQL mailing list post indicates that rewriting the query as a JOIN may be faster than EXISTS in PostgreSQL, because the join can take advantage of indexes while EXISTS does a nested loop. But, then again, I'm still using PostgreSQL 7.3.x, and EXISTS handling may well have been improved in 7.4.

Written on December 16, 2004

Bill Schneider

where I write about software engineering. All opinions are my own

PostgreSQL performance of "where exists"