Optimizing SQL is interesting because oftentimes it can be such a “soft” science. While the database is a computer and it will do what you tell it to, there are so many variables to consider that the outcome can sometimes seem almost unpredictable. Unless, of course, you have memorized all of Oracle docs.
I’m currently preparing for my talks at the php|works ‘07 in Atlanta and one of them is
How to Optimize a Database Query. I am digging through my bag of tricks, but I think what’s most important when writing a fast database-driven system is knowing (and sometimes predicting) what that system is expected to do.
Below are a few points you should consider before designing your database. They are aimed at optimizing a database system in a web application, but hey, who doesn’t like the Internet?
1. How fast do you need to return data?
Usually the answer to this is “as fast as possible”, but in reality, that’s not necessarily true. When your system is simply a reporting system returning statistics about your application (e.g. how many users like to log in at 3am, whether users are active only if other users are active, etc.), your queries can take a few seconds to run. When your application is a forum for impatient teenagers, they will expect the data to be visibly changed almost instantly or else they’ll go somewhere else.
But even with the most impatient users, you can trick your users. Even though the user performing the action will expect immediate results, that user’s friends will not even know about this change and will not expect to know about it until, well, they know about it. If you ever have some sort of a “Recent Actions” area of your site that lets you monitor other users, you can easily outsource the query that populates that data to a scheduled job.
2. How often will your data change?
When designing a database-driven system, you should always consider caching. I recommend memcached for all your web-optimizing needs. The TTL (time to live) of your cache depends on how often your data changes. Do you have a form where users have to pick from a list of 100 hobbies? Assuming those hobbies are in a database, they should be stored in cache. You probably won’t change these hobbies often or maybe even not at all. Cache this data in an 1 hour cache then (or even longer!)
How about some search results? For frequently searched keyword, cache the result set for about 15 minutes. It will change often enough for users to most likely not notice, but you get a huge performance boost.
Even a 1 minute cache is useful, especially if the data you’re caching is tiny. How about that “Recent Actions” area we mentioned before? Or data such as “created X minutes ago”. If your unit is a minute or more, you can cache the data for at least that minute. (The minute cache is your great weapon against users who like to click “Refresh” obsessively…)
How often your data changes also affects indexing, but I will devote an entire post to indexes as they are a huge and often misunderstood subject.
3. How large and complicated is the data you’ll be returning?
If you are only returning one row, your users will expect it to be returned very fast. If users are expecting to see lots and lots of data, they are usually more patient. (It’s true! Ask a sociologist!)
The nightmare scenario for any SQL writer is the expectation of writing a query that will process data from lots and lots of table, but the result set is just one row. All this work for so little gain! If you find yourself in such a situation, one of the possible solutions to this problem is to use views of the data you need to access. The views are populated on a schedule, maybe every few minutes or so, with results set of a complicated query with let’s say outer joins over four tables, subqeries, etc. Y’know, a query that would normally maybe take up to half a second or so. Now, when the user comes to the site, instead of running that complicated query on demand, you force the user to get the data from a view – this is a simple one-table query, usually returning data based on an indexed field. Very fast!
For much much more, see: Oracle Database Performance Tuning Guide.