This is an oldie, but goodie.
A lot of times you’ll find yourself retrieving lists of records, e.g. a list of users on your site, only to find out that each of those records, i.e. each user requires a retrieval of another list of records: roles and permissions, hobbies, pets, preferred languages etc.
This will either cause queries in loops (do a SELECT for every row returned from the first SELECT). Or you might find yourself writing overly complicated pivot queries. (My personal choice has always been writing my own global Oracle functions for data aggregation – talk about overkill!)
Even if you have great hardware, your database will inevitably become your bottleneck, so take care of your database: design and build to avoid unnecessary database access and once you’re in the database, get your data as fast as possible and get the heck out!
Back to basics is your solution and there’s really not many things more basic than bitmasks. Let’s say you have a social networking site (who doesn’t these days?). When users register (or at some point later on), they can specify what kind of pets they own. You can use this data to perhaps match those users – cat lovers like other cat lovers, right? (*shifty eyes*)
This is how the relationship between users and pets might look like in your database:

Let’s say you have the following data in those tables:
user:
+----+----------+ | id | username | +----+----------+ | 1 | Maggie | | 2 | Sully | +----+----------+
pet:
+----+------------+ | id | name | +----+------------+ | 1 | cat | | 2 | dog | | 3 | lizard | | 4 | parakeet | | 5 | guinea pig | | 6 | snake | | 7 | unicorn | | 8 | ferret | +----+------------+
user_pet:
+---------+--------+ | user_id | pet_id | +---------+--------+ | 1 | 1 | | 1 | 4 | | 1 | 5 | | 2 | 7 | | 2 | 8 | +---------+--------+
How do I find out easily what kinds of pets my users have? Oh, it’s easy:
Maggie has:
select p.name
from user_pet up,
pet p,
user u
where u.username = 'Maggie'
and up.user_id = u.id
and p.id = up.pet_id;
+------------+
| name |
+------------+
| cat |
| parakeet |
| guinea pig |
+------------+
And Sully has*:
select p.name
from user_pet up,
pet p,
user u
where u.username = 'Sully'
and up.user_id = u.id
and p.id = up.pet_id;
+---------+
| name |
+---------+
| unicorn |
| ferret |
+---------+
* Of course Sully has a unicorn! (He also poops nanchucks.)
It’s easy to find out what kind of pets users have one at a time. Let’s do it in one big swoop though:
select u.username,
p.name
from user_pet up,
pet p,
user u
where up.user_id = u.id
and p.id = up.pet_id;
+----------+------------+
| username | name |
+----------+------------+
| Maggie | cat |
| Maggie | parakeet |
| Maggie | guinea pig |
| Sully | unicorn |
| Sully | ferret |
+----------+------------+
You get all the data you wanted, however, it’s all spread over many rows; some data aggregation is required! You’re probably thinking to yourself: “Oh, man, if only there were a function that works just like SUM() but for strings!”. If you’re using MySQL, you’re in luck thanks to the awesome GROUP_CONCAT() function. If you use it, you’ll get this:
select u.username, group_concat(p.name)
from user_pet up,
pet p,
user u
where up.user_id = u.id
and p.id = up.pet_id
group by u.username;
+----------+-------------------------+
| username | group_concat(p.name) |
+----------+-------------------------+
| Maggie | cat,guinea pig,parakeet |
| Sully | ferret,unicorn |
+----------+-------------------------+
Pretty swanky, eh? However, while group_concat() is totally awesome (oh, I love it so!), it’s only available in MySQL. (Although comments on this group_concat() praising blog post have an example of how to accomplish group_concat() in postgress, which you could also achieve in Oracle.) But I digress. Also, once you have the string representing all the pets, you’ll need to parse it in your application to get the pet names. And most importantly, this data is denormalized in a way that makes it a little difficult to enforce data integrity once you modify the list.
If group_concat() didn’t exist, what else can you do? Thinking of group_concat() as a sum() but for strings instead of numbers is an interesting approach. Next step: what do you have available out there that would yield a sum for a combination of values that could then be reverse engineered to get that unique combination of values again? That’s right, bitmasks!
How do you implement this in your user-pet scenario? It’s easy: first, represent bits in your bitmask as decimals on which you can later do math. Use a table to keep track of the bit-to-decimal translation for additional data integrity. Use those decimals as keys for your pets.
First, I create a table named power_of_two with the following values:
+----------+-------+ | exponent | power | +----------+-------+ | 0 | 1 | | 1 | 2 | | 2 | 4 | | 3 | 8 | | 4 | 16 | | 5 | 32 | | 6 | 64 | | 7 | 128 | +----------+-------+
Then I use those values as IDs in the pet table:
+-----+------------+ | id | name | +-----+------------+ | 1 | iguana | | 2 | cat | | 4 | dog | | 8 | lizard | | 16 | parakeet | | 32 | guinea pig | | 64 | snake | | 128 | unicorn | | 256 | ferret | +-----+------------+
Note that I added an iguana as pet with id of 1 – this is 2 to the 0th power. I needed to add a pet at this spot to account for the rightmost place in my bitmasks (otherwise I have to shift everything by 1, which can be confusing).
We have 9 possible pets. If I have no pets, my bitmask will be 000000000 – or 0. If I have the first and the third pet, my bitmask will be 000000101 – or 5.
Assuming these changes to my database, let’s see what kinds of pets Maggie and Sully have:
select u.username, sum(p.id)
from user_pet up,
pet p,
user u
where up.user_id = u.id
and p.id = up.pet_id
group by u.username;
+----------+-----------+
| username | sum(p.id) |
+----------+-----------+
| Maggie | 50 |
| Sully | 384 |
+----------+-----------+
50 = 0b110010 – so Maggie has the 2nd, 4th and 5th pet. 384 = 0b110000000, so Sully has the 8th and the 9th pets.
Let’s assume your application heavily caches data that doesn’t change often – such as the pet table (not user_pet). Let’s say you have this cached:
$pets = array(
1 => 'iguana',
2 => 'cat',
4 => 'dog',
8 => 'lizard',
16 => 'parakeet',
32 => 'guinea pig',
64 => 'snake',
128 => 'unicorn',
256 => 'ferret'
);
If your application already knows this:
Maggie: 50
Sully: 384
Then with a clever use of PHP’s decbin(), it should be really easy to display the right pet names.
This approach is great for when you have a very limited number of times you can connect to the database and once you’re there, you don’t have the luxury of running expensive aggregate queries. Also, web servers are easier to scale than coming up with error-proof database scaling solutions. I’ve also found this approach extremely useful when migrating lots of data about users (think millions of rows) from one system to another. Additionally, by having the power_of_two table, you’re able to be somewhat strict about the possible values for the user_pet table – this gives you slightly more data integrity than just using the concatenated string of names.
But the best part about using bitmasks is that you can do math on them! What’s easier to compare:
iguana,cat,dog,guinea pig,unicorn,ferret
to
iguana,cat,guinea pig,snake,unicorn,ferret
Here you have to possibly split the string on the comma into an array, then do array comparisons.
OR
110100111
111100011
Here you can use a bitwise AND to figure out the overlap!
Better yet, you can easily give people more pets!
110100111
OR
111111111
And now that person has all pets. And you didn’t even have to check if they had them before! Hooray for bitmasks!
Remember, as with many other optimization techniques, this one will not be appropriate for every scenario, but when you do need it, I promise it will work very very well!
P.S. I found it ironic to talk about a function that MySQL has but not Oracle – it’s usually the other way around. The str_agg (string aggregate) function I’ve had to write is so clunky – Oracle, please implement group_concat()!