<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Denormalization with Bitmasks</title>
	<atom:link href="http://maggienelson.com/2009/02/denormalization-with-bitmasks/feed/" rel="self" type="application/rss+xml" />
	<link>http://maggienelson.com/2009/02/denormalization-with-bitmasks/</link>
	<description>databases and code goodness</description>
	<lastBuildDate>Thu, 01 Apr 2010 23:53:35 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Log Buffer</title>
		<link>http://maggienelson.com/2009/02/denormalization-with-bitmasks/comment-page-1/#comment-131</link>
		<dc:creator>Log Buffer</dc:creator>
		<pubDate>Fri, 13 Feb 2009 20:07:59 +0000</pubDate>
		<guid isPermaLink="false">http://maggienelson.com/?p=168#comment-131</guid>
		<description>Maggie Nelson elucidates what she deems an oldie but a goodie, denormalization with bitmasks. Very slick.

&lt;a href=&quot;http://www.pythian.com/blogs/1487/log-buffer-135-a-carnival-of-the-vanities-for-dbas&quot; rel=&quot;nofollow&quot;&gt;Log Buffer #135&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>Maggie Nelson elucidates what she deems an oldie but a goodie, denormalization with bitmasks. Very slick.</p>
<p><a href="http://www.pythian.com/blogs/1487/log-buffer-135-a-carnival-of-the-vanities-for-dbas" rel="nofollow">Log Buffer #135</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David</title>
		<link>http://maggienelson.com/2009/02/denormalization-with-bitmasks/comment-page-1/#comment-123</link>
		<dc:creator>David</dc:creator>
		<pubDate>Tue, 10 Feb 2009 01:04:47 +0000</pubDate>
		<guid isPermaLink="false">http://maggienelson.com/?p=168#comment-123</guid>
		<description>Alternatively, you can also use this syntax if you want to specify which columns are used in a natural join:

SELECT *
FROM user
JOIN user_pet USING (user_id)
JOIN pet USING (pet_id)

This works great for the most part, and also removes duplicate columns, as the columns used in the join are collapsed. So you can use the column name in a WHERE clause without having to specify the table:

SELECT *
FROM user
JOIN user_pet USING (user_id)
JOIN pet USING (pet_id)
WHERE user_id = 1

It&#039;s a handy syntax for small simple queries. 

Bitmasking is way cool though. As for SchizoDuckie&#039;s comment: a real programmer should know a bitmask when he sees one, so I wouldn&#039;t see it as a major issue. The problem is that many web developers aren&#039;t real programmers, and have no formal training in programming (myself included). Sure, we should keep things simple, but in some ways, this is the simpler solution. Or maybe using SET is?</description>
		<content:encoded><![CDATA[<p>Alternatively, you can also use this syntax if you want to specify which columns are used in a natural join:</p>
<p>SELECT *<br />
FROM user<br />
JOIN user_pet USING (user_id)<br />
JOIN pet USING (pet_id)</p>
<p>This works great for the most part, and also removes duplicate columns, as the columns used in the join are collapsed. So you can use the column name in a WHERE clause without having to specify the table:</p>
<p>SELECT *<br />
FROM user<br />
JOIN user_pet USING (user_id)<br />
JOIN pet USING (pet_id)<br />
WHERE user_id = 1</p>
<p>It&#8217;s a handy syntax for small simple queries. </p>
<p>Bitmasking is way cool though. As for SchizoDuckie&#8217;s comment: a real programmer should know a bitmask when he sees one, so I wouldn&#8217;t see it as a major issue. The problem is that many web developers aren&#8217;t real programmers, and have no formal training in programming (myself included). Sure, we should keep things simple, but in some ways, this is the simpler solution. Or maybe using SET is?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: maggie</title>
		<link>http://maggienelson.com/2009/02/denormalization-with-bitmasks/comment-page-1/#comment-122</link>
		<dc:creator>maggie</dc:creator>
		<pubDate>Mon, 09 Feb 2009 15:15:20 +0000</pubDate>
		<guid isPermaLink="false">http://maggienelson.com/?p=168#comment-122</guid>
		<description>Simon - I definitely agree.  This is not a solution I would use for everything - as with many other optimization techniques, this one works great, but only under very specific circumstances.

The power_of_two table allows quick lookups of the values instead of computing pow(2, x).  You could also add a check constraint on pet.id requiring the id to be in the set of values available in power_of_two.  It might be useful when calculating results for millions of rows - you wouldn&#039;t want to calculate the power of 2 for each of those.  But again, this works only under very specific circumstances.</description>
		<content:encoded><![CDATA[<p>Simon &#8211; I definitely agree.  This is not a solution I would use for everything &#8211; as with many other optimization techniques, this one works great, but only under very specific circumstances.</p>
<p>The power_of_two table allows quick lookups of the values instead of computing pow(2, x).  You could also add a check constraint on pet.id requiring the id to be in the set of values available in power_of_two.  It might be useful when calculating results for millions of rows &#8211; you wouldn&#8217;t want to calculate the power of 2 for each of those.  But again, this works only under very specific circumstances.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Simon</title>
		<link>http://maggienelson.com/2009/02/denormalization-with-bitmasks/comment-page-1/#comment-121</link>
		<dc:creator>Simon</dc:creator>
		<pubDate>Mon, 09 Feb 2009 11:22:46 +0000</pubDate>
		<guid isPermaLink="false">http://maggienelson.com/?p=168#comment-121</guid>
		<description>I&#039;d like to see it stressed that this approach is something of a last resort, and should only be taken if you already have real performance problems. Otherwise one would be merely obfuscating one&#039;s database for no gain - premature optimisation, really.

I wonder if you could explain the value of the &#039;power_of_two&#039; table? That&#039;s somewhat lost on me as, well, computers know that sort of thing already!</description>
		<content:encoded><![CDATA[<p>I&#8217;d like to see it stressed that this approach is something of a last resort, and should only be taken if you already have real performance problems. Otherwise one would be merely obfuscating one&#8217;s database for no gain &#8211; premature optimisation, really.</p>
<p>I wonder if you could explain the value of the &#8216;power_of_two&#8217; table? That&#8217;s somewhat lost on me as, well, computers know that sort of thing already!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave Steinberg</title>
		<link>http://maggienelson.com/2009/02/denormalization-with-bitmasks/comment-page-1/#comment-120</link>
		<dc:creator>Dave Steinberg</dc:creator>
		<pubDate>Mon, 09 Feb 2009 04:01:45 +0000</pubDate>
		<guid isPermaLink="false">http://maggienelson.com/?p=168#comment-120</guid>
		<description>Great post Magda!</description>
		<content:encoded><![CDATA[<p>Great post Magda!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michiel Hakvoort</title>
		<link>http://maggienelson.com/2009/02/denormalization-with-bitmasks/comment-page-1/#comment-119</link>
		<dc:creator>Michiel Hakvoort</dc:creator>
		<pubDate>Sun, 08 Feb 2009 23:57:16 +0000</pubDate>
		<guid isPermaLink="false">http://maggienelson.com/?p=168#comment-119</guid>
		<description>Hi Maggie. Interesting read! I was thinking you could actually do this without denormalizing your database (if Oracle supports shifting :) by using

select u.username, sum(1 &lt;&lt; p.id)
from user_pet up,
       pet p,
       user u
where up.user_id = u.id
   and p.id = up.pet_id
group by u.username;

It is however somewhat less transparent and a (tiny) little more computationally intensive, but then again, using exclusively powers of two as keys in order to be able to generate bit masks isn&#039;t that transparent either :-p. If you really want your keys to remain under 32, you can always use checks, triggers, or a relation containing the numbers 0 through 31 for that, of course.

Also, I&#039;m wondering about one thing, as you mentioned

&quot;... And now that person has all pets. And you didn’t even have to check if they had them before! Hooray for bitmasks!&quot;

As this post is about database design, I&#039;m wondering whether this is about assigning pets to a user in the database or, temporarily, in the application. (Of course, storing the user-pet relation in a bit mask itself and dropping the user_pet relation would yield poor performance when searching for other cat lovers :-p.)

Finally, I must say I have to agree with SchizoDuckie on the particular use. Although it can be beneficial in some cases (small sets)...  If you&#039;re going to have to cache values anyways I&#039;d rather go for offline aggregating (and even offline joining) to get some performance increase, rather than adapting the database design to a solution which isn&#039;t really suited for further growth (in case of adding more animals, that is :-). But for the &quot;some cases&quot; I think  the ability of using bit masks with the combination of normalization (perhaps a little de-normalization) is pretty elegant.</description>
		<content:encoded><![CDATA[<p>Hi Maggie. Interesting read! I was thinking you could actually do this without denormalizing your database (if Oracle supports shifting :) by using</p>
<p>select u.username, sum(1 &lt;&lt; p.id)<br />
from user_pet up,<br />
       pet p,<br />
       user u<br />
where up.user_id = u.id<br />
   and p.id = up.pet_id<br />
group by u.username;</p>
<p>It is however somewhat less transparent and a (tiny) little more computationally intensive, but then again, using exclusively powers of two as keys in order to be able to generate bit masks isn&#8217;t that transparent either :-p. If you really want your keys to remain under 32, you can always use checks, triggers, or a relation containing the numbers 0 through 31 for that, of course.</p>
<p>Also, I&#8217;m wondering about one thing, as you mentioned</p>
<p>&#8220;&#8230; And now that person has all pets. And you didn’t even have to check if they had them before! Hooray for bitmasks!&#8221;</p>
<p>As this post is about database design, I&#8217;m wondering whether this is about assigning pets to a user in the database or, temporarily, in the application. (Of course, storing the user-pet relation in a bit mask itself and dropping the user_pet relation would yield poor performance when searching for other cat lovers :-p.)</p>
<p>Finally, I must say I have to agree with SchizoDuckie on the particular use. Although it can be beneficial in some cases (small sets)&#8230;  If you&#8217;re going to have to cache values anyways I&#8217;d rather go for offline aggregating (and even offline joining) to get some performance increase, rather than adapting the database design to a solution which isn&#8217;t really suited for further growth (in case of adding more animals, that is :-). But for the &#8220;some cases&#8221; I think  the ability of using bit masks with the combination of normalization (perhaps a little de-normalization) is pretty elegant.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Raul Pedro Santos</title>
		<link>http://maggienelson.com/2009/02/denormalization-with-bitmasks/comment-page-1/#comment-118</link>
		<dc:creator>Raul Pedro Santos</dc:creator>
		<pubDate>Sun, 08 Feb 2009 23:44:39 +0000</pubDate>
		<guid isPermaLink="false">http://maggienelson.com/?p=168#comment-118</guid>
		<description>I have to admit I don&#039;t usually use anything other than &quot;id&quot; for my primary keys, which forces me to use the USING or ON clauses along with JOINs ;)

As for the problem you mention, some time ago I bumped into a strange error when I upgraded an application to MySQL 5 and, if I&#039;m not mistaken, from 5.x on MySQL forces you to use parentheses precisely to prevent that ambiguity (the error I was seeing was due to a JOIN that didn&#039;t have the parentheses, if I recall correctly). But yes, it&#039;s a perfectly valid concern.

I thought about the size of the result set, too, but I guess that even with a small result set, even a small performance gain can be advantageous when you&#039;re dealing with an application that has a high volume of traffic from the database - in other words, if the query gets executed a lot in a small amount of time.</description>
		<content:encoded><![CDATA[<p>I have to admit I don&#8217;t usually use anything other than &#8220;id&#8221; for my primary keys, which forces me to use the USING or ON clauses along with JOINs ;)</p>
<p>As for the problem you mention, some time ago I bumped into a strange error when I upgraded an application to MySQL 5 and, if I&#8217;m not mistaken, from 5.x on MySQL forces you to use parentheses precisely to prevent that ambiguity (the error I was seeing was due to a JOIN that didn&#8217;t have the parentheses, if I recall correctly). But yes, it&#8217;s a perfectly valid concern.</p>
<p>I thought about the size of the result set, too, but I guess that even with a small result set, even a small performance gain can be advantageous when you&#8217;re dealing with an application that has a high volume of traffic from the database &#8211; in other words, if the query gets executed a lot in a small amount of time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: maggie</title>
		<link>http://maggienelson.com/2009/02/denormalization-with-bitmasks/comment-page-1/#comment-117</link>
		<dc:creator>maggie</dc:creator>
		<pubDate>Sun, 08 Feb 2009 23:14:55 +0000</pubDate>
		<guid isPermaLink="false">http://maggienelson.com/?p=168#comment-117</guid>
		<description>Raul - for one, I&#039;d have to change my column names and I&#039;m lazy :)  Seriously though, I didn&#039;t use a natural join here because I don&#039;t usually use them at all.  They tend to sometimes have issues with ambiguous results (see an example in Oracle documentation &lt;a href=&quot;http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#sthref9834&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;).  Since I don&#039;t use them often at all, I didn&#039;t think of using them here.

The result set is very small, so any performance considerations go out the window (although now I&#039;m curious about the performance benchmarks on inner join vs. natural join implementations across the common databases!)

I&#039;ll see if WordPress enables comment preview!</description>
		<content:encoded><![CDATA[<p>Raul &#8211; for one, I&#8217;d have to change my column names and I&#8217;m lazy :)  Seriously though, I didn&#8217;t use a natural join here because I don&#8217;t usually use them at all.  They tend to sometimes have issues with ambiguous results (see an example in Oracle documentation <a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#sthref9834" rel="nofollow">here</a>).  Since I don&#8217;t use them often at all, I didn&#8217;t think of using them here.</p>
<p>The result set is very small, so any performance considerations go out the window (although now I&#8217;m curious about the performance benchmarks on inner join vs. natural join implementations across the common databases!)</p>
<p>I&#8217;ll see if WordPress enables comment preview!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Raul Pedro Santos</title>
		<link>http://maggienelson.com/2009/02/denormalization-with-bitmasks/comment-page-1/#comment-116</link>
		<dc:creator>Raul Pedro Santos</dc:creator>
		<pubDate>Sun, 08 Feb 2009 21:27:53 +0000</pubDate>
		<guid isPermaLink="false">http://maggienelson.com/?p=168#comment-116</guid>
		<description>By the way, I meant NATURAL JOIN and not INNER JOIN.

Also, for this to work, column names had to be consistent across tables. That is, the &quot;id&quot; column in both the &quot;user&quot; and the &quot;pet&quot; tables should be named &quot;user_id&quot; and &quot;pet_id&quot;, like the columns in the &quot;user_pet&quot; table.

Specifically, the last query you wrote before introducing the bitmask change, would be this:

SELECT username, group_concat(name)
FROM user NATURAL JOIN user_pet NATURAL JOIN pet
GROUP BY username;</description>
		<content:encoded><![CDATA[<p>By the way, I meant NATURAL JOIN and not INNER JOIN.</p>
<p>Also, for this to work, column names had to be consistent across tables. That is, the &#8220;id&#8221; column in both the &#8220;user&#8221; and the &#8220;pet&#8221; tables should be named &#8220;user_id&#8221; and &#8220;pet_id&#8221;, like the columns in the &#8220;user_pet&#8221; table.</p>
<p>Specifically, the last query you wrote before introducing the bitmask change, would be this:</p>
<p>SELECT username, group_concat(name)<br />
FROM user NATURAL JOIN user_pet NATURAL JOIN pet<br />
GROUP BY username;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Raul Pedro Santos</title>
		<link>http://maggienelson.com/2009/02/denormalization-with-bitmasks/comment-page-1/#comment-115</link>
		<dc:creator>Raul Pedro Santos</dc:creator>
		<pubDate>Sun, 08 Feb 2009 21:05:46 +0000</pubDate>
		<guid isPermaLink="false">http://maggienelson.com/?p=168#comment-115</guid>
		<description>Oops, looks like the comments don&#039;t allow bbcode. Sorry.
This brings up another suggestion: enable comment preview, so your visitors can make sure their comments will show up as they intended them to :)</description>
		<content:encoded><![CDATA[<p>Oops, looks like the comments don&#8217;t allow bbcode. Sorry.<br />
This brings up another suggestion: enable comment preview, so your visitors can make sure their comments will show up as they intended them to :)</p>
]]></content:encoded>
	</item>
</channel>
</rss>
