<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Maggie Nelson &#187; performance</title>
	<atom:link href="http://maggienelson.com/tag/performance/feed/" rel="self" type="application/rss+xml" />
	<link>http://maggienelson.com</link>
	<description>databases and code goodness</description>
	<lastBuildDate>Tue, 06 Apr 2010 17:24:02 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Optimization Woes</title>
		<link>http://maggienelson.com/2009/03/optimization-woes/</link>
		<comments>http://maggienelson.com/2009/03/optimization-woes/#comments</comments>
		<pubDate>Tue, 10 Mar 2009 14:58:20 +0000</pubDate>
		<dc:creator>maggie</dc:creator>
				<category><![CDATA[entry]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[oop]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[tricks]]></category>

		<guid isPermaLink="false">http://maggienelson.com/?p=227</guid>
		<description><![CDATA[I think by now everyone&#8217;s familiar with the phrase:
&#8220;Premature optimization is the root of all evil.&#8221;
In my day-to-day job, I worry a lot about writing high-performance applications, especially in the way applications manage and retrieve data.  There is a huge difference between designing your application to be optimized for a specific purpose and premature [...]]]></description>
			<content:encoded><![CDATA[<p>I think by now everyone&#8217;s familiar with the phrase:</p>
<blockquote><p>&#8220;Premature optimization is the root of all evil.&#8221;</p></blockquote>
<p>In my day-to-day job, I worry a lot about writing high-performance applications, especially in the way applications manage and retrieve data.  There is a huge difference between designing your application to be optimized for a specific purpose and premature optimization.  It&#8217;s one thing to worry whether echo or print is faster (who cares, really?) &#8211; but it&#8217;s another to design an application that tolerates some slowness in low-traffic parts in exchange for an ultra-fast response time on the first access to the application.  For example, viewing your photo album may take a while but seeing the homepage is pretty freakin&#8217; fast!</p>
<p>In a recent post, Sebastian Bergmann points out that <a href="http://sebastian-bergmann.de/archives/854-Do-Not-Micro-Optimize.html">you should not micro-optimize</a> (in reference to a <a href="http://www.alexatnet.com/node/196">PHP micro-optimization tricks</a> post by Alex Netkachov).  I agree wholeheartedly.  It&#8217;s a nice list and definitely something to keep in mind, but it should be just a small component of your developer bag of tricks!  Before you can optimize, you have to design well.  Proper design leads to potential for great optimization. And to design well, you have to really understand the problem you&#8217;re trying to solve.  Sebastian links to slides by Ilia Alshanetsky from PHP Quebec 2009 on <a href="http://ilia.ws/files/phpquebec_2009.pdf">Common Optimization Mistakes</a>.  Ilia also points out that it&#8217;s important to understand before you start fixing:</p>
<blockquote><p>Solve the business case before optimizing the solution.</p></blockquote>
<p>For PHP Advent, I wrote an article, <a href="http://phpadvent.org/2008/optimize-this-by-maggie-nelson">Optimize This!</a>, which talks about how to optimize an existing application.  The most important part is understanding what the application is doing and then figuring out where you&#8217;ll have the most impact.  Sure, you can use echo over print (or print over echo, once again, who cares?) to get that 0.0001% performance boost, but if you&#8217;re doing 140 live SQL queries on your homepage, your application will still have a 3-minute response time, and with more users the performance will continue to degrade until the application becomes unusable.</p>
<p>Modularizing code (aka &#8220;Object-Oriented Programming&#8221;) makes it easy to focus on problem areas in an existing application.  Designing well doesn&#8217;t always mean designing for high performance.  Instead, it means designing for flexibility and atomicity &#8211; you want to be able to easily change things without huge impact on the overall system.  So before you go and optimize, take a step back and think about what your application is doing and what it&#8217;s supposed to do.  Go write on the white board &#8211; diagrams prove very useful in figuring out flows and functionality!  Go chat in the #phpc channel on irc.freenode.net &#8211; the PHP community will help (with a little bit of playful teasing, but we&#8217;ll help).  Just don&#8217;t go changing all your methods to static methods unless you&#8217;re really sure it&#8217;s the right thing to do!</p>
]]></content:encoded>
			<wfw:commentRss>http://maggienelson.com/2009/03/optimization-woes/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Denormalization with Bitmasks</title>
		<link>http://maggienelson.com/2009/02/denormalization-with-bitmasks/</link>
		<comments>http://maggienelson.com/2009/02/denormalization-with-bitmasks/#comments</comments>
		<pubDate>Sun, 08 Feb 2009 03:48:43 +0000</pubDate>
		<dc:creator>maggie</dc:creator>
				<category><![CDATA[entry]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[denormalization]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[tricks]]></category>

		<guid isPermaLink="false">http://maggienelson.com/?p=168</guid>
		<description><![CDATA[This is an oldie, but goodie.
A lot of times you&#8217;ll find yourself retrieving lists of records, e.g. a list of users on your site, only to find out that each of those records, i.e. each user requires a retrieval of another list of records: roles and permissions, hobbies, pets, preferred languages etc.
This will either cause [...]]]></description>
			<content:encoded><![CDATA[<p>This is an oldie, but goodie.</p>
<p>A lot of times you&#8217;ll find yourself retrieving lists of records, e.g. a list of users on your site, only to find out that each of those records, i.e. each user, requires a retrieval of another list of records: roles and permissions, hobbies, pets, preferred languages etc.</p>
<p>This will either cause queries in loops (a SELECT for every row returned from the first SELECT), or you&#8217;ll find yourself writing overly complicated <a href="http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:31263576751669">pivot queries</a>.  (My personal choice has always been writing my own global Oracle functions for data aggregation &#8211; talk about overkill!)</p>
<p>Even if you have great hardware, your database will inevitably become your bottleneck, so take care of your database: design and build to avoid unnecessary database access and once you&#8217;re in the database, get your data as fast as possible and get the heck out!</p>
<p>Going back to basics is your solution, and there really aren&#8217;t many things more basic than <a href="http://en.wikipedia.org/wiki/Mask_(computing)">bitmasks</a>.  Let&#8217;s say you have a social networking site (who doesn&#8217;t these days?).  When users register (or at some point later on), they can specify what kind of pets they own.  You can use this data to perhaps match those users &#8211; cat lovers like other cat lovers, right?  (*shifty eyes*)</p>
<p>This is what the relationship between users and pets might look like in your database:</p>
<p style="text-align: center;"><img class="size-full wp-image-172" title="user_pet_erd" src="http://maggienelson.com/blog/wp-content/uploads/2009/02/user_pet.png" alt="user_pet_erd" width="366" height="141" /></p>
<p>Let&#8217;s say you have the following data in those tables:</p>
<p>user:</p>
<pre>+----+----------+
| id | username |
+----+----------+
|  1 | Maggie   |
|  2 | Sully    |
+----+----------+</pre>
<p>pet:</p>
<pre>+----+------------+
| id | name       |
+----+------------+
|  1 | cat        |
|  2 | dog        |
|  3 | lizard     |
|  4 | parakeet   |
|  5 | guinea pig |
|  6 | snake      |
|  7 | unicorn    |
|  8 | ferret     |
+----+------------+</pre>
<p>user_pet:</p>
<pre>+---------+--------+
| user_id | pet_id |
+---------+--------+
|       1 |      1 |
|       1 |      4 |
|       1 |      5 |
|       2 |      7 |
|       2 |      8 |
+---------+--------+</pre>
<p>How do I find out easily what kinds of pets my users have?  Oh, it&#8217;s easy:</p>
<p>Maggie has:</p>
<pre>select p.name
  from user_pet up,
       pet p,
       user u
 where u.username = 'Maggie'
   and up.user_id = u.id
   and p.id = up.pet_id;

+------------+
| name       |
+------------+
| cat        |
| parakeet   |
| guinea pig |
+------------+</pre>
<p>And Sully has*:</p>
<pre>select p.name
  from user_pet up,
       pet p,
       user u
 where u.username = 'Sully'
   and up.user_id = u.id
   and p.id = up.pet_id;

+---------+
| name    |
+---------+
| unicorn |
| ferret  |
+---------+</pre>
<p>* Of course <a href="http://en.wikipedia.org/wiki/Chesley_Sullenberger">Sully</a> has a unicorn!  (He also poops nunchucks.)</p>
<p>It&#8217;s easy to find out what kind of pets users have one at a time.  Let&#8217;s do it in one big swoop though:</p>
<pre>
select u.username,
       p.name
  from user_pet up,
       pet p,
       user u
 where up.user_id = u.id
   and p.id = up.pet_id;

+----------+------------+
| username | name       |
+----------+------------+
| Maggie   | cat        |
| Maggie   | parakeet   |
| Maggie   | guinea pig |
| Sully    | unicorn    |
| Sully    | ferret     |
+----------+------------+
</pre>
<p>You get all the data you wanted; however, it&#8217;s spread over many rows, so some data aggregation is required!  You&#8217;re probably thinking to yourself: &#8220;Oh, man, if only there were a function that works just like SUM() but for strings!&#8221;  If you&#8217;re using MySQL, you&#8217;re in luck thanks to the awesome <a href="http://dev.mysql.com/doc/refman/4.1/en/group-by-functions.html#function_group-concat">GROUP_CONCAT()</a> function.  If you use it, you&#8217;ll get this:</p>
<pre>
select u.username, group_concat(p.name)
  from user_pet up,
       pet p,
       user u
 where up.user_id = u.id
   and p.id = up.pet_id
 group by u.username;

+----------+-------------------------+
| username | group_concat(p.name)    |
+----------+-------------------------+
| Maggie   | cat,guinea pig,parakeet |
| Sully    | ferret,unicorn          |
+----------+-------------------------+
</pre>
<p>Pretty swanky, eh?  However, while group_concat() is totally awesome (oh, I love it so!), it&#8217;s only available in MySQL.  (Although comments on <a href="http://db4free.blogspot.com/2006/01/hail-to-groupconcat.html">this group_concat() praising blog post</a> have an example of how to accomplish group_concat() in PostgreSQL, which you could also achieve in Oracle.)  But I digress.  Also, once you have the string representing all the pets, you&#8217;ll need to parse it in your application to get the pet names.  And most importantly, this data is denormalized in a way that makes it a little difficult to enforce data integrity once you modify the list.</p>
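<p>To make the parsing step concrete, here is a minimal sketch in PHP.  The $rows array is hypothetical and simply mirrors the group_concat() result shown above; the sketch also assumes that no pet name contains a comma.</p>

```php
// Hypothetical rows, shaped like the group_concat() result above.
$rows = array(
    array('username' => 'Maggie', 'pets' => 'cat,guinea pig,parakeet'),
    array('username' => 'Sully',  'pets' => 'ferret,unicorn'),
);

// Split each concatenated string back into a list of pet names.
// An empty or NULL string is treated as "no pets", because
// explode(',', '') would otherwise return an array with one empty name.
$petsByUser = array();
foreach ($rows as $row) {
    $concatenated = (string) $row['pets'];
    $petsByUser[$row['username']] = ($concatenated === '')
        ? array()
        : explode(',', $concatenated);
}

print_r($petsByUser);
```

<p>The application now has one array of names per user, but it still has to trust that every name in the string matches a real row in the pet table.</p>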
<p>If group_concat() didn&#8217;t exist, what else can you do?  Thinking of group_concat() as a sum() but for strings instead of numbers is an interesting approach.  Next step: what do you have available out there that would yield a sum for a combination of values that could then be reverse engineered to get that unique combination of values again?  That&#8217;s right, bitmasks!</p>
<p>How do you implement this in your user-pet scenario?  It&#8217;s easy: first, represent bits in your bitmask as decimals on which you can later do math.  Use a table to keep track of the bit-to-decimal translation for additional data integrity. Use those decimals as keys for your pets. </p>
<p>First, I create a table named power_of_two with the following values:</p>
<pre>
+----------+-------+
| exponent | power |
+----------+-------+
|        0 |     1 |
|        1 |     2 |
|        2 |     4 |
|        3 |     8 |
|        4 |    16 |
|        5 |    32 |
|        6 |    64 |
|        7 |   128 |
|        8 |   256 |
+----------+-------+
</pre>
<p>Then I use those values as IDs in the pet table:</p>
<pre>
+-----+------------+
| id  | name       |
+-----+------------+
|   1 | iguana     |
|   2 | cat        |
|   4 | dog        |
|   8 | lizard     |
|  16 | parakeet   |
|  32 | guinea pig |
|  64 | snake      |
| 128 | unicorn    |
| 256 | ferret     |
+-----+------------+
</pre>
<p>Note that I added an iguana as a pet with an id of 1 &#8211; this is 2 to the 0th power.  I needed to add a pet at this spot to account for the rightmost place in my bitmasks (otherwise I would have to shift everything by 1, which can be confusing).</p>
<p>We have 9 possible pets.  If I have no pets, my bitmask will be 000000000 &#8211; or 0.  If I have the first and the third pet, my bitmask will be 000000101 &#8211; or 5.</p>
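<p>As a rough illustration of that arithmetic, the sketch below builds the mask for &#8220;the first and the third pet&#8221; in PHP.  The $owned array is a made-up example; the ids are the power-of-two ids from the pet table above.</p>

```php
// Bit values for the pets this user owns (ids from the pet table above).
$owned = array(1, 4); // the first pet (iguana) and the third pet (dog)

// OR the bits together; for distinct powers of two this equals their sum.
$mask = 0;
foreach ($owned as $bit) {
    $mask |= $bit;
}

echo $mask, "\n";                  // 5
echo sprintf('%09b', $mask), "\n"; // 000000101
```

<p>The '%09b' format pads the binary form to nine digits, one per possible pet.</p>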
<p>Assuming these changes to my database, let&#8217;s see what kinds of pets Maggie and Sully have:</p>
<pre>
select u.username, sum(p.id)
  from user_pet up,
       pet p,
       user u
 where up.user_id = u.id
   and p.id = up.pet_id
 group by u.username;

+----------+-----------+
| username | sum(p.id) |
+----------+-----------+
| Maggie   |        50 |
| Sully    |       384 |
+----------+-----------+
</pre>
<p>50 = 0b110010 &#8211; so Maggie has the 2nd, 5th and 6th pets (cat, parakeet and guinea pig). 384 = 0b110000000, so Sully has the 8th and 9th pets (unicorn and ferret).</p>
<p>Let&#8217;s assume your application heavily caches data that doesn&#8217;t change often &#8211; such as the pet table (not user_pet).  Let&#8217;s say you have this cached:</p>
<pre>
$pets = array(
    1 => 'iguana',
    2 => 'cat',
    4 => 'dog',
    8 => 'lizard',
    16 => 'parakeet',
    32 => 'guinea pig',
    64 => 'snake',
    128 => 'unicorn',
    256 => 'ferret'
);
</pre>
<p>If your application already knows this:</p>
<p>Maggie: 50<br />
Sully: 384</p>
<p>Then with a clever use of PHP&#8217;s <a href="http://us.php.net/decbin">decbin()</a>, it should be really easy to display the right pet names.</p>
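<p>One possible way to do that is sketched below.  The helper name maskToPets() is made up for this example (it is not a PHP built-in), and the $pets array is the cached table from above, repeated so the snippet stands on its own.</p>

```php
// The cached pet table from above: bit value => pet name.
$pets = array(
    1 => 'iguana',
    2 => 'cat',
    4 => 'dog',
    8 => 'lizard',
    16 => 'parakeet',
    32 => 'guinea pig',
    64 => 'snake',
    128 => 'unicorn',
    256 => 'ferret'
);

// maskToPets() is a hypothetical helper name, not part of PHP.
function maskToPets($mask, array $pets)
{
    $names = array();
    // decbin(50) is "110010"; reverse it so index 0 is the rightmost bit.
    $bits = strrev(decbin($mask));
    for ($i = 0; $i < strlen($bits); $i++) {
        $value = 1 << $i; // 2 to the power $i
        if ($bits[$i] === '1' && isset($pets[$value])) {
            $names[] = $pets[$value];
        }
    }
    return $names;
}

echo implode(', ', maskToPets(50, $pets)), "\n";  // cat, parakeet, guinea pig
echo implode(', ', maskToPets(384, $pets)), "\n"; // unicorn, ferret
```

<p>The same result can be had without decbin() by looping over $pets and testing each key with a bitwise AND against the mask; which one reads better is mostly a matter of taste.</p>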
<p>This approach is great when you can only connect to the database a very limited number of times and, once you&#8217;re there, you don&#8217;t have the luxury of running expensive aggregate queries.  Also, scaling web servers is easier than coming up with error-proof database scaling solutions.  I&#8217;ve also found this approach extremely useful when migrating lots of data about users (think millions of rows) from one system to another.  Additionally, by having the power_of_two table, you&#8217;re able to be somewhat strict about the possible values for the user_pet table &#8211; this gives you slightly more data integrity than just using the concatenated string of names.</p>
<p>But the best part about using bitmasks is that you can do math on them!  What&#8217;s easier to compare:</p>
<p>iguana,cat,dog,guinea pig,unicorn,ferret<br />
to<br />
iguana,cat,guinea pig,snake,unicorn,ferret</p>
<p>Here you probably have to split each string on the comma into an array, then compare the arrays.</p>
<p>OR</p>
<p>110100111<br />
111100011</p>
<p>Here you can use a bitwise AND to figure out the overlap!</p>
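<p>In PHP, that comparison could look roughly like this.  The variable names are made up for illustration; the two masks are the ones shown above.</p>

```php
$first  = bindec('110100111'); // iguana, cat, dog, guinea pig, unicorn, ferret
$second = bindec('111100011'); // iguana, cat, guinea pig, snake, unicorn, ferret

// Bits set in both masks are the pets the two users have in common.
$overlap = $first & $second;

echo sprintf('%09b', $overlap), "\n"; // 110100011
```

<p>The result, 110100011, stands for iguana, cat, guinea pig, unicorn and ferret: everything the two lists share, with dog and snake dropped.</p>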
<p>Better yet, you can easily give people more pets!</p>
<p>110100111<br />
OR<br />
111111111</p>
<p>And now that person has all pets.  And you didn&#8217;t even have to check if they had them before!  Hooray for bitmasks!</p>
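<p>A short PHP sketch of the same idea, again with made-up variable names, also shows how granting a single pet works:</p>

```php
$current = bindec('110100111'); // the pets the user already has
$allPets = bindec('111111111'); // every bit set: all nine pets

// OR only turns bits on, so pets already owned are unaffected.
$updated = $current | $allPets;
echo sprintf('%09b', $updated), "\n"; // 111111111

// Granting a single pet works the same way, e.g. the snake (64).
$withSnake = $current | 64;
echo sprintf('%09b', $withSnake), "\n"; // 111100111
```

<p>Running the OR a second time changes nothing, which is why no &#8220;do they already have it?&#8221; check is needed.</p>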
<p>Remember, as with many other optimization techniques, this one will not be appropriate for every scenario, but when you do need it, I promise it will work very very well!</p>
<p>P.S. I found it ironic to talk about a function that MySQL has but Oracle doesn&#8217;t &#8211; it&#8217;s usually the other way around.  The str_agg (string aggregate) function I&#8217;ve had to write is so clunky &#8211; Oracle, please implement group_concat()!</p>
]]></content:encoded>
			<wfw:commentRss>http://maggienelson.com/2009/02/denormalization-with-bitmasks/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
	</channel>
</rss>
