Weekend Reading: Fun with Data and Statistics
by maggie on Apr.23, 2009, under entry
I know, I know, it’s only Thursday, but a girl can dream, right?
At work, I design a lot of database systems that manage a lot of data. Most of these systems are put in front of real human beings who are expected to find meaningful data in a big big pile of it. The two main approaches are to use either a harsh, editorial-driven, curated system such as a category hierarchy (Rock falls under Music falls under Entertainment) or a completely free-flowing, user-generated system such as tagging or description search. But in either case, there’s always something missing – you pick tagging, you wish people didn’t tag things with “boobies” so much. You pick a strict category structure and it just feels too restrictive. So what can you do? The March/April 2009 issue of IEEE Intelligent Systems Magazine has an article called “The Unreasonable Effectiveness of Data”:
We should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data.
The article broke my brain a little bit, but go read it, it’s interesting nevertheless.
While we’re talking about representations of data, go read about the Semantic Web – how can we tell computers and teh internets what we humans want?
If you want a little bit lighter reading, go read Bill Bryson’s books about language, specifically Mother Tongue and Made In America. Reading anything by Bill Bryson will make you a better person (or your money back).
Once you have your data, someone will inevitably ask you to tell them what’s “popular”. I’m putting it in quotes, because it means so many things to so many people. Before you answer, learn a little bit about statistics. I recommend Statistics in a Nutshell from O’Reilly. Hint: “most popular” does not always mean “has most views”.
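To make the hint a bit more concrete, here is a tiny sketch (in Python, with made-up items and numbers) of how two reasonable definitions of “popular” can disagree. The `Item` class, its fields, and both ranking functions are assumptions for illustration only, not a recommendation for how to measure popularity.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    views: int        # total views since publication (made-up numbers)
    days_online: int  # how long the item has been available


def most_viewed(items):
    """'Popular' as raw view count: favors things that have been around forever."""
    return max(items, key=lambda item: item.views)


def most_viewed_per_day(items):
    """'Popular' as views per day online: favors things that are hot right now."""
    return max(items, key=lambda item: item.views / item.days_online)


items = [
    Item("Old favorite", views=10_000, days_online=1_000),  # 10 views per day
    Item("New hit", views=900, days_online=3),              # 300 views per day
]

print(most_viewed(items).title)          # Old favorite
print(most_viewed_per_day(items).title)  # New hit
```

Same data, two different winners. Which one is “right” depends entirely on what the person asking actually meant.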
For some real-life scenarios of statistics, misuse of statistics, problems with polling plus a nice dose of politics, read Nate Silver’s FiveThirtyEight.com blog. He’s also a partner and analyst for Baseball Prospectus – you might find baseball boring, but boy, does it lend itself to awesome stats gathering and mangling. Reading the two might not be immediately applicable to software development, but it’ll put your mind in the right context when trying to get meaning out of your giant pile of data.
I will expect your book reports by Monday.
Oracle Buys Sun
by maggie on Apr.20, 2009, under entry
Oracle buys Sun. Sun owns MySQL. Oracle’s press release about it here.
Does this mean an end to MySQL? Will all companies using MySQL now get really harsh sales pitches to use Oracle? I love Oracle and I love MySQL – the beauty of both is competition. “MySQL is better than Oracle because…” and “Oracle is better than MySQL because…” are what drive both to get better.
Good for Oracle and Sun, but I’m curious to see what happens.
The Example Conundrum
by maggie on Mar.24, 2009, under entry
While working on my presentation about ORMs in PHP for the upcoming php|tek conference in May, I’m finding it slightly challenging to pick a good example of data represented in the application and in the database, so that I could insert various ORMs in between these two ends.
The example should be flexible enough to fit within different ORMs. Think of a scale from ActiveRecord to something really complicated like Java’s Hibernate – that’s the scale for the ORMs. It also needs to be really intuitive, so that the audience of the talk can understand it without much explanation – I want to focus on the ORMs themselves, not something else.
At this point, I’m leaning toward an example representing a user on a social network site and his friends. Something along these lines (don’t beat me up, it’s a very rough draft):
object model:
database model:
I think it’s a good idea to keep this example specific to functionality common in web applications (the alternative is kittens, really). On the other hand, I don’t want the example to be overly complex, so that the audience focuses more on the topic of the presentation rather than the details of the example.
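For a rough idea of the shape of such an example, here is a minimal sketch in Python with SQLite. It shows a user-and-friends object on one side, two tables on the other, and a hand-written mapping function in the spot where an ORM would sit. Every class, table, and column name here is an assumption for illustration and not necessarily what the draft models use.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class User:
    """Object side: a user who holds a list of friend User objects."""
    id: int
    name: str
    friends: list = field(default_factory=list)


# Database side: users plus a join table for the friend relationship.
SCHEMA = """
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE friendships (
    user_id   INTEGER NOT NULL REFERENCES users(id),
    friend_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (user_id, friend_id)
);
"""


def load_user(conn, user_id):
    """Hand-written row-to-object mapping: the job an ORM would take over."""
    row = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    user = User(id=row[0], name=row[1])
    friend_rows = conn.execute(
        "SELECT u.id, u.name FROM friendships f "
        "JOIN users u ON u.id = f.friend_id "
        "WHERE f.user_id = ? ORDER BY u.name",
        (user_id,),
    )
    user.friends = [User(id=fid, name=fname) for fid, fname in friend_rows]
    return user


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO friendships (user_id, friend_id) VALUES (?, ?)",
                 [(1, 2), (1, 3)])

alice = load_user(conn, 1)
print(alice.name, [f.name for f in alice.friends])  # alice ['bob', 'carol']
```

Even at this size the mismatch shows up: the object has a simple list of friends, while the database needs a separate join table to say the same thing.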
When you come to presentations, what do you look for in examples? Are you looking for something you can basically copy and paste and use in your code? Or are you more interested in high-level theory where examples take a back seat and are there just for flavor but not really to explain?
All the Reference You Need…
by maggie on Mar.10, 2009, under entry
People borrow my database books and a bunch of them are in circulation at any given time. Also, a lot of database resources are online only (e.g. for Oracle and MySQL). Right now, this is the state of my database bookshelf:

Not bad! You can probably get through a lot with just these three…
Optimization Woes
by maggie on Mar.10, 2009, under entry
I think by now everyone’s familiar with the phrase:
“Premature optimization is the root of all evil.”
In my day-to-day job, I worry a lot about writing high-performance applications, especially in the way applications manage and retrieve data. There is a huge difference between designing your application to be optimized for a specific purpose and premature optimization. It’s one thing to worry whether echo or print is faster (who cares, really?) – but it’s another to design an application that tolerates some slowness in its low-traffic parts in exchange for an ultra-fast response on the first access to the application. For example, viewing your photo album may take a while but seeing the homepage is pretty freakin’ fast!
In a recent post, Sebastian Bergmann points out that you should not micro-optimize (in reference to a PHP micro-optimization tricks post by Alex Netkachov). I agree wholeheartedly. It’s a nice list and definitely something to keep in mind, but it should be just a small component of your developer bag of tricks! Before you can optimize, you have to design well first. Proper design leads to potential for great optimization. And to design well, you have to really understand the problem you’re trying to solve. Sebastian links to slides by Ilia Alshanetsky from PHP Quebec 2009 on Common Optimization Mistakes. Ilia also points out that it’s important to understand before you start fixing:
Solve the business case before optimizing the solution.
For PHP Advent, I wrote an article, Optimize This!, which talks about how to optimize an existing application. The most important part to focus on is understanding what the application is doing and then figuring out where you’ll have the most impact. Sure, you can use echo over print (or print over echo, once again, who cares?) to get that 0.0001% performance boost, but if you’re doing 140 live SQL queries on your homepage, your application will still have a 3-minute response time, and with more users, the performance will continue to degrade until the application becomes unusable.
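As a rough illustration of how a homepage ends up with that many queries, here is a small sketch in Python with an in-memory SQLite database. The `posts` and `comments` tables, the `run` helper, and the query counter are all made up for this example. It builds the same homepage data twice: once with one query per post, and once with a single joined query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL, body TEXT);
""")
conn.executemany("INSERT INTO posts (id, title) VALUES (?, ?)",
                 [(i, f"Post {i}") for i in range(1, 11)])
conn.executemany("INSERT INTO comments (post_id, body) VALUES (?, ?)",
                 [(i, "hi") for i in range(1, 11) for _ in range(i % 3)])

query_count = 0


def run(sql, params=()):
    """Execute a query and count it, so we can see how chatty a page is."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def homepage_naive():
    """One query for the posts, then one more query per post for its comment count."""
    posts = run("SELECT id, title FROM posts ORDER BY id")
    return [
        (title, run("SELECT COUNT(*) FROM comments WHERE post_id = ?", (pid,))[0][0])
        for pid, title in posts
    ]


def homepage_joined():
    """The same data in a single query using a join and GROUP BY."""
    return run(
        "SELECT p.title, COUNT(c.id) FROM posts p "
        "LEFT JOIN comments c ON c.post_id = p.id "
        "GROUP BY p.id ORDER BY p.id"
    )


query_count = 0
naive = homepage_naive()
naive_queries = query_count

query_count = 0
joined = homepage_joined()
joined_queries = query_count

print(naive == joined)                # True
print(naive_queries, joined_queries)  # 11 1
```

With ten posts the difference is 11 queries versus 1. Put a few more widgets like this on the page and 140 queries is not hard to reach, and no amount of echo-versus-print tuning will win that time back.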
Modularizing code (aka “Object-Oriented Programming”) makes it easy to focus on problem areas in an existing application. Designing well doesn’t always mean designing for high performance. Instead, it means designing for flexibility and atomicity – you want to be able to easily change things without huge impact on the overall system. So before you go and optimize, take a step back and think about what your application is doing and what it’s supposed to do. Go write on the white board – diagrams prove very useful in figuring out flows and functionality! Go chat in the #phpc channel on irc.freenode.net – the PHP community will help (with a little bit of playful making fun of, but we’ll help). Just don’t go changing all your methods to static methods unless you’re really sure it’s the right thing to do!

