Maggie Nelson

databases and code goodness

  • Author: maggie
  • Published: Jul 14th, 2009
  • Category: entry
  • Comments: 5

More distributed key/value storage options


CouchDB has infected me and I’ve been reading a lot about alternative ways to store data AND organize it. Among the many alternatives to relational databases, these two stand out:

Cassandra – “Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google’s BigTable. Like Dynamo, Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.”

The huge appeal of Cassandra seems to be its emphasis on fault tolerance. Writes never fail. Data is always available. No single point of failure. If you’re making a Twitter-like app, you should consider it.

Tokyo Cabinet – “Tokyo Cabinet uses hash algorithm to retrieve records. If a bucket array has sufficient number of elements, the time complexity of retrieval is “O(1)”. That is, time required for retrieving a record is constant, regardless of the scale of a database. It is also the same about storing and deleting. Collision of hash values is managed by separate chaining. Data structure of the chains is binary search tree. Even if a bucket array has unusually scarce elements, the time complexity of retrieval is “O(log n)”.”

Tokyo Cabinet is slightly newer and is apparently stupidly fast, faster than any other storage solution out there (at least for now). It’s written in C and provides APIs for C, Perl, Ruby, Java, and Lua.
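To make the quoted description a bit more concrete, here is a minimal Python sketch of the general technique it names: a hash table whose collisions are handled by separate chaining, with each chain stored as a binary search tree. This is not Tokyo Cabinet's actual code or API. The names `ChainedHashStore`, `put` and `get` are made up for illustration, and the tree here is unbalanced, so lookups inside a crowded bucket are O(log n) only on average.

```python
class _Node:
    """One record inside a bucket's binary search tree."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.left = None
        self.right = None


class ChainedHashStore:
    """Illustrative hash table: separate chaining, chains are BSTs.

    Hypothetical class, not Tokyo Cabinet's API. Keys must be hashable
    and mutually comparable (for example, all strings).
    """

    def __init__(self, bucket_count=8):
        # Each slot holds the root of a BST, or None if the bucket is empty.
        self.buckets = [None] * bucket_count

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        i = self._index(key)
        node = self.buckets[i]
        if node is None:
            self.buckets[i] = _Node(key, value)
            return
        # Walk down the tree until we find the key or an empty spot.
        while True:
            if key == node.key:
                node.value = value  # overwrite existing record
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, _Node(key, value))
                return
            node = child

    def get(self, key, default=None):
        node = self.buckets[self._index(key)]
        while node is not None:
            if key == node.key:
                return node.value
            node = node.left if key < node.key else node.right
        return default


# Only two buckets, so several keys are forced to share a chain.
store = ChainedHashStore(bucket_count=2)
for word in ["couchdb", "cassandra", "tokyo", "dynamo", "bigtable"]:
    store.put(word, len(word))
print(store.get("tokyo"))  # prints 5
```

With plenty of buckets, most chains hold a single record and a lookup is one hash plus one comparison. The tree only matters when the bucket array is too small for the data, which is the “unusually scarce elements” case in the quote.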

Do you know anyone who has used these two already? Care to share your experiences?

  • Author: maggie
  • Published: Jul 13th, 2009
  • Category: entry
  • Comments: 1

In the Cloud, Fanatically.


As you may remember from my excited Twitter posts, I spoke on a panel organized by Rackspace at the New York Stock Exchange on June 17th. The audience of the panel was mostly composed of representatives from various NYC agencies that, basically, build cool websites for clients (Yours Truly’s employer included).

The panel topic was loosely defined and the conversation tended to gravitate toward Twitter and Twitter-like applications. Why Twitter? It gained huge popularity, resulting in performance problems, often leading to the surfacing of the beloved Fail Whale. It also attracts some of the biggest buzzwords of Web 2.0: social networks, information architecture, folksonomy (hashtag, anyone?). In the end, Twitter has piles and piles of data – a lot of it is noise, but there is a method to the madness. The panel talked about what applications like it could mean for the Internet and for future people-oriented business.

Among others, I shared the panel with Jonathan Bryce, the co-founder of Mosso.com (now rebranded as The Rackspace Cloud), and Robert Scoble, a Twitterer and FriendFeeder Extraordinaire!

The Rackspace team took a video of the panel, so check it out! (Warning: pretty big .mov file.) I already <3'ed Rackspace, as many of you know, but just in case, big props to Adrianna Bustamante and her team for the efficient nerd-wrangling! (Although I still got to mention robot overlords in the panel!)

[Image: nyse_framegrab]

Oh, and as for the question: “How do we make money from Twitter?” The answer is “Nobody knows, and even if they did, they sure aren’t going to tell you for free!”

  • Author: maggie
  • Published: May 21st, 2009
  • Category: entry
  • Comments: 20

ORM in the PHP World


Yesterday I gave a talk at the php|tek 2009 conference about ORM in the PHP World. In the first part of the presentation, I focus on what an ORM is, what would make a great ORM, design patterns for ORM, and how ORM systems tie into the PHP world in terms of philosophy, uses and approaches. The second part of the presentation covers a list of ORMs that I have seen and their pros and cons.

The ORMs I mention:

I plan on talking about each of these ORMs in detail in separate blog posts, so stay tuned!

  • Author: maggie
  • Published: Apr 23rd, 2009
  • Category: entry
  • Comments: 1

Weekend Reading: Fun with Data and Statistics


I know, I know, it’s only Thursday, but a girl can dream, right?

At work, I design a lot of database systems that manage a lot of data. Most of these systems are put in front of real human beings who are expected to find meaningful data in a big big pile of it. The two main approaches are to use either a harsh, editorial-driven, curated system such as a category hierarchy (Rock falls under Music falls under Entertainment) or have a completely free-flowing, user-generated system such as tagging or description search. But in either case, there’s always something missing – you pick tagging, you wish people didn’t tag things with “boobies” so much. You pick a strict category structure and it just feels too restrictive. So what can you do? The March/April 2009 issue of IEEE Intelligent Systems Magazine has an article called The Unreasonable Effectiveness of Data. To quote:

We should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data.

The article broke my brain a little bit, but go read it, it’s interesting nevertheless.

While we’re talking about representations of data, go read about the Semantic Web – how can we tell computers and teh internets what we humans want?

If you want a little bit lighter reading, go read Bill Bryson’s books about language, specifically Mother Tongue and Made In America. Reading anything by Bill Bryson will make you a better person (or your money back).

Once you have your data, someone will inevitably ask you to tell them what’s “popular”. I’m putting it in quotes because it means so many things to so many people. Before you answer, learn a little bit about statistics. I recommend Statistics in a Nutshell from O’Reilly. Hint: “most popular” does not always mean “has most views”.
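As a small illustration of that hint, here is a Python sketch that ranks the same items two ways: by raw view count and by views per day since publication. The data is entirely made up, and views per day is just one possible definition of “popular”, assumed here for the sake of the example; the function names are hypothetical too.

```python
from datetime import date


def rank_by_views(items):
    """Rank items by total view count, highest first."""
    return sorted(items, key=lambda item: item["views"], reverse=True)


def rank_by_views_per_day(items, today):
    """Rank items by average views per day since publication."""

    def rate(item):
        # Treat anything published today as one day old to avoid dividing by zero.
        age_days = max((today - item["published"]).days, 1)
        return item["views"] / age_days

    return sorted(items, key=rate, reverse=True)


# Hypothetical data: an old post with many total views, a new one with few.
posts = [
    {"title": "old post", "published": date(2008, 1, 1), "views": 5000},
    {"title": "new post", "published": date(2009, 4, 20), "views": 900},
]
today = date(2009, 4, 23)

print(rank_by_views(posts)[0]["title"])                 # prints old post
print(rank_by_views_per_day(posts, today)[0]["title"])  # prints new post
```

The old post wins on total views simply because it has been around longer. Once you divide by age, the new post is getting far more attention per day. Neither answer is wrong; they answer different questions.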

For some real-life scenarios of statistics, misuse of statistics, problems with polling plus a nice dose of politics, read Nate Silver’s FiveThirtyEight.com blog. He’s also a partner and analyst for Baseball Prospectus – you might find baseball boring, but boy, does it lend itself toward awesome stats gathering and mangling. Reading the two might not be immediately applicable to software developers, but it’ll put your mind in the right context when trying to get meaning out of your giant pile of data.

I will expect your book reports by Monday.

  • Author: maggie
  • Published: Apr 20th, 2009
  • Category: entry
  • Comments: 3

Oracle Buys Sun


Oracle buys Sun. Sun owns MySQL. Oracle’s press release about it here.

Does this mean an end to MySQL? Will all companies using MySQL now get really harsh sales pitches to use Oracle? I love Oracle and I love MySQL – the beauty of both is competition. “MySQL is better than Oracle because…” and “Oracle is better than MySQL because…” are what drive both to get better.

Good for Oracle and Sun, but I’m curious to see what happens.

On the upside, Chris and Lig will now be coworkers.

© 2010 Maggie Nelson. All Rights Reserved.
