Web 2.0 and the relational database

Yes, this is yet another rant about how people incorrectly dismiss state-of-the-art databases. (Famous people have done it, why shouldn’t I?) It’s amazing how much the Web 2.0 crowd abhors relational databases. Some people have declared real SQL-based databases dead, while others have proclaimed them to be not cool any more. Amazon’s SimpleDB, Google’s BigTable and Apache’s CouchDB are trendy, bloggable ideas that, to be honest, are ideal only for very specific, specialized scenarios. Most other use cases, and that covers 95 out of 100 web startups, can do just fine with a memcached + Postgres setup. Yet there seems to be a constant attitude of “nooooo if we don’t write our code like Google they will never buy us…!” that just doesn’t go away, spreading like a malignant cancer throughout the web development community. The constant arguments are “scaling to thousands of machines” and “machines are cheap”. What about the argument “I just spent an entire day implementing the equivalent of a join and group by using my glorified key-value-pair library”? And what about the mantra “smaller code that does more”?
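To make the join-and-group-by complaint concrete, here is a minimal sketch of the same report written twice: once by hand over plain dictionaries standing in for a key-value store, and once as a single declarative query. Everything in it is an assumption for illustration: the `users` and `orders` data, the function names, and the use of Python's built-in SQLite module in place of Postgres so that the example runs without a server.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy data, shaped the way a key-value store might hold it.
users = {1: {"name": "ann"}, 2: {"name": "bob"}}
orders = {
    101: {"user_id": 1, "total": 20},
    102: {"user_id": 1, "total": 5},
    103: {"user_id": 2, "total": 7},
}


def totals_by_hand(users, orders):
    """Join orders to users and sum totals per user name in application code."""
    sums = defaultdict(int)
    for order in orders.values():
        user = users.get(order["user_id"])
        if user is None:
            # Inner-join semantics: an order without a matching user is dropped.
            continue
        sums[user["name"]] += order["total"]
    return dict(sums)


def totals_by_sql(users, orders):
    """Load the same data into an in-memory SQLite database and let SQL do it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total INTEGER)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(uid, u["name"]) for uid, u in users.items()],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(oid, o["user_id"], o["total"]) for oid, o in orders.items()],
    )
    rows = conn.execute(
        "SELECT u.name, SUM(o.total) "
        "FROM users u JOIN orders o ON o.user_id = u.id "
        "GROUP BY u.name"
    ).fetchall()
    conn.close()
    return dict(rows)
```

Both functions return the same per-user totals for this data. The hand-written version is short only because the data is tiny and the question is fixed; every new filter, ordering or second join means more loops, whereas the SQL version changes by a clause.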

Jon Holland (who shares his name with the father of genetic algorithms) performs a simple analysis which points out a probable cause: People are just too stupid to properly use declarative query languages, and hence would rather roll their own reinvention of the data management wheel, congratulating themselves on having solved the “scaling” problem because their code is ten times simpler. It’s also a hundred times less useful, but that fact is quickly shoved under the rug.

It’s not that all Web-related / Open Source code is terrible. If you look at Drupal code, you’ll notice the amount of sane coding that goes on inside the system: JOINs used where needed, caching / throttling assumed as part of core, and a schema that allows for the flexibility to do fun stuff. (Not to say I don’t have a bone to pick with Drupal core devs; the whole “views” and “workflow” ideas are soon going to snowball into a reinvention of Postgres’s ADTs, all written in PHP and running on top of a Postgres setup hidden behind a database abstraction layer.)

If Drupal can do this, why can’t everyone else? Dear Web 2.0, I have a humble request. Pick up the Cow book if you have access to a library, or attend a database course in your school. I don’t care if you use an RDBMS after that, but at least you’ll reinvent the whole thing in a proper way.

This is Huge

Amazon Mechanical Turk — “Artificial Artificial Intelligence”:

Amazon Mechanical Turk provides a web services API for computers to integrate “artificial, artificial intelligence” directly into their processing by making requests of humans. Developers use the Amazon Mechanical Turk web services API to submit tasks to the Amazon Mechanical Turk web site, approve completed tasks, and incorporate the answers into their software applications. To the application, the transaction looks very much like any remote procedure call: the application sends the request, and the service returns the results. In reality, a network of humans fuels this artificial, artificial intelligence by coming to the web site, searching for and completing tasks, and receiving payment for their work.

It’s insanely ambitious, but I applaud the Amazon guys for coming up with something like this.

It's really about the roses you forgot to stop and smell

I’ve been following all this Web 2.0 business for a while, and it scares me. It’s hard to explain in a few lines (I really want to, but I also have to evaluate the Charniak parser and the Brill tagger, write appositions…). It’s not because we’re heading towards a second dot-com bubble (we might be, but I don’t really care about it). It’s because there’s WAY too much technology out and about. There aren’t enough people (or people’s resources) to consume it, and not enough data for it to be useful. The searchable internet may have billions of pages, but how much of it is really useful? Your intelligent social network may connect you to so many people, but do you really want to talk to all these people? You may relish the ability to suck in So Much Information with the press of a single button, but how much of the data you consume is useful? I’m not just complaining about the fact that the information age has retarded our lives instead of making them better; I’m worrying about the fact that we’re heading towards an overprocessed, overnetworked, overmanaged world where we’re doing very little useful work.

My other worry is about the frontrunners of the new Web: Microsoft, Google, Yahoo, Amazon, News Corp. Each one is building their own Map framework, their own index of the world and its libraries, their own social network systems, their own information monarchy. It’s all about “convergent, ubiquitous, live-your-life-on-my-website technology”. Imagine the redundancy in effort and in intellectual advancement; the waste of precious human capability, because each of these players (and countless other startups, opensource mashups, and random developers) is eyeing the same piece of meat: the whole of your life. Not just a part of it: the only way to really turn a profit is to make the product really useful for you, and the only way to make it really useful is to take over the whole of your life. What you read, hear, see. Who you communicate with and how. Which parties you go to, what you eat, where you go to shop. And how you travel to get to party, shop and eat. It’s ironic, but from how I look at it, the global optimum (both my convenience and their profit) is for everyone to surrender entirely to one of these Big Brothers. I’m not advocating an eventual 1984 here, just pointing out that the only way we will ever really get to the Web we dream of is by letting exactly one of these players win; and that situation is identical to the BigBrotherness we all have nightmares about.

(I know a lot of the text above will look like mindless drivel unless you’ve been carefully following the way the WWW has been changing over the last year. I wish I had the time to hyperlink, exemplify and write this out clearly; but I’m afraid I’m going to have to sacrifice quality of writing due to lack of time.)