php

Visualizations for Navigation : Experiments on my blog

This is a meta post describing two features on this blog that I don’t think I’ve documented before. Apologies for the navel-gazing, I hope there’s enough useful information here to make it worth reading

Most folks read my blog through the RSS feed, but those who peruse the web version get to see many different forms of navigational aids to help the user around the website. Since the blog runs on Drupal , I get to deploy all sorts of fun stuff. One example is the Similar Entries module, that uses MySQL’s FULLTEXT similarity to show possibly related posts1. This allows you to jump around on the website reading posts similar to each other, which is especially useful for readers who come in from a search engine result page. For example, they may come in looking for Magic Bus for the iPhone , but given that they’re probable iPhone users, they may be interested in the amusing DIY iPhone Speakers post.

The Timeline Footer

However, given that this blog has amassed about a thousand posts over seven years now, it becomes hard to expose an “overview” of that much information to the reader in a concise manner. Serendipitous browsing can only go so far. Since this is a personal blog, it is interesting to appreciate the chronological aspect of posts. Many blogs have a “calendar archive” to do this, but somehow I find them unappealing; they occupy too much screen space for the amount of information they deliver. My answer to this is a chronological histogram, which shows the frequency of posts over time:

Each bar represents the number of blog posts I posted that month, starting from August 2002 until now2. Moving your mouse over each bar tells you which month it is. This visualization presents many interesting bits of information. On a personal note, it clearly represents many stages of my life. June of 2005 was a great month for my blog — it had the highest number of posts, possibly related to the fact that I had just moved to Bangalore, a city with and active Blogging community. There are noticeable dips that reflect extended periods of travel and bigger projects.

In the background, this is all done by a simple SELECT COUNT(*) FROM nodes GROUP BY month type query. Some smoothing is applied to the counts due to the high variance, for my usage, Height = Log base 4 (frequency) gave me pretty good results. This goes into a PHP block, which is then displayed at the footer of every blog page. The Drupal PHP snippets section is a great place to start to do things like this. Note that the chart is pure HTML / CSS; there is no Javascript involved3.

The Dot Header

Many of my posts are manually categorized using Drupal’s excellent taxonomy system. A traditional solution to this is to create sections, so that the user can easily browse through all my Poems or my nerdy posts. The problem is that this blog contains notes and links to things that I think are “interesting”, a classification that has constantly evolved as my interests have changed over the past decade. Not only is it hard for me to box myself into a fixed set of categories, maintaining the evolution of these categories across 7+ years is not something I want to deal with every day.

This is where tags and automatic term extraction come in. As you can see in the top footer of the blog mainpage , each dot is a topic, automatically extracted from all posts on the website. I list the top 60 topics in alphabetical order, where each topic is also a valid taxonomy term. The aesthetics are inspired by the RaphaelJS dots demo, but just like the previous visualization, it is done using pure CSS + HTML. The size and color of the dot is based on the number of items that contain that term. Hovering over each dot gives you the label and count for that dot, clicking them takes you to an index of posts with that term. This gives me a concise and maintainable way to tell the user what kinds of things I write about. It also addresses a problem that a lot of my readers have — they either care only about the tech-related posts (click on the biggest purple dot!), or only about the non-tech posts (look for the “poetry” dot in the last row!).

This visualization works by first automatically extracting terms from each post. This is done using the OpenCalais module (I used to previously use Yahoo’s Term Extractor, but switched since it seems Yahoo!‘s extractor is scheduled to be decommissioned soon). The visualization is updated constantly using a cached GROUP BY block similar to the previous visualization, this time grouped on the taxnomy term. This lets me add new posts as often as I like, tags are automatically generated and are reflected in the visualization without me having to do anything.

So that’s it, two simple graphical ways to represent content. I know that the two visualizations aren’t the best thing since sliced bread and probably wont solve World Peace, but it’s an attempt to encourage discoverability of content on the site. Comments are welcome!


Footnotes:

1 I actually created that module (and the CAPTCHA module) over four years ago; they’ve been maintained and overhauled by other good folks since.

2 Arnab’s World is older than that (possibly 1997 — hence the childish name!), but that’s the oldest blog post I could recover.

3 I have nothing against Javascript, it’s just that CSS tends to be easier to manage and usually more responsive. Also, the HTML generated is probably not valid and is SUPER inefficient + ugly. Hopefully I will have time to clean this up sometime in the future.

y!Vmail - voice mail for your Yahoo! Mail

Yesterday Dan, Pradeep and I presented “y!Vmail: voicemail for your Yahoo! Mail” at the Yahoo! University Hack Day Contest, winning the award for the 2nd best Hack! (jump to the demo video )


Our team with judges Paul Tarjan and Rasmus Lerdorf

The adventure started when I heard about Yahoo!‘s Hack U event:

Join Yahoo! web experts including Rasmus Lerdorf, the creator of PHP, for a week of learning, hacking and fun! You’ll hear interesting tech talks, hacking tips and lessons, and get hands-on coding workshops where you’ll work with cutting-edge technology. The week’s events will culminate with our University Hack Day competition—a day-long festival of coding, camaraderie, demos, awards, food, music and jollity (it’s a real word, look it up).

Years ago when I was in my teens, I was an avid participant on the school / college tech fest circuit. Almost every major institution in and around Delhi would organize annual technical festivals, hosting programming contests and software demo competitions. This was where I got a chance to showcase my creations and meet other hackers. Winning these events became a good way for me to pay off those telephone bills — web development in the dial-up age was an expensive hobby!

I decided to enter the Hack Day contest just for fun; it had been a while since I participated in one of these. It wasn’t about winning this time; I just wanted to do the whole “idea to execution to demo” thing with a group of friends, and spend hours screaming at each other over STUPID hard-to-find bugs that are actually staring at you in the face, high-fiving every hour as a feature milestone was scratched off the todo-list. The reward: to be able to stand in front of a group of people and say “Hey guys, look what I made!.” (If it’s hard to appreciate what this feels like, this video might help.)


Yahoo! gave away a bunch of t-shirts, this was on one of them

3 days before the Hack Day, I had an idea about building a phone-based interface for email. The idea was simple enough to build in a day, but fun enough to make an enjoyable demo. The only problem: I was already in the midst of a “hack” daymonth of my own; VLDB was due 3 hours before the start of the Hack Day, and I was already sacrificing sleep for LaTeX and Python for more than a week. There was no way I was going to be able to do this alone. Enter fellow grad students Dan and Pradeep. I told them about the contest and my idea. While they are both expert hackers, I totally forgot about the fact that people in Operating Systems research don’t really do a lot of Web Programming: “PHP....? I’ve never…” said Dan. I pointed them to the Yahoo Developer Network site and returned to my research paper writing madness. Hopefully by Friday evening, I would have a web-savvy hack team.

On Friday, I took a quick nap after my paper deadline, and walked over to the Hack Fest area to meet my team (who had become PHP and telephony wizards by now) and load up on caffeine and sugar that the Yahoo! folks had set up for us.


They even had my favorite candy !

We split the work into two parts; Dan would build the phone interface while Pradeep and I would figure out the email and contacts API to write an email client backend. 7 hours later, we had the first version of our product up and running. We could call in and read emails. Happy with our progress, we decided that it would be wiser to go home and show up early next day. We ended up wasting a few hours the next morning worrying about the presentation: the lecture hall had spotty cellphone coverage, a deal-killer for a phone demo! Pradeep made a breakthrough here, discovering that an obscure panel on the wall was actually a secret speakerphone. Having resolved demo issues, we resumed coding and plugged in the remaining features: navigating through emails, email summarization, and email prioritization. The friendly timestamps feature (“4 minutes ago”) was stolen from my blog’s code (i.e. the Status header of this blog).

Around 3:30pm on Saturday, we updated our hackday entry:

y!Vmail

by Arnab Nandi, Daniel Peek, Pradeep Padala

“Not everyone has a computer, but everyone has a phone.”

This hack allows people to access their Yahoo! mail through a 1-800 number, using ANY touch-tone phone.
Press 0 to open, * and # to navigate, 7 to delete. We figure out which emails are important, and read them first. We summarize long emails so that you dont have to listen to all of it. If you want to talk to the person, just press 5 — we’ll connect you.

APIs used: BBAuth, OpenMail, Contacts API, Term Extraction API

Hack presentations started at 4:00pm on Saturday. I started with a 20-second powerpoint pitch, followed by a rather entertaining demo. Using the lecture hall’s speakerphone we had the lecture hall call our service. Entering the correct PIN logged me in, which resulted in an entire roomful of people were now hearing the words “Welcome to y!Vmail. You have 5 new emails…”


Me pushing numbers on the phone


Here’s a short video walk through of our app:

More details at http://yvmail.info

A few minutes after the presentation ended, the prizes were announced. We ranked second. The winning hack was Brandon Kwaselow’s “Points of WOE”; a native iPhone app that allowed browsing and creation of placemarks on Yahoo! Maps. Congratulations, Brandon!

Overall, this was a very exciting and enjoyable event; I had a rocking good time hanging out with the Yahoo! folks and getting a cool project out the door with around 15 hours of work. I end with some lessons, acquired over years of doing demo contests:

  • Be creative, but avoid feature creep.
  • Split up into sub-teams, but make sure you’re pair programming most of the time.
  • Get Version 0 done Super Super Early. Then polish, polish, polish.
  • Reuse (with attribution) as much code as you can.
  • Take lots of breaks, make friends, and have fun.

Image credits: Rasmus, Erik
Shout outs: Folks at Twilio for making the coolest telephony API in the universe!

Getting django-auth-openid to work with Google Accounts

update: This blog post is meant for older versions of django-authopenid. The latest version available at pypi has implemented a fix similar to this one, and hence works out of the box, you wont need this fix.
Thanks to Mike Huynh for pointing this out!

I've been playing with Django over the past few days, and it's been an interesting ride. For a person who really likes PHP's shared- nothing, file-based system model (I'm mostly a drupal guy), Django comes across as overengineered at first, but I'm beginning to see why it's done that way.

I was trying to get single-signon working, and settled on django-authopenid over the other django openid libraries, django-openid, django-openid-auth and django-oauth. It was easy to use and understand, and wasn't seven million lines of code.

My intention was to use the OpenID extension to get the user's email address during the sign on process. However, it doesn't seem to work with Google's OpenID implementation, because Google uses the an Attribute Exchange (ax) extension instead of the Simple Registration (sreg) OpenID extension that is implemented in the library. A quick hack to django-authopenid's views.py makes it work:


51c51
- from openid.extensions import sreg
---
+ from openid.extensions import ax
94c82
- sreg_request=None):
---
+ ext_request=None):
113,114c101,102
- if sreg_request:
- auth_request.addExtension(sreg_request)
---
+ if ext_request:
+ auth_request.addExtension(ext_request)
195,210c172,185
- sreg_req = sreg.SRegRequest(optional=['nickname', 'email'])
- redirect_to = "%s%s?%s" % (
- get_url_host(request),
- reverse('user_complete_signin'),
- urllib.urlencode({'next':next})
- )
-
- return ask_openid(request,
- form_signin.cleaned_data['openid_url'],
- redirect_to,
- on_failure=signin_failure,
- sreg_request=sreg_req)
---
+ ax_req = ax.FetchRequest()
+ ax_req.add(ax.AttrInfo('http://schema.openid.net/contact/email', alias='email',required=True))
+ redirect_to = "%s%s?%s" % (
+ get_url_host(request),
+ reverse('user_complete_signin'),
+ urllib.urlencode({'next':next})
+ )
+
+ return ask_openid(request,
+ form_signin.cleaned_data['openid_url'],
+ redirect_to,
+ on_failure=signin_failure,
+ ext_request=ax_req)

Obviously this is a very cursory edit. I'm too lazy to improve and submit this as a patch, so readers are encouraged to submit it to all relevant projects!

|

multisite drupal: the importance of the sequence

Recent versions of Drupal have the oh-so-cool feature that allows you to host many websites off a single Drupal codebase. The coolest part about this is that you can share some tables accross multiple websites; which means you can do things like have a single username/password table accross all the websites. This can easily be done, as specified in the settings.php comments as:

* $db_prefix = array( * 'default' => 'main_', * 'users' => 'shared_', * 'sessions' => 'shared_', * 'role' => 'shared_', * 'authmap' => 'shared_', * 'sequences' => 'shared_', * );

Now here’s an important thing to note: The first table you have to share is the sequences table. This is the table that handles all the id counters, so if you don’t share this one, something like this can happen:

[you shared only the users table]

1. User 1 signs up on Site A, gets user id#1
2. User 2 signs up on Site A, gets user id#2
3. User 3 signs up on Site B, gets user id#....? The correct answer is not 3!

This happens because you didn’t share sequencesSite B uses it’s own sequence generator to render a duplicate userid… which the user table would not accept, and this would go on till the Site B sequence catches up with the Site A sequence, and then things would be normal. The code quality in user.module helps protect the user table from data corruption, but you will have many signups disappear into thin air with a set up like this. Hence, all you need to do is share the sequences table along with the users… and you’re all set!

Btw, hello Planet people!

|

Drupal downtime

It’s a freak coincidence, Drupal, Drupaldocs and Drupaldevs are all down right now. Note that Drupal.org is down for scheduled maintenance, and has not been hacked or taken down or anything*. They’re changing the powerlines at the server room or something, and it’s just taking a little longer than expected.

Drupal versions <= 4.6.1 (and a bunch of other PHP apps) have a security problem which makes them vulnerable to code injection, which means bad people can do bad people to your website. To solve this, all you need to do is go to drupal.org and download the latest patched version. Since the site is down at the moment, here’s a temporary fix:

1. delete xmlrpc.inc in the includes directory.
2. upload a blank file in its place.

This should keep you safe from attacks, but will disable the weblogs.com, etc. “ping” notification, and the blogapi. You can later update the files when drupal.org is back up.

|

Writing a simple guestbook script using DBX

Personal Homepages often contain a small section called a guestbook - a place where people can come and write stuff about how they liked the website, or just to let the person know that they've visited the page. And mostly, these pages are the ones with very less traffic.

| |

Tag Soup

Here's a List of all the tags(categories, labels, whatever you call them) used at arnab.org:

captchas and racism

From #drupal:
arnab: and the fact that captchas are, well, stupid
chx: yes, visual captchas are stupid
chx: I think the textual ones are better
chx: if you REALLY want some captcha then something textual
UnConeD: Welcome to my site! To register, please answer the following captcha!
UnConeD: What is the 312455th digit of Pi, in base 42?
chx: UnConeD: LOL
arnab: heh
arnab: exactly my point.
chx: rather “what is the eleventh letter in this sentence?”
UnConeD: well
UnConeD: that sort of stuff is easily cracked with regexps
UnConeD: some guy once made a math expression captcha in text form
UnConeD: in a patch to the module
chx: yes yes
UnConeD: i followed up the issue with a PHP script to break his code ;)
chx: I liked that one
chx: Well I think I can rather easily make a textual captcha you won’t be able to script
arnab: chx: make one, I’ll crack it :D
UnConeD: you are no match for my dangerous RegExping skills
***UnConeD casts Capturing Parentheses (opponent’s movement reduced by 50%)
chx: UnConeD: beware, I’ll grep the CIA World Facts book and ask questions based on that and you can eat your regexps.
UnConeD: err
UnConeD: but grep is itself regexp based :P
chx: I mean, I’ll compile a huge list of facts based on World Facts wikipedia whatever
arnab: chx: I have an indexed, parsed dump of Wikipedia on my HDD, will break your CIA thingy in 2 minutes with it
chx: and questions like “Is Ghana in Africa?”
UnConeD: i’ll hire an indian fellow with an encyclopedia
arnab: UnConeD: I AM an Indian fellow with an encyclopedia
arnab: rofl
UnConeD: ;)

hmm

MSNSearch tells the world that I'm the seventh most important singlegirl around:

7. arnab's world :: weblog
... Sex&SingleGirl. I am a neurotic sex goddess ...
arnab.org/blog

Others disagree. Talk about a twisted sense of perception.

| |

contest

PHP Programming Marathon:

The Marathon organizers will send a PHP problem via dotGeek messenger (powered by Jabber) which must be solved using both PHP and mySQL. The problem will be given at the below mentioned date and the registered participants will have to submit the solution within 24 hours via a submission form. The solutions will be analyzed by the Jury and, depending on the number of participants , the winners will be announced in maximum one week.

It's on November 22nd.

| |