Yahoo!

Reputation Misrepresentation, Trail Paranoia and other side effects of Liking the World

trafficspike

A few months ago, I wrote up some quick observations about Facebook’s then just-launched “Like” button, pitching “Newsfeed Spam” as a problem exacerbated by the new Like Buttons. The post went “viral”, so to speak, bouncing off Techmeme, ReadWriteWeb / NYTimes, even German news websites. Obviously this is nothing compared to “real” traffic on the Internet, but it was fun to watch the link spread. This is meant to be a follow-up to that post, based on thoughts I’ve had since.

In this post, I'll be writing about five "issues" with the Like button, followed by four "solutions" to these issues. Since this is a slightly long post, here's an outline:


Big Deal!


facebook stats

The Facebook Like Button has been huge success. With over 3 billion buttons served, and major players such as IMDB and CNN signing up to integrate the button (and other social plugins) into their websites, the chance of encountering a Facebook Like button while browsing on the web is quite high; if not certain. Many folks have questioned whether this is a big deal -- IFRAME and javascript based widgets have been around for a long time (shameless self-plug: Blogsnob used a javascript-based widget to cross polinate blogs across the internet as early as 8 years ago). Using the social concept of showing familiar faces to readers isn't new either; MyBlogLog has been doing it for a while. Then why is this silly little button such an issue? The answer is persistent user engagement. With 500 million users, out of which 50% of them log into Facebook at any given day, you're looking at an audience of 250 million users. If you're logged into Facebook while browsing any website with a social plugin, the logged in session is used. Now if you're like me, you'll probably have "remember me" checked at login, which means you're always logged into Facebook. What this means is that on any given day, Facebook has the opportunity to reach 250 million people throughout their web browsing experience; not just when they're on Facebook.com[1]. So clearly, from a company's perspective, this is important. It is a pretty big deal! But why is this something Facebook users need to be educated about? Onwards to the next section!

Issues with the Like Button


Readers should note the use of the word "Issues", as opposed to "Security vulnerability", "Privacy Leak", "Design Flaw", "Cruel Price of Technology", or "Horrible Transgression Against Humankind". Each issue has its own kind of impact on the user, you're welcome to decide which is which!

Screen shot 2010-07-21 at 1.37.51 AM

To better understand the issues with the Like button, let's understand what the Like button provides:
1) It provides a count of the number of people who currently "Like" something.
2) It provides a list of people you know who have liked said object, with profile pictures.
3) It provides the ability to click the button and instantaneously "Like" something, triggering an update on your newsfeed.
All of this is done using an embedded IFRAME -- a little Facebook page within the main page that displays the button.

In the next few paragraphs, we'll see some implications of this button on the web.

Reputation Misrepresentation


The concept of reputation misrepresentation is quite simple:
a not-so-popular website can use another website's reputation to make the site seem more reputed or established to the user.

Here's a quick diagram to explain it:

reputation misrepresentation

Simply put, as of now, any website(e.g. a web store) can claim they are popular (especially with your friends) to gain your trust. Since Facebook doesn't check referrer information, Facebook really doesn't have the power to do anything about this either. A possible solution is to include verifying information inside the like button, which ruins the simplicity of it all.

Browse Trail Inference


This one is a more paranoid concept, but I've noticed that people don't realize it until I spell it out for them:
Facebook is indirectly collecting your entire browsing history for all websites that have Facebook widgets. You don’t have to click any like buttons, just visiting sites like IMDB.com or CNN.com or BritneySpears.com will enable this.

Here's how it works:

browsetrail

Here, our favorite user Jane is logged into Facebook, and visits 2 pages on IMDB.com, checks the news on CNN, and then heads to Yelp to figure out where to eat. Interestingly enough, Facebook records all this information, and can tie it to her Facebook profile, and can thus come up with inferences like "Jane likes Romantic Movies, International News and Thai Food -- let's show her some ads for romantic getaways to Bali!"

(Even worse, if Jane unwittingly visits a nefarious website which coincidentally happens to have the Like button, Facebook gets to know about that too!)

Most modern browsers send the parent document's URL as HTTP_REFERER information to Facebook via the Like IFRAME, which allows Facebook to implicitly record a fraction of your browsing history. Since this information is much more voluminous than your explicit "Likes"; a lot more information can be data-mined from it; which can then be used for "Good"(i.e. adding value to Facebook) or "Evil"(i.e. Ads! Market data!)

What I like about this is that this is an ingenious system to track user's browsing behavior. Currently, companies like Google, Yahoo and Microsoft(Bing/Live/MSN) have to convince you to install a browser toolbar which has this minuscule clause in its agreement that you share back ALL your browsing history, which can be used to better understand the Web(and make more money, etc. etc.). Since Facebook is getting all websites to install this; it gets the job done without getting you to install a toolbar! I'll be discussing how I deal with this in the last section, "My solution".

Newsfeed Spam


In a previous post, I demonstrated how users could be tricked into "Liking" things they didn't intend to, leading to spam in their friends' newsfeeds. A month later, security firm Sophos reported an example of this, where users were virally tricked into spreading a trojan virus through Facebook Likes, something that could easily be initiated by Like buttons across the web, where you can easily be tricked into liking arbitrary things.

Again, this issue has the same root cause as Reputation Misrepresentation: since all the Like button shows you is a usercount, pictures and the button itself, there really is no way to know what you're liking. A solution to this is to use a bookmarklet in your browser, which is under your control.

"Likejacking"


This interesting demo by Eric Kerr demonstrates how to force unwitting users into clicking arbitrary like buttons. The way this works is by making a transparent like button, and make it move along with the users mouse cursor. Since the user is bound to click on the page at some point of time, they're bound to click the Like button instead.

Like Switching


likeswitch

Like switching is an alternative take on Like Jacking -- the difference is that the user is explicitly shown a like button with a prestigious like count and familiar friends first. When a user reaches out to click on it, the like button is swapped out for a different one, triggered by an onmouseover event from the rectangle around the button.

"Solutions"

Given these issues, let's discuss some solutions, responses and fixes. Note the use of quotes -- for many people can argue that nothing is broken, so we don't need solutions! Regardless, one piece of good news is that the W3C is aware of the extensive use of IFRAMES on the web, and has introduced a new "sandbox" attribute for IFRAMES. This will lead to more fine-grained control of social widgets. For example, if we can then set our browsers to force "sandbox" settings for all Facebook IFRAMES, we can avoid handing over our browsing history to Facebook.


Facebook's approach


While I don't expect companies to rationalize every design decision with their users, I am glad that some Facebook engineers are reaching out via online discussions. Clearly this is not representative of the whole company, but here's a snippet:
Also, in case it wasn't clear, as soon as we identify a domain or url to be bad, it's impossible to reach it via any click on facebook, so even if something becomes bad after people have liked it, we still retroactively protect users.

I like this approach because it fits in well with the rest of the security infrastructure that large companies have: the moment a URL is deemed insecure anywhere on the site, all future users are protected from that website. However, this approach doesn't solve problems with user trust -- it's relying on the fact that Facebook has flagged every evil website in the world before you chanced upon it -- something I wouldn't bet my peace of mind on. It's as if the police told you "We will pursue serial killers only after the first murder!"Would you sleep better knowing that? In essence, this approach is great when you're looking at it from the side of protecting 500 million users. But as one of the 500 million, it kinda leaves you out in the dark!


Secure Likes

As we mentioned in the Reputation Misrepresentation section, another interesting improvement would be to include some indication of the URL that is being "Liked" inside the button itself. An option is to display the URL as a tooltip when the user hovers his/her cursor over the button, especially if it disagrees with the parent frame's URL. Obviously placing the whole URL would make the button large and ugly. A possible compromise is to include the favicon(the icon that shows up for each site in your browser) right inside the Like button. The user can simply check if the browser icon is the same as the one on the like button to make sure it's safe. This way, if a website wants to (mis)use BritneySpears.com's Like Button, it will be forced to use BritneySpears.com's favicon too! Here's a mockup of what "Secure Like" would look like for IMDB:

securelike


A browser-based approach


Screen shot 2010-07-26 at 5.11.57 AM

This approach, best exemplified by "Social Web" browser Flock and recently acknowledged by folks at Mozilla, makes you log into the browser, not a web site. All user-sensitive actions(such as "Liking" a page) have to go through the browser, making it inherently more secure.

My Current Solution


dock

At this point, I guess it's best to conclude with what my solution to dealing with all these issues is. My solution is simple: I run Google and Facebook services in their own browsers, separate from my general web surfing. As you can see from the picture of my dock, my GMail and Facebook are separate from my Chrome browser. That way, I appear logged out[2]. Google Search and Facebook Likes when I surf the web or search for things. On a Mac, you can do this using Fluid.app; on Windows you can do this using Mozilla Prism.

And that brings us to the end of this rather long and winded discussion about such a simple "Like" button! Comments are welcome. Until the next post -- Surf safe, and Surf Smart!

 

 

Footnotes:
[1] To my knowledge, there is only one other company that has this level of persistent engagement: Google's GMail remembers logins more aggressively than Facebook. When you're logged into Gmail, you're also logged into Google Search, which means they log your search history as a recognized user. This is usually a good thing for the user, since Google then has a chance to personalize your search. Google actually takes it a step further and personalizes even for non-logged in users.

[2] Yes, they can still get me by my IP, but that's unlikely when I'm usually behind firewalls.

 

Cite this post!:


@article{reputationmisrepresentation,
title={{Reputation Misrepresentation, Trail Paranoia and other side effects of Liking the World}},
author={Nandi, A.},
year={2010},
journal={{Arnab's World}}
}

Visualizations for Navigation : Experiments on my blog

This is a meta post describing two features on this blog that I don’t think I’ve documented before. Apologies for the navel-gazing, I hope there’s enough useful information here to make it worth reading

Most folks read my blog through the RSS feed, but those who peruse the web version get to see many different forms of navigational aids to help the user around the website. Since the blog runs on Drupal , I get to deploy all sorts of fun stuff. One example is the Similar Entries module, that uses MySQL’s FULLTEXT similarity to show possibly related posts1. This allows you to jump around on the website reading posts similar to each other, which is especially useful for readers who come in from a search engine result page. For example, they may come in looking for Magic Bus for the iPhone , but given that they’re probable iPhone users, they may be interested in the amusing DIY iPhone Speakers post.

The Timeline Footer

However, given that this blog has amassed about a thousand posts over seven years now, it becomes hard to expose an “overview” of that much information to the reader in a concise manner. Serendipitous browsing can only go so far. Since this is a personal blog, it is interesting to appreciate the chronological aspect of posts. Many blogs have a “calendar archive” to do this, but somehow I find them unappealing; they occupy too much screen space for the amount of information they deliver. My answer to this is a chronological histogram, which shows the frequency of posts over time:

Each bar represents the number of blog posts I posted that month, starting from August 2002 until now2. Moving your mouse over each bar tells you which month it is. This visualization presents many interesting bits of information. On a personal note, it clearly represents many stages of my life. June of 2005 was a great month for my blog — it had the highest number of posts, possibly related to the fact that I had just moved to Bangalore, a city with and active Blogging community. There are noticeable dips that reflect extended periods of travel and bigger projects.

In the background, this is all done by a simple SELECT COUNT(*) FROM nodes GROUP BY month type query. Some smoothing is applied to the counts due to the high variance, for my usage, Height = Log base 4 (frequency) gave me pretty good results. This goes into a PHP block, which is then displayed at the footer of every blog page. The Drupal PHP snippets section is a great place to start to do things like this. Note that the chart is pure HTML / CSS; there is no Javascript involved3.

The Dot Header

Many of my posts are manually categorized using Drupal’s excellent taxonomy system. A traditional solution to this is to create sections, so that the user can easily browse through all my Poems or my nerdy posts. The problem is that this blog contains notes and links to things that I think are “interesting”, a classification that has constantly evolved as my interests have changed over the past decade. Not only is it hard for me to box myself into a fixed set of categories, maintaining the evolution of these categories across 7+ years is not something I want to deal with every day.

This is where tags and automatic term extraction come in. As you can see in the top footer of the blog mainpage , each dot is a topic, automatically extracted from all posts on the website. I list the top 60 topics in alphabetical order, where each topic is also a valid taxonomy term. The aesthetics are inspired by the RaphaelJS dots demo, but just like the previous visualization, it is done using pure CSS + HTML. The size and color of the dot is based on the number of items that contain that term. Hovering over each dot gives you the label and count for that dot, clicking them takes you to an index of posts with that term. This gives me a concise and maintainable way to tell the user what kinds of things I write about. It also addresses a problem that a lot of my readers have — they either care only about the tech-related posts (click on the biggest purple dot!), or only about the non-tech posts (look for the “poetry” dot in the last row!).

This visualization works by first automatically extracting terms from each post. This is done using the OpenCalais module (I used to previously use Yahoo’s Term Extractor, but switched since it seems Yahoo!‘s extractor is scheduled to be decommissioned soon). The visualization is updated constantly using a cached GROUP BY block similar to the previous visualization, this time grouped on the taxnomy term. This lets me add new posts as often as I like, tags are automatically generated and are reflected in the visualization without me having to do anything.

So that’s it, two simple graphical ways to represent content. I know that the two visualizations aren’t the best thing since sliced bread and probably wont solve World Peace, but it’s an attempt to encourage discoverability of content on the site. Comments are welcome!


Footnotes:

1 I actually created that module (and the CAPTCHA module) over four years ago; they’ve been maintained and overhauled by other good folks since.

2 Arnab’s World is older than that (possibly 1997 — hence the childish name!), but that’s the oldest blog post I could recover.

3 I have nothing against Javascript, it’s just that CSS tends to be easier to manage and usually more responsive. Also, the HTML generated is probably not valid and is SUPER inefficient + ugly. Hopefully I will have time to clean this up sometime in the future.

A Quick Fix for Yahoo branding on Flickr

Techcrunch continued their usual Yahoo-bashing with this story today:

It appears that a few days ago there was a slight change to Flickr’s logo: an addition of a small Yahoo logo to the right side so it reads “Flickr from Yahoo.” In response, many Flickr users have taken to the photo-sharing site’s forums to express their horror at the Yahoo’s new branding of Flickr.

There is definitely some truth to the community backlash, but what I see as more aggravating is a great missed branding opportunity for Yahoo!.

Flickr and Delicious have both been adamant opponents to Yahoo! branding. Even though Yahoo! owns it, the Delicious frontpage doesn’t contain a single mention of Yahoo. Both sites’ communities are predominantly “indie” brand lovers; and don’t want “the man” to infringe their beloved service (even if the man is running it).

What’s crazy is that Yahoo recently launched a $100 million campaign called “Y!ou and Yahoo!”. What’s also interesting is that Flickr actually had a branding that said “Flickr loves you” (in place of Flickr BETA), which reflected Flickr’s personality and branding. People got used to it, and some even thought it was cute.

The last thing you want to do is force a new logo on to the community in an ungraceful manner. Here’s a convenient solution: to morph the “loves you” logo into the “Y!ou and Yahoo!” campaign and do a “flickr loves Y!ou” logo, killing two birds with one stone. The community sees a subtle evolution of the existing logo, and the “Y!ou” campaign is placed on a huge community”.

| |

At the Yahoo! Key Scientific Challenges Graduate Student Summit

I’m at the Yahoo! Graduate Student summit for today and tomorrow. About the event:

On September 3 and 4 the Academic Relations team will host 21 exceptional PhD students at the Key Scientific Challenges Graduate Student Summit. These students are winners of this year’s KSC program, and over the course of the two day summit they will be attending tech talks and workshops, presenting their work, and discussing research trends with top researchers from Yahoo! Labs. These 21 students will also be joined by the program’s past winners and Yahoo! Student Fellows.

Thought I’d share notes:

  • Great spread of grad students in terms of research areas. HCI, Economists, Social Scientists, apart from typical CS people.
  • Presenters for Thursday:

    Welcome & Overview of Yahoo! Labs
    Prabhakar Raghavan, Head, Yahoo! Labs

    Search Technologies Overview
    Andrew Tomkins, Chief Scientist, Yahoo! Search

    Machine Learning & Statistics Research Overview
    Sathiya Keerthi Selvaraj, Senior Research Scientist

    Economics and Social Systems Research Overview
    Elizabeth Churchill, Principal Research Scientist

    Computational Advertising Research Overview
    Andrei Broder, Fellow and VP, Computational Advertising

    Web Information Management Research Overview
    Brian Cooper, Senior Research Scientist

  • Posters for the poster sessions look pretty awesome!
  • |

    Vowpal Wabbit now Open Source Project

    I was writing a longer post about VW a few weeks ago but ran out of time, so I’ll just post the initial few paragraphs for now

    There’s probably a limit to how many times one is allowed to use the word “awesome” in a day — I feel like I’ve hit my quota, but I need to use it just once more before I hit the sack:

    I think it’s awesome that Yahoo! Research lets researchers open source their projects.

    I'm pretty sure John did not make this image

    A few days ago, the amazing John Langford released his fast online learning tool, Vowpal Wabbit to the world as an open source project. Note the word project. That means all further development will happen out in the wild; . A bunch of people have question the origin of the name “Vowpal Wabbit” — “What is this undecipherable mess of vowels and consonants!?,” you ask. “That’s how Elmer Fudd would pronounce Vorpal Rabbit,” John answers. “Vorpal? Whatdoesthatmean?!,” you ask again. Which is where I cite the singular font of human knowledge and quote a few lines from Lewis Carrol’s Jabberwocky:

    He took his vorpal sword in hand (, and later,)
    One, two! One, two! And through and through
    The vorpal blade went snicker-snack!
    He left it dead, and with its head
    He went galumphing back.

    If the back story hasn’t made it clear to you yet, let me paraphrase it for you: This stuff is fast. Wicked fast. Like, voodoo fast. How? That’s best left for another post.

    |

    Yahoo: Just like the old times

    I’m excited to go to work today, knowing that I will be witness, first hand, to one of the more incredible business deals being announced in the valley: Microsoft powering Yahoo Search.

    There’s a lot that I want to say about this, but for now, I will leave you with this image. This is from when Yahoo! used to be powered by Google. (Many people believe that powering Yahoo was what made Google popular with the mainstream audience, and the Google owes who it is today to Yahoo.)

    An excerpt from the Wikipedia:

    In 2002, they bought Inktomi, a “behind the scenes” or OEM search engine provider, whose results are shown on other companies’ websites and powered Yahoo! in its earlier days. In 2003, they purchased Overture Services, Inc., which owned the AlltheWeb and AltaVista search engines.

    AlltheWeb, Altavista, Overture, Inktomi. That’s a lot of heritage.

    |

    BaconSnake: Inlined Python UDFs for Pig

    I was at SIGMOD last week, and had a great time learning about new research, discussing various research problems, meeting up with old friends and making new ones. I don't recall exactly, but at one point I got into a discussion with someone about how I'm probably one of the few people who've actually had the privilege of using three of the major distributed scripting languages in production: Google's Sawzall, Microsoft's SCOPE and Yahoo's Pig. The obvious question then came up -- Which one do I like best? I thought for a bit, and my answer surprised me -- it was SCOPE, for the sole reason that it allowed inline UDFs, i.e. User Defined Functions defined in the same code file as the script.

    I'm not aware if Sawzall allows UDFs, and Pig allows you to link any .jar files and call them from the language. But the Microsoft SCOPE implementation is extremely usable: the SQL forms the framework of your MapReduce chains, while the Mapper, Reducer and Combiner definitions can be written out in C# right under the SQL -- no pre-compiling / including necessary.

    Here's how simple SCOPE is. Note the #CS / #ENDCS codeblock that contains the C#:

    R1 = SELECT A+C AS ac, B.Trim() AS B1 FROM R WHERE StringOccurs(C, “xyz”) > 2 
    
    #CS 
    public static int StringOccurs(string str, string ptrn) {
       int cnt=0; 
       int pos=-1; 
       while (pos+1 < str.Length) {
            pos = str.IndexOf(ptrn, pos+1) ;
            if (pos < 0) break; cnt++; 
       } return cnt;
    }
    #ENDCS
    

    Since I'm working at Yahoo! Research this summer, and I missed this feature so much, I thought -- why not scratch this itch and fix the problem for Pig? Also, while we're at it, maybe we can use a cleaner language than Java to write the UDFs?

    Enter BaconSnake (available here), which lets you write your Pig UDFs in Python! Here's an example:

    -- Script calculates average length of queries at each hour of the day
    
    raw = LOAD 'data/excite-small.log' USING PigStorage('\t')
               AS (user:chararray, time:chararray, query:chararray);
    
    houred = FOREACH raw GENERATE user, baconsnake.ExtractHour(time) as hour, query;
    
    hour_group = GROUP houred BY hour;
    
    hour_frequency = FOREACH hour_group 
                               GENERATE group as hour,
                                        baconsnake.AvgLength($1.query) as count;
    
    DUMP hour_frequency;
    
    -- The excite query log timestamp format is YYMMDDHHMMSS
    -- This function extracts the hour, HH
    def ExtractHour(timestamp):
    	return timestamp[6:8]
    
    -- Returns average length of query in a bag
    def AvgLength(grp):
    	sum = 0
    	for item in grp:
    		if len(item) > 0:
    			sum = sum + len(item[0])	
    	return str(sum / len(grp))
    

    Everything in this file in normal Pig, except the highlighted parts -- they're Python definitions and calls.

    It's pretty simple under the hood actually. BaconSnake creates a wrapper function using the Pig UDFs, that takes python source as input along with the parameter. Jython 2.5 is used to embed the Python runtime into Pig and call the functions.

    Using this is easy, you basically convert the nice-looking "baconsnake" file above ( the .bs file :P ) and run it like so:

    cat scripts/histogram.bs | python scripts/bs2pig.py > scripts/histogram.pig
    java -jar lib/pig-0.3.0-core.jar -x local scripts/histogram.pig
    

    Behind the scenes, the BaconSnake python preprocessor script includes the jython runtime and baconsnake's wrappers and emits valid Pig Latin which can then be run on Hadoop or locally.

    Important Notes: Note that this is PURELY a proof-of-concept written only for entertainment purposes. It is meant only to demonstrate the ease of use of inline functions in a simple scripting language. Only simple String-to-String (Mappers) and DataBag-to-String (Reducers) functions are supported -- you're welcome to extend this to support other datatypes, or even write Algebraic UDFs that will work as Reducers / Combiners. Just drop me a line if you're interested and would like to extend it!

    Go checkout BaconSnake at Google Code!

    Update: My roommate Eytan convinced me to waste another hour of my time and include support for Databags, which are exposed as Python lists. I've updated the relevant text and code.

    Update (Apr 2010): Looks like BaconSnake's spirit is slowly slithering into Pig Core! Also some attention from the Hive parallel universe.

    PrivatePond: Outsourced Management of Web Corpuses

    This paper was presented at WEBDB 2009 at Providence, Rhode Island. The PDF version is available here.

    My colleague from the database research group Dan Fabbri just presented our work, “PrivatePond” at WEBDB 2009. This paper is a clear example of the research environment at Michigan. Dan works on database security, while I work on database search. Given that we sit across each other at the lab, there is always a constant amount of crosstalk. Add in a few brainstorming sessions and a few work-intense weekends, and you have a secure database search paper!

    The core idea of the paper is simple. Everybody uses Google (or Yahoo! or Bing). They’re fast, they’re easy to use, and they’re free. Now let’s say you had some secure information, like your prescription information from your psychiatrist. Obviously you don’t want Google to know about it, because they can do bad, bad things with it. So you encrypt it. But you still want it to be searchable. But you can’t search encrypted data! So what do we do?

    Enter PrivatePond. Basically, we’re encrypting private data just enough that its possible to search with decent ranking, while still keeping it secure.

    We call this the “Secure Indexable Representation”, and we study how increasing the encryption decreases the quality of search, and vice versa.

    Update: We actually have a demo of our system. If you would like to see it, please contact me!

    Here are the slides for the talk:

    |

    The difference between Google and Yahoo!

    Time for some good ol’ flamebait!:

    State-of-the-art lawnmowing technology at Google:

    State-of-the-art lawnmowing technology at Yahoo!:

    As you can clearly see, Yahoo! is cuter.

    | |

    y!Vmail - voice mail for your Yahoo! Mail

    Yesterday Dan, Pradeep and I presented “y!Vmail: voicemail for your Yahoo! Mail” at the Yahoo! University Hack Day Contest, winning the award for the 2nd best Hack! (jump to the demo video )


    Our team with judges Paul Tarjan and Rasmus Lerdorf

    The adventure started when I heard about Yahoo!‘s Hack U event:

    Join Yahoo! web experts including Rasmus Lerdorf, the creator of PHP, for a week of learning, hacking and fun! You’ll hear interesting tech talks, hacking tips and lessons, and get hands-on coding workshops where you’ll work with cutting-edge technology. The week’s events will culminate with our University Hack Day competition—a day-long festival of coding, camaraderie, demos, awards, food, music and jollity (it’s a real word, look it up).

    Years ago when I was in my teens, I was an avid participant on the school / college tech fest circuit. Almost every major institution in and around Delhi would organize annual technical festivals, hosting programming contests and software demo competitions. This was where I got a chance to showcase my creations and meet other hackers. Winning these events became a good way for me to pay off those telephone bills — web development in the dial-up age was an expensive hobby!

    I decided to enter the Hack Day contest just for fun; it had been a while since I participated in one of these. It wasn’t about winning this time; I just wanted to do the whole “idea to execution to demo” thing with a group of friends, and spend hours screaming at each other over STUPID hard-to-find bugs that are actually staring at you in the face, high-fiving every hour as a feature milestone was scratched off the todo-list. The reward: to be able to stand in front of a group of people and say “Hey guys, look what I made!.” (If it’s hard to appreciate what this feels like, this video might help.)


    Yahoo! gave away a bunch of t-shirts, this was on one of them

    3 days before the Hack Day, I had an idea about building a phone-based interface for email. The idea was simple enough to build in a day, but fun enough to make an enjoyable demo. The only problem: I was already in the midst of a “hack” daymonth of my own; VLDB was due 3 hours before the start of the Hack Day, and I was already sacrificing sleep for LaTeX and Python for more than a week. There was no way I was going to be able to do this alone. Enter fellow grad students Dan and Pradeep. I told them about the contest and my idea. While they are both expert hackers, I totally forgot about the fact that people in Operating Systems research don’t really do a lot of Web Programming: “PHP….? I’ve never…” said Dan. I pointed them to the Yahoo Developer Network site and returned to my research paper writing madness. Hopefully by Friday evening, I would have a web-savvy hack team.

    On Friday, I took a quick nap after my paper deadline, and walked over to the Hack Fest area to meet my team (who had become PHP and telephony wizards by now) and load up on caffeine and sugar that the Yahoo! folks had set up for us.


    They even had my favorite candy !

    We split the work into two parts; Dan would build the phone interface while Pradeep and I would figure out the email and contacts API to write an email client backend. 7 hours later, we had the first version of our product up and running. We could call in and read emails. Happy with our progress, we decided that it would be wiser to go home and show up early next day. We ended up wasting a few hours the next morning worrying about the presentation: the lecture hall had spotty cellphone coverage, a deal-killer for a phone demo! Pradeep made a breakthrough here, discovering that an obscure panel on the wall was actually a secret speakerphone. Having resolved demo issues, we resumed coding and plugged in the remaining features: navigating through emails, email summarization, and email prioritization. The friendly timestamps feature (“4 minutes ago”) was stolen from my blog’s code (i.e. the Status header of this blog).

    Around 3:30pm on Saturday, we updated our hackday entry:

    y!Vmail

    by Arnab Nandi, Daniel Peek, Pradeep Padala

    “Not everyone has a computer, but everyone has a phone.”

    This hack allows people to access their Yahoo! mail through a 1-800 number, using ANY touch-tone phone.
    Press 0 to open, * and # to navigate, 7 to delete. We figure out which emails are important, and read them first. We summarize long emails so that you dont have to listen to all of it. If you want to talk to the person, just press 5 — we’ll connect you.

    APIs used: BBAuth, OpenMail, Contacts API, Term Extraction API

    Hack presentations started at 4:00pm on Saturday. I started with a 20-second powerpoint pitch, followed by a rather entertaining demo. Using the lecture hall’s speakerphone we had the lecture hall call our service. Entering the correct PIN logged me in, which resulted in an entire roomful of people were now hearing the words “Welcome to y!Vmail. You have 5 new emails…”


    Me pushing numbers on the phone


    Here’s a short video walk through of our app:

    More details at http://yvmail.info

    A few minutes after the presentation ended, the prizes were announced. We ranked second. The winning hack was Brandon Kwaselow’s “Points of WOE”; a native iPhone app that allowed browsing and creation of placemarks on Yahoo! Maps. Congratulations, Brandon!

    Overall, this was a very exciting and enjoyable event; I had a rocking good time hanging out with the Yahoo! folks and getting a cool project out the door with around 15 hours of work. I end with some lessons, acquired over years of doing demo contests:

    • Be creative, but avoid feature creep.
    • Split up into sub-teams, but make sure you’re pair programming most of the time.
    • Get Version 0 done Super Super Early. Then polish, polish, polish.
    • Reuse (with attribution) as much code as you can.
    • Take lots of breaks, make friends, and have fun.

    Image credits: Rasmus, Erik
    Shout outs: Folks at Twilio for making the coolest telephony API in the universe!