Google

Getting middle-click to work for tab browsing in Mac OS X

If you’re a recent convert to Mac OS X (Tiger / Leopard / Snow Leopard / etc.) or someone who uses multiple operating systems at the same time, the differences in mouse and keyboard shortcuts get confusing, even irritating sometimes. One of the most irritating ones for me is what happens when you middle-click.

In Windows / Linux, middle-clicking in browsers is used to open and close tabs. In OS X, this doesn’t work because middle-click triggers the Dashboard: every time I wanted to open or close a tab, the Dashboard would show up! To disable this, go to System Preferences > Exposé & Spaces and set the Dashboard mouse shortcut to “-”.

For the newbies, here’s the step-by-step version:

  • Select “System Preferences”.
  • Click on the “Exposé & Spaces” button.
  • Set the “Dashboard” mouse shortcut to “-”.

And that’s it! You will now be able to middle-click to open and close tabs in Mozilla Firefox and Google Chrome. In Safari, middle-clicking opens tabs, but closing tabs doesn’t work.

And there you have it, middle click tabs on Mac OS X!

Street View Fun

Here’s a fun video:

Ever wonder what it’s like for the dudes who have to drive those Google camera cars around? I think it’s a little something like this…

 
On the other side of the Atlantic, to promote UK band Editors’ new album, Sony has produced a new Street View hack in which you browse around London to discover hidden things:

This is how it works: a cleverly hacked version of Google Street View allows users to preview tracks from the album in the areas of London that inspired them. As well as being able to move around as you would in the normal Google Street View, there are red arrows to find in nine different London locations (one for each track of the album) that each point to a location off the road – click it to find custom panoramic photographs of the band, shot at night by photographer James Royall.

Here’s a preview video of the game:


Barack Obama Nobel Prize Sound Bites

From Wikipedia:

On October 9, 2009, U.S. President Barack Obama was awarded the Nobel Peace Prize less than one year after his taking office (in fact, the nominations closed on February 1, about 11 days after Obama took office). While the committee praised his ambitious foreign policy agenda, it acknowledged that he had not yet actually achieved many of the goals that he had set out to accomplish. Former Polish President Lech Wałęsa, a 1983 Nobel Peace laureate, commented: “So soon? Too early. He has no contribution so far. He is still at an early stage. He is only beginning to act.”

This is pretty amazing news. My Facebook, news and IM streams are flooded with one-liners, so I thought I’d collect them all:

  • “I too would like a Nobel Peace Prize for the thesis I am about to write in the future.” — me
  • “it’s a pretty swell booby prize for losing out on the Olympics” – n.d.
  • “Surely preventing Sarah Palin from taking over the free world deserves a prize… even if it is a Nobel?” — v.b.
  • “‘NASA bombs moon’; ‘Obama wins Nobel Prize’ — is today Onion News Day?” — me
  • “Barack Obama linked to terrorist Yasser Arafat” — fark via a.a.
  • “The Nobel? Really? I mean, cool…but it seems like we have our cart on the wrong side of the horse. Not that it isn’t a very nice cart.” — c.m.
  • “...thinks they might as well have given him the Nobel Prize for Literature, Chemistry (we’ve all seen the shirtless photos), Physics and Economics as well. Oh and made him a Knight Commander of the Order of the Bath” — r.d.
  • “Nobel Committee Rewards Obama For Not Being Bush” — f.n.
  • “I just want to point out that the Nobel Committee made its decision BEFORE Miley Cyrus quit Twitter.” — j.h.
  • “Obama will win a second Nobel next year if he can restrain himself from reacting to the snark generated by this one.” — m.w.
  • “Pretty sure Obama will just trade in his Nobel for a Google Wave invite.” — t.b.
  • “The news of Obama’s Nobel Peace Prize spreads. Across the miles I can almost HEAR my dad’s eyes rolling.” — p.g.
  • “Obama wins Nobel Peace Prize? About time Rakhi Sawant wins an Oscar, then.” — s
  • “If you don’t think Obama deserves that Nobel, then you’ve never seen Sasha and Malia fight.” — a.e.
  • “Apparently Arizona State has a higher standard than the Nobel Committee. Good thing I never tried to apply there.” — r.m.

Yahoo: Just like the old times

I’m excited to go to work today, knowing that I will witness, first hand, one of the more incredible business deals being announced in the valley: Microsoft powering Yahoo Search.

There’s a lot that I want to say about this, but for now, I will leave you with this image, from when Yahoo! used to be powered by Google. (Many people believe that powering Yahoo was what made Google popular with the mainstream audience, and that Google owes who it is today to Yahoo.)

An excerpt from Wikipedia:

In 2002, they bought Inktomi, a “behind the scenes” or OEM search engine provider, whose results are shown on other companies’ websites and powered Yahoo! in its earlier days. In 2003, they purchased Overture Services, Inc., which owned the AlltheWeb and AltaVista search engines.

AlltheWeb, Altavista, Overture, Inktomi. That’s a lot of heritage.


BaconSnake: Inlined Python UDFs for Pig

I was at SIGMOD last week, and had a great time learning about new research, discussing various research problems, meeting up with old friends and making new ones. I don't recall exactly, but at one point I got into a discussion with someone about how I'm probably one of the few people who've actually had the privilege of using three of the major distributed scripting languages in production: Google's Sawzall, Microsoft's SCOPE and Yahoo's Pig. The obvious question then came up -- Which one do I like best? I thought for a bit, and my answer surprised me -- it was SCOPE, for the sole reason that it allowed inline UDFs, i.e. User Defined Functions defined in the same code file as the script.

I'm not sure whether Sawzall allows UDFs; Pig lets you link any .jar file and call its functions from the language. But the Microsoft SCOPE implementation is extremely usable: the SQL forms the framework of your MapReduce chains, while the Mapper, Reducer and Combiner definitions can be written out in C# right under the SQL -- no pre-compiling / including necessary.

Here's how simple SCOPE is. Note the #CS / #ENDCS codeblock that contains the C#:

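R1 = SELECT A+C AS ac, B.Trim() AS B1 FROM R WHERE StringOccurs(C, "xyz") > 2

#CS
// Counts occurrences of ptrn in str (overlapping matches included,
// since each search resumes one character after the previous match).
public static int StringOccurs(string str, string ptrn) {
    int cnt = 0;
    int pos = -1;
    while (pos + 1 < str.Length) {
        pos = str.IndexOf(ptrn, pos + 1);
        if (pos < 0) break;
        cnt++;
    }
    return cnt;
}
#ENDCS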

Since I'm working at Yahoo! Research this summer, and I missed this feature so much, I thought -- why not scratch this itch and fix the problem for Pig? Also, while we're at it, maybe we can use a cleaner language than Java to write the UDFs?

Enter BaconSnake (available here), which lets you write your Pig UDFs in Python! Here's an example:

-- Script calculates average length of queries at each hour of the day

raw = LOAD 'data/excite-small.log' USING PigStorage('\t')
           AS (user:chararray, time:chararray, query:chararray);

houred = FOREACH raw GENERATE user, baconsnake.ExtractHour(time) as hour, query;

hour_group = GROUP houred BY hour;

hour_frequency = FOREACH hour_group
                 GENERATE group as hour,
                          baconsnake.AvgLength($1.query) as count;

DUMP hour_frequency;

-- The excite query log timestamp format is YYMMDDHHMMSS
-- This function extracts the hour, HH
def ExtractHour(timestamp):
	return timestamp[6:8]

-- Returns average length of query in a bag
def AvgLength(grp):
	total = 0
	for item in grp:
		if len(item) > 0:
			total = total + len(item[0])
	return str(float(total) / len(grp))

Everything in this file is normal Pig, except the baconsnake.ExtractHour / baconsnake.AvgLength calls and the def blocks at the bottom -- those are Python definitions and calls.
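To sanity-check the two UDFs before going anywhere near Pig, they can be exercised as plain Python. This sketch assumes, per the update at the end of this post, that a DataBag arrives as a Python list of tuples, one tuple per row, with the query as the first field. It runs under Python 3, whereas BaconSnake itself runs the functions under Jython 2.5.

```python
# Plain-Python copies of the two BaconSnake UDFs, for local testing.
# Assumption: a DataBag is passed in as a list of tuples, e.g. [("query text",), ...].

def ExtractHour(timestamp):
    # Excite timestamps are YYMMDDHHMMSS, so characters 6-7 are the hour.
    return timestamp[6:8]

def AvgLength(grp):
    total = 0
    for item in grp:
        if len(item) > 0:          # skip empty tuples
            total += len(item[0])  # first field is the query string
    # float() keeps the fractional part under Python 2 / Jython as well
    return str(float(total) / len(grp))

if __name__ == "__main__":
    print(ExtractHour("090916143012"))                # 14
    print(AvgLength([("cats",), ("dogs and mice",)]))  # 8.5
```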

It's pretty simple under the hood, actually. BaconSnake creates a wrapper Pig UDF that takes the Python source as input along with the parameter. Jython 2.5 is used to embed the Python runtime into Pig and call the functions.

Using it is easy: you basically convert the nice-looking "baconsnake" file above (the .bs file :P) into Pig Latin and run it like so:

cat scripts/histogram.bs | python scripts/bs2pig.py > scripts/histogram.pig
java -jar lib/pig-0.3.0-core.jar -x local scripts/histogram.pig

Behind the scenes, the BaconSnake Python preprocessor script includes the Jython runtime and BaconSnake's wrappers, and emits valid Pig Latin which can then be run on Hadoop or locally.
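One piece of such a preprocessor is separating the Python definitions from the Pig Latin. Here is a minimal sketch, assuming (as in the example script) that every Python block starts with an unindented `def` line and continues with indented or blank lines. `split_baconsnake` is a hypothetical name, and what else the real bs2pig.py emits (the Jython runtime, the wrapper registrations) is not shown.

```python
def split_baconsnake(source):
    """Split a .bs script into (pig_latin, python_source)."""
    pig_lines, py_lines = [], []
    in_python = False
    for line in source.splitlines():
        if line.startswith("def "):
            in_python = True   # an unindented def opens a Python block
        elif in_python and line and not line[0].isspace():
            in_python = False  # first unindented non-def line closes it
        (py_lines if in_python else pig_lines).append(line)
    return "\n".join(pig_lines), "\n".join(py_lines)

script = "raw = LOAD 'x';\n-- comment\ndef F(x):\n\treturn x\n\nDUMP raw;\n"
pig, py = split_baconsnake(script)
print(pig)
print(py)
```

Note that the `--` comment lines sitting between the def blocks in the example end up on the Pig side, which is harmless because Pig treats them as comments.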

Important note: this is PURELY a proof-of-concept written only for entertainment purposes. It is meant only to demonstrate how easy inline functions are to use in a simple scripting language. Only simple String-to-String (Mapper) and DataBag-to-String (Reducer) functions are supported -- you're welcome to extend this to support other datatypes, or even write Algebraic UDFs that will work as Reducers / Combiners. Just drop me a line if you're interested in extending it!
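The reason an Algebraic version of AvgLength needs more than one function: a combiner cannot just average partial averages, it has to carry (total, count) pairs and divide once at the very end. The sketch below shows that three-stage split in plain Python. It only illustrates the math; the function names are hypothetical, and it is not a working Pig UDF (Pig's Java `Algebraic` interface is what actually wires up the initial, intermediate and final stages).

```python
# Hypothetical three-stage split of the AvgLength reducer (not Pig's Java API).

def avg_initial(query):
    # Map side: one (total length, count) pair per query.
    return (len(query), 1)

def avg_intermediate(partials):
    # Combiner: merge any number of partial pairs into one.
    return (sum(p[0] for p in partials), sum(p[1] for p in partials))

def avg_final(partials):
    # Reducer: merge the combiner outputs, then divide once at the end.
    total, count = avg_intermediate(partials)
    return str(float(total) / count)

split_a = avg_intermediate([avg_initial(q) for q in ["ab", "abcd"]])  # (6, 2)
split_b = avg_intermediate([avg_initial("abcdef")])                   # (6, 1)
print(avg_final([split_a, split_b]))  # 4.0
```

Averaging the two partial averages (3.0 and 6.0) would give 4.5, which is why the pairs are carried through instead.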

Go check out BaconSnake at Google Code!

Update: My roommate Eytan convinced me to waste another hour of my time and include support for Databags, which are exposed as Python lists. I've updated the relevant text and code.

PrivatePond: Outsourced Management of Web Corpuses

This paper was presented at WEBDB 2009 in Providence, Rhode Island. The PDF version is available here.

My colleague from the database research group, Dan Fabbri, just presented our work, “PrivatePond”, at WEBDB 2009. This paper is a clear example of the research environment at Michigan. Dan works on database security, while I work on database search. Given that we sit across from each other in the lab, there is a constant amount of crosstalk. Add in a few brainstorming sessions and a few work-intensive weekends, and you have a secure database search paper!

The core idea of the paper is simple. Everybody uses Google (or Yahoo! or Bing). They’re fast, they’re easy to use, and they’re free. Now let’s say you had some secure information, like your prescription information from your psychiatrist. Obviously you don’t want Google to know about it, because they can do bad, bad things with it. So you encrypt it. But you still want it to be searchable. But you can’t search encrypted data! So what do we do?

Enter PrivatePond. Basically, we encrypt private data just enough that it’s still possible to search it with decent ranking, while keeping it secure.

We call this the “Secure Indexable Representation”, and we study how increasing the encryption decreases the quality of search, and vice versa.
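To make the privacy / search-quality tradeoff concrete, here is a toy illustration. It is NOT the construction from the paper: it is a generic keyed-hash bucketing scheme chosen only because it has an obvious knob. Terms are mapped to one of `num_buckets` buckets with a secret key, so the server only ever sees bucket numbers. Fewer buckets means more unrelated terms collide, which hides more from the server but returns more false matches. All names and the key are hypothetical.

```python
import hashlib
import hmac

def term_bucket(term, key, num_buckets):
    # Keyed hash: the server cannot recompute buckets without the secret key.
    digest = hmac.new(key, term.lower().encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def build_index(docs, key, num_buckets):
    # Inverted index from bucket number to the documents containing a term in it.
    index = {}
    for doc_id, text in docs.items():
        for term in text.split():
            index.setdefault(term_bucket(term, key, num_buckets), set()).add(doc_id)
    return index

def search(index, query, key, num_buckets):
    return sorted(index.get(term_bucket(query, key, num_buckets), set()))

KEY = b"client-side secret"
docs = {"d1": "prozac refill", "d2": "dentist appointment"}
for buckets in (1, 2**20):
    idx = build_index(docs, KEY, buckets)
    print(buckets, search(idx, "prozac", KEY, buckets))
```

With a single bucket every query matches every document (maximum hiding, useless search); with many buckets a query almost always matches only the documents that really contain it.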

Update: We actually have a demo of our system. If you would like to see it, please contact me!

Here are the slides for the talk:


The difference between Google and Yahoo!

Time for some good ol’ flamebait!

State-of-the-art lawnmowing technology at Google:

State-of-the-art lawnmowing technology at Yahoo!:

As you can clearly see, Yahoo! is cuter.


Getting django-authopenid to work with Google Accounts

Update: This blog post applies to older versions of django-authopenid. The latest version available on PyPI implements a fix similar to this one and works out of the box, so you won't need this fix.
Thanks to Mike Huynh for pointing this out!

I've been playing with Django over the past few days, and it's been an interesting ride. For a person who really likes PHP's shared-nothing, file-based model (I'm mostly a Drupal guy), Django comes across as overengineered at first, but I'm beginning to see why it's done that way.

I was trying to get single sign-on working, and settled on django-authopenid over the other Django OpenID libraries (django-openid, django-openid-auth and django-oauth). It was easy to use and understand, and wasn't seven million lines of code.

My intention was to use an OpenID extension to get the user's email address during the sign-on process. However, this doesn't work with Google's OpenID implementation, because Google uses the Attribute Exchange (AX) extension instead of the Simple Registration (SReg) extension that the library implements. A quick hack to django-authopenid's views.py makes it work:


51c51
- from openid.extensions import sreg
---
+ from openid.extensions import ax
94c82
- sreg_request=None):
---
+ ext_request=None):
113,114c101,102
- if sreg_request:
-     auth_request.addExtension(sreg_request)
---
+ if ext_request:
+     auth_request.addExtension(ext_request)
195,210c172,185
- sreg_req = sreg.SRegRequest(optional=['nickname', 'email'])
- redirect_to = "%s%s?%s" % (
-     get_url_host(request),
-     reverse('user_complete_signin'),
-     urllib.urlencode({'next':next})
- )
-
- return ask_openid(request,
-     form_signin.cleaned_data['openid_url'],
-     redirect_to,
-     on_failure=signin_failure,
-     sreg_request=sreg_req)
---
+ ax_req = ax.FetchRequest()
+ ax_req.add(ax.AttrInfo('http://schema.openid.net/contact/email', alias='email',required=True))
+ redirect_to = "%s%s?%s" % (
+     get_url_host(request),
+     reverse('user_complete_signin'),
+     urllib.urlencode({'next':next})
+ )
+
+ return ask_openid(request,
+     form_signin.cleaned_data['openid_url'],
+     redirect_to,
+     on_failure=signin_failure,
+     ext_request=ax_req)

Obviously this is a very cursory edit. I'm too lazy to improve and submit this as a patch, so readers are encouraged to submit it to all relevant projects!
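For the curious, here is roughly what each extension adds to the OpenID authentication request, written out as raw query parameters. This is a stdlib-only sketch based on the SReg 1.1 and AX 1.0 specifications, not output captured from django-authopenid or python-openid; the aliases `sreg` and `ax` are illustrative, since the library may choose different ones.

```python
from urllib.parse import urlencode

# Namespace URIs from the SReg 1.1 and AX 1.0 specifications.
SREG_NS = "http://openid.net/extensions/sreg/1.1"
AX_NS = "http://openid.net/srv/ax/1.0"
EMAIL_TYPE = "http://schema.openid.net/contact/email"  # type URI used in the hack above

def sreg_args():
    # What the unpatched library asks for: optional nickname and email.
    return {"openid.ns.sreg": SREG_NS, "openid.sreg.optional": "nickname,email"}

def ax_args(type_uri=EMAIL_TYPE):
    # What the patched code asks for: a required email attribute via AX.
    return {
        "openid.ns.ax": AX_NS,
        "openid.ax.mode": "fetch_request",
        "openid.ax.type.email": type_uri,  # alias "email" -> attribute type URI
        "openid.ax.required": "email",     # comma-separated list of aliases
    }

print(urlencode(sreg_args()))
print(urlencode(ax_args()))
```

A provider that only understands AX simply ignores the `openid.sreg.*` parameters, which is why the unpatched code gets back no email from Google.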

|

My Research Papers, now more accessible

Many readers have complained that this blog is always full of artsy and time-wasting material… “what about all the technical stuff? Aren’t you a computer person?!” they ask. To pacify these masses, I have just converted three of my recent papers to HTML format. For the first two, I used the HEVEA LaTeX to HTML converter, which I found slightly better than LaTeX2HTML. For the third paper, I have inexplicably misplaced the source files, and hence the HTMLization was done via Gmail’s PDF Viewer.

The picture above is from the 2007 SIGMOD demo paper. I’ll post videos of the demo in a later post. Here’s a quick preview of each paper:

  • Qunits: queried units in database search (CIDR 2009)
    Keyword search against structured databases has become a popular topic of investigation, since many users find structured queries too hard to express, and enjoy the freedom of a “Google-like” query box into which search terms can be entered. Attempts to address this problem face a fundamental dilemma. Database querying is based on the logic of predicate evaluation, with a precisely defined answer set for a given query. On the other hand, in an information retrieval approach, ranked query results have long been accepted as far superior to results based on boolean query evaluation. As a consequence, when keyword queries are attempted against databases, relatively ad-hoc ranking mechanisms are invented (if ranking is used at all), and there is little leverage from the large body of IR literature regarding how to rank query results.
  • Effective Phrase Prediction (VLDB 2007)
    Autocompletion is a widely deployed facility in systems that require user input. Having the system complete a partially typed “word” can save user time and effort. In this paper, we study the problem of autocompletion not just at the level of a single “word”, but at the level of a multi-word “phrase”. There are two main challenges: one is that the number of phrases (both the number possible and the number actually observed in a corpus) is combinatorially larger than the number of words; the second is that a “phrase”, unlike a “word”, does not have a well-defined boundary, so that the autocompletion system has to decide not just what to predict, but also how far. We introduce a FussyTree structure to address the first challenge and the concept of a significant phrase to address the second. We develop a probabilistically driven multiple completion choice model, and exploit features such as frequency distributions to improve the quality of our suffix completions. We experimentally demonstrate the practicability and value of our technique for an email composition application and show that we can save approximately a fifth of the keystrokes typed.
  • Assisted querying using instant-response interfaces (SIGMOD 2007)
    We demonstrate a novel query interface that enables users to construct a rich search query without any prior knowledge of the underlying schema or data. The interface, which is in the form of a single text input box, interacts in real-time with the users as they type, guiding them through the query construction. We discuss the issues of schema and data complexity, result size estimation, and query validity; and provide novel approaches to solving these problems. We demonstrate our query interface on two popular applications; an enterprise-wide personnel search, and a biological information database.
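The “decide not just what to predict, but also how far” problem from the phrase prediction abstract can be illustrated with a deliberately naive baseline: keep appending the most frequent next word seen in a corpus, and stop as soon as that continuation becomes rare. This is NOT the FussyTree or the significant-phrase model from the paper, just a brute-force frequency heuristic; `suggest_completion`, `min_support` and the tiny corpus are hypothetical.

```python
from collections import Counter

def suggest_completion(sentences, typed, min_support=2, max_words=8):
    """Greedily extend `typed` with the most frequent next word seen in `sentences`."""
    prefix = typed.lower().split()
    tokenized = [s.lower().split() for s in sentences]
    completion = []
    for _ in range(max_words):
        context = prefix + completion
        n = len(context)
        # Count every word that directly follows an occurrence of the context.
        next_words = Counter(
            words[i + n]
            for words in tokenized
            for i in range(len(words) - n)
            if words[i:i + n] == context
        )
        if not next_words:
            break
        word, support = next_words.most_common(1)[0]
        if support < min_support:  # stop where the phrase stops being common
            break
        completion.append(word)
    return " ".join(completion)

corpus = ["Please let me know if you have any questions"] * 2 + ["Let me know what you think"]
print(suggest_completion(corpus, "let me"))  # know if you have any questions
```

Rescanning the whole corpus for every word is exactly the cost a dedicated index structure is meant to avoid; the sketch only shows the stopping rule.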

Math's Kool with Tyler Neylon

Googler Tyler Neylon just launched Mathskool.com, a website for teaching kids math. In his own words:

This is a website I’ve been working on for the past month, meant to help connect great math teachers with motivated middle and high school students. The idea is to provide a centralized library that many math teachers can contribute to, and which gives students free access to short, focused videos.

This is exactly what the internet needs. Great job, Tyler! My first interaction with Tyler was during my Google internship phone interviews, where we were trying to figure out if I could work with his team.

Here’s a tutorial on distributive properties and combining expressions. What I love about this video is that it starts with “Hey how’s it going.”, and then “I go to Target and buy six copies of Mario Kart…. I like Mario Kart.” Kids, this is a guy who has a PhD in math… you should watch his videos!
