Archive - 2009

September 25th

Tony Rosenthal, 1914 -- 2009

Michigan Today has a slideshow tribute to Tony Rosenthal, abstract artist and sculptor, who passed away this July. A Michigan alum, I know of him primarily from his Rosenthal Cubes, a pair of identical 15-foot cubes called Endover and Alamo. Endover is located on Central Campus in Ann Arbor, while Alamo is located at Astor Place in Manhattan, New York.

I think the difference between the two is that the New York cube has a platform and is a little harder to spin (yes, I’ve spun both!). I still find it amazing that the 41-year-old sculptures are fully functional despite being exposed to the elements for so long.

September 24th

A Quick Fix for Yahoo branding on Flickr

Techcrunch continued their usual Yahoo-bashing with this story today:

It appears that a few days ago there was a slight change to Flickr’s logo: an addition of a small Yahoo logo to the right side so it reads “Flickr from Yahoo.” In response, many Flickr users have taken to the photo-sharing site’s forums to express their horror at the Yahoo’s new branding of Flickr.

There is definitely some truth to the community backlash, but what I see as more aggravating is a great missed branding opportunity for Yahoo!.

Flickr and Delicious have both been adamant opponents of Yahoo! branding. Even though Yahoo! owns it, the Delicious front page doesn’t contain a single mention of Yahoo. Both sites’ communities are predominantly “indie” brand lovers who don’t want “the man” to infringe on their beloved service (even if the man is running it).

What’s crazy is that Yahoo recently launched a $100 million campaign called “Y!ou and Yahoo!”. What’s also interesting is that Flickr’s logo actually used to say “Flickr loves you” (in place of Flickr BETA), which reflected Flickr’s personality. People got used to it, and some even thought it was cute.

The last thing you want to do is force a new logo onto the community in an ungraceful manner. Here’s a convenient solution: morph the “loves you” logo into the “Y!ou and Yahoo!” campaign with a “flickr loves Y!ou” logo, killing two birds with one stone. The community sees a subtle evolution of the existing logo, and the “Y!ou” campaign is placed in front of a huge community.

September 18th

Intuit buys Mint

Finance software giant Intuit is buying personal finance startup Mint.com for $170 million. Personally, I wasn’t too happy about this. Jason Fried makes a valid point:

Mint was a key leader of the next generation of game changers. And now it’s property of Intuit — the poster-child for the last generation. What a loss. Is that the best the next generation can do? Become part of the old generation? How about kicking the shit out of the old guys? What ever happened to that?

First thing I did when I heard about the deal? Delete my Mint.com account.

September 15th

Microsoft Style

September 3rd

At the Yahoo! Key Scientific Challenges Graduate Student Summit

I’m at the Yahoo! Graduate Student summit for today and tomorrow. About the event:

On September 3 and 4 the Academic Relations team will host 21 exceptional PhD students at the Key Scientific Challenges Graduate Student Summit. These students are winners of this year’s KSC program, and over the course of the two day summit they will be attending tech talks and workshops, presenting their work, and discussing research trends with top researchers from Yahoo! Labs. These 21 students will also be joined by the program’s past winners and Yahoo! Student Fellows.

Thought I’d share notes:

  • Great spread of grad students in terms of research areas: HCI researchers, economists, and social scientists, in addition to the typical CS people.
  • Presenters for Thursday:

    Welcome & Overview of Yahoo! Labs
    Prabhakar Raghavan, Head, Yahoo! Labs

    Search Technologies Overview
    Andrew Tomkins, Chief Scientist, Yahoo! Search

    Machine Learning & Statistics Research Overview
    Sathiya Keerthi Selvaraj, Senior Research Scientist

    Economics and Social Systems Research Overview
    Elizabeth Churchill, Principal Research Scientist

    Computational Advertising Research Overview
    Andrei Broder, Fellow and VP, Computational Advertising

    Web Information Management Research Overview
    Brian Cooper, Senior Research Scientist

  • Posters for the poster sessions look pretty awesome!
    Vowpal Wabbit now Open Source Project

    I was writing a longer post about VW a few weeks ago but ran out of time, so I’ll just post the initial few paragraphs for now.

    There’s probably a limit to how many times one is allowed to use the word “awesome” in a day — I feel like I’ve hit my quota, but I need to use it just once more before I hit the sack:

    I think it’s awesome that Yahoo! Research lets researchers open source their projects.

    I'm pretty sure John did not make this image

    A few days ago, the amazing John Langford released his fast online learning tool, Vowpal Wabbit, to the world as an open source project. Note the word project. That means all further development will happen out in the wild. A bunch of people have questioned the origin of the name “Vowpal Wabbit”: “What is this undecipherable mess of vowels and consonants!?” you ask. “That’s how Elmer Fudd would pronounce Vorpal Rabbit,” John answers. “Vorpal? Whatdoesthatmean?!” you ask again. Which is where I cite the singular font of human knowledge and quote a few lines from Lewis Carroll’s Jabberwocky:

    He took his vorpal sword in hand (and, later,)
    One, two! One, two! And through and through
    The vorpal blade went snicker-snack!
    He left it dead, and with its head
    He went galumphing back.

    If the back story hasn’t made it clear to you yet, let me paraphrase it for you: This stuff is fast. Wicked fast. Like, voodoo fast. How? That’s best left for another post.

    August 25th

    HAMSTER: Using Search Clicklogs for Schema and Taxonomy Matching

    Just got done with the HAMSTER presentation; here is the paper, and here are my abstract and slides:

    We address the problem of unsupervised matching of schema information from a large number of data sources into the schema of a data warehouse. The matching process is the first step of a framework to integrate data feeds from third-party data providers into a structured-search engine’s data warehouse. Our experiments show that traditional schema-based and instance-based schema matching methods fall short. We propose a new technique based on the search engine’s clicklogs. Two schema elements are matched if the distribution of keyword queries that cause click-throughs on their instances are similar. We present experiments on large commercial datasets that show the new technique has much better accuracy than traditional techniques.
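
To make the matching idea in the abstract concrete, here is a minimal sketch, not the paper's actual implementation. It assumes each schema element comes with a count of the queries that led to clicks on its instances, and it uses Jensen-Shannon divergence as a stand-in similarity measure (the abstract does not say which measure is used). The function names and the toy data are made up for illustration.

```python
import math
from collections import Counter


def to_distribution(query_counts):
    """Normalize raw click counts into a probability distribution.

    Assumes query_counts is non-empty (at least one click)."""
    total = sum(query_counts.values())
    return {query: count / total for query, count in query_counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2): 0 for identical, 1 for disjoint."""
    keys = set(p) | set(q)
    mid = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mid(dist):
        return sum(v * math.log2(v / mid[k]) for k, v in dist.items() if v > 0)

    return 0.5 * kl_to_mid(p) + 0.5 * kl_to_mid(q)


def best_match(source_counts, target_elements):
    """Return the target element whose query distribution is closest."""
    source = to_distribution(source_counts)
    return min(
        target_elements,
        key=lambda name: jensen_shannon(
            source, to_distribution(target_elements[name])
        ),
    )


# Hypothetical data: one element from an incoming feed, two warehouse elements.
feed_element = Counter({"pokemon": 8, "toys": 2})
warehouse = {
    "japanese toys": Counter({"pokemon": 6, "toys": 4}),
    "kitchen appliances": Counter({"blender": 9, "toys": 1}),
}
match = best_match(feed_element, warehouse)
```

Any other distance between distributions could be swapped in for `jensen_shannon` without changing the rest of the sketch.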

    I received a few questions after the talk, hence I thought I’d put up a quick FAQ:

    Q: Doesn’t the time period of the clicklog affect your integration quality?

    A: Yes. And we consider this a good thing. This allows trend information to come into the system, e.g. “pokemon” queries will start coming in, and merge “japanese toys” with “children’s collector items”. Unpopular items that are not searched for may not generate a mapping, but then again, this may be ok since the end goal was to integrate searched-for items.

    Q: You use clicklogs. I am a little old company/website owner X. Since my company’s name doesn’t start with G, M or Y, I don’t have clicklogs. How do I use your method?

    A: You already have clicklogs. Let’s say you are trying to merge your company/website X’s data with company Y’s data. Since both you (X) and Y have websites, you both run HTTP servers, which have the facility to log requests. Look through your HTTP server referral logs for strings like:
    URL: http://x.com
    REFERRER: http://www.google.com/?q=$search_string$

    This is your clicklog. The URL http://x.com was reached through the query $search_string$. You can grep both websites’ logs to create clicklogs, which can then be used for integration.
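
A rough sketch of that grep step follows. It assumes the log has already been parsed into (URL, referrer) pairs and that the search engine puts the query in a `q` parameter, as in the example above; real log formats and referrer URLs vary, and `build_clicklog` is a hypothetical helper name.

```python
from collections import Counter, defaultdict
from urllib.parse import parse_qs, urlparse


def build_clicklog(records, query_param="q"):
    """Turn (url, referrer) pairs into {url: Counter({query: clicks})}.

    Referrers without the query parameter (e.g. internal links) are skipped.
    """
    clicklog = defaultdict(Counter)
    for url, referrer in records:
        params = parse_qs(urlparse(referrer).query)
        for query in params.get(query_param, []):
            query = query.strip().lower()
            if query:
                clicklog[url][query] += 1
    return clicklog


# Hypothetical records, as they might come out of an HTTP server log.
records = [
    ("http://x.com/item/1", "http://www.google.com/?q=pokemon+cards"),
    ("http://x.com/item/1", "http://www.google.com/?q=Pokemon+cards"),
    ("http://x.com/item/2", "http://www.google.com/?q=japanese+toys"),
    ("http://x.com/item/2", "http://x.com/"),
]
clicklog = build_clicklog(records)
```

Lowercasing the query is a simplification so that trivially different spellings count as the same query.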

    Q: My website is not very popular and I don’t have that many clicks from search engines. What do I do?

    A: Yup, this is a very real case. Specifically, you might have a lot of queries for some of your items, but not for others. This can be balanced out. See the section in our paper about Surrogate Clicklogs. Basically you can use a popular website’s clicklog as a “surrogate” log for your database. From the paper:

    …we propose a method by which we identify surrogate clicklogs for any data source without significant web presence. For each candidate entity in the feed that does not have a significant presence in the clicklogs (i.e. clicklog volume is less than a threshold), we look for an entity in our collection of feeds that is most similar to the candidate, and use its clicklog data to generate a query distribution for the candidate object.
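
The surrogate step quoted above could be sketched roughly as follows. This is an illustration under assumptions, not the paper's method: entity similarity here is plain Jaccard overlap of name tokens, and the threshold, entity names, and function names are all made up.

```python
from collections import Counter


def name_tokens(name):
    return set(name.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def with_surrogates(clicklog, names, threshold):
    """Give low-volume entities the clicklog of their most similar peer.

    clicklog: {entity_id: Counter({query: clicks})}
    names:    {entity_id: display name}, used for the similarity assumption
    """
    def volume(entity):
        return sum(clicklog.get(entity, Counter()).values())

    # Entities with enough clicks to act as donors.
    donors = [e for e in names if volume(e) >= threshold]
    result = {}
    for entity in names:
        if volume(entity) >= threshold or not donors:
            result[entity] = clicklog.get(entity, Counter())
        else:
            donor = max(
                donors,
                key=lambda d: jaccard(
                    name_tokens(names[entity]), name_tokens(names[d])
                ),
            )
            result[entity] = clicklog[donor]
    return result


# Hypothetical feed: entity "b" has no clicks of its own.
names = {
    "a": "pikachu plush toy",
    "b": "pikachu plush toy large",
    "c": "steel blender",
}
clicklog = {
    "a": Counter({"pokemon plush": 50}),
    "c": Counter({"blender": 40}),
}
filled = with_surrogates(clicklog, names, threshold=10)
```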

    Q: I am an academic and do not have access to a public clicklog, or a public website to get clicklogs from. How do I use this technique?

    A: Participate in the Lemur project and get your friends to participate too.

    August 22nd

    Upcoming VLDB Trip : Lyon, France

    I’m looking forward to my talk at VLDB 2009 in Lyon, France. I will be presenting “HAMSTER: Using Search Clicklogs for Schema and Taxonomy Matching”, which is joint work I did with Phil Bernstein during my internship at Microsoft Research. The talk is scheduled for Tuesday, August 25, 2009 at 2pm in the Rhône 2 room at the conference venue.

    Also look out for my labmate Bin Liu’s paper with our advisor, “Using Trees to Depict a Forest”.

    August 20th

    "My pledges as a reviewer"

    CUHK Professor Yufei Tao’s homepage has this interesting tidbit:

    My pledges as a reviewer:

    • I will treat your work with respect.
    • I will spend enough time with your paper. I will not make any decision without a good understanding.
    • In case I decide to recommend rejection, I will do so on solid grounds. I do not reject papers based on subjective and vacuous statements such as “I don’t like this idea”.
    • I will write reviews in a courteous manner. I have seen harsh reviews by other people which heavily mention my publications, and thus make people feel I was the reviewer. I will never do anything like this.
    August 3rd

    Brim

    Standing by, watching sighs
    Escape from passersby
    Feelings collect, rise up, and in a while
    reflect, give up, and run dry.

    One day the brim will mean something.
    Till then, we’ll survive.
