Archive - Aug 2009

August 25th

HAMSTER: Using Search Clicklogs for Schema and Taxonomy Matching

Just got done with the HAMSTER presentation; here is the paper, and here are my slides:

I received a few questions after the talk, hence I thought I’d put up a quick FAQ:

Q: You use clicklogs. I am a little old company/website owner X. Since my company’s name doesn’t start with G, M or Y, I don’t have clicklogs. How do I use your method?

A: You already have clicklogs. Let’s say you are trying to merge your company/website X’s data with company Y’s data. Since both you (X) and Y have websites, you both run HTTP servers, which have the facility to log requests. Look through your HTTP server referral logs for strings like:
URL: http://x.com
REFERRER: http://www.google.com/?q=$search_string$

This is your clicklog: the URL http://x.com was reached through the query $search_string$. You can grep the logs of both websites to create clicklogs, which can then be used for integration.
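For the curious, here is a rough sketch of that grep step in Python. It assumes you have already pulled (requested URL, referrer) pairs out of your server log, and that the search engine puts the query in a parameter named q, p or query; both the function names and that parameter list are my own illustration, not something from the paper.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit, parse_qs

# Assumption: search engines carry the query string in one of these parameters.
QUERY_PARAMS = ("q", "p", "query")


def extract_query(referrer):
    """Return the normalized search string from a referrer URL, or None."""
    params = parse_qs(urlsplit(referrer).query)
    for name in QUERY_PARAMS:
        if name in params:
            # parse_qs decodes '+' and %-escapes; lowercase to merge variants.
            return params[name][0].strip().lower()
    return None


def build_clicklog(requests):
    """Map each requested URL to a Counter of the queries that led to it.

    `requests` is an iterable of (url, referrer) pairs (hypothetical input
    shape; adapt the parsing to your own server's log format).
    """
    clicklog = defaultdict(Counter)
    for url, referrer in requests:
        query = extract_query(referrer)
        if query:
            clicklog[url][query] += 1
    return clicklog


if __name__ == "__main__":
    sample = [
        ("http://x.com/cameras", "http://www.google.com/?q=digital+camera"),
        ("http://x.com/cameras", "-"),  # direct visit, no referrer
    ]
    for url, queries in build_clicklog(sample).items():
        print(url, dict(queries))
```

Run the same thing over Y's logs and you have two clicklogs to compare. Requests without a search referrer are simply skipped.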

Q: My website is not very popular and I don’t have that many clicks from search engines. What do I do?

A: Yup, this is a very real case. Specifically, you might have a lot of queries for some of your items, but not for others. This can be balanced out. See the section in our paper about Surrogate Clicklogs. Basically you can use a popular website’s clicklog as a “surrogate” log for your database.

Q: Doesn’t the time period of the clicklog affect your integration quality?

A: Yes. And we consider this a good thing. This allows trend information to come into the system, e.g. “pokemon” queries will start coming in, and merge “japanese toys” with “children’s collector items”. Unpopular items that are not searched for may not generate a mapping, but then again, this may be ok since the end goal was to integrate searched-for items.

Q: I am an academic and do not have access to a public clicklog, or a public website to get clicklogs from. How do I use this technique?

A: Participate in the Lemur project and get your friends to participate too.

August 22nd

Upcoming VLDB Trip : Lyon, France

I’m looking forward to my talk at VLDB 2009 in Lyon, France. I will be presenting “HAMSTER: Using Search Clicklogs for Schema and Taxonomy Matching”, which is joint work I did with Phil Bernstein during my internship at Microsoft Research. The talk is scheduled for Tuesday, August 25, 2009 at 2pm in the Rhône 2 room at the conference venue.

Also look out for my labmate Bin Liu’s paper with our advisor, “Using Trees to Depict a Forest”.


August 20th

"My pledges as a reviewer"

CUHK Professor Yufei Tao’s homepage has this interesting tidbit:

My pledges as a reviewer:

  • I will treat your work with respect.
  • I will spend enough time with your paper. I will not make any decision without a good understanding.
  • In case I decide to recommend rejection, I will do so on solid grounds. I do not reject papers based on subjective and vacuous statements such as “I don’t like this idea”.
  • I will write reviews in a courteous manner. I have seen harsh reviews by other people which heavily mention my publications, and thus make people feel I was the reviewer. I will never do anything like this.

August 3rd

Brim

Standing by, watching sighs
Escape from passersby
Feelings collect, rise up, and in a while
reflect, give up, and run dry.

One day the brim will mean something.
Till then, we’ll survive.
