Deceiving Users with the Facebook Like Button

Update: I've written a followup to this post, which you may also find interesting.

Facebook just launched a super-easy widget called the "Like Button". Website owners can add a simple iframe snippet to their HTML, enabling a nice "Like" button with a count of other people's Likes and, if any of the likers are your friends, a list of their faces. The advantage of this new tool is that you don't need any fancy coding: just fill out a simple wizard and paste in the embed code, just like you do with YouTube, etc.

However, this simplicity has a cost: users can be tricked into "Liking" pages they aren't actually visiting.

For example, try pressing this "Like" button below:

[Embedded "Like" button]

This is what happened to my Facebook feed when I pressed it:

[Screenshot: my Facebook feed now showing that I like BritneySpears.com]

I used BritneySpears.com as the example here to keep things work/family-safe; you're free to come up with other sites you wouldn't want on your Facebook profile! :)

Important note: Removing the feed item from your newsfeed does not remove your like -- it stays in your profile. You have to click the button again to remove the "Like" relationship.

This works because the iframe snippet lets me specify any URL I want as the target. Because of cross-domain browser security, the "Like Button" iframe has no way to communicate with the page it's embedded in. Facebook's "Connect" system solved this with a cross-domain proxy, which requires uploading a file to your server, etc. The new button trades away that security for convenience.
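
To make this concrete, here's a minimal sketch of the page a deceiver could serve. The plugin URL and parameters mirror what the wizard's embed code looks like, but treat the exact parameter names as my approximation:

    <?php
    // Sketch: a page on some innocent-looking site serving a Like button
    // whose href points at a completely different site. The visitor sees
    // an ordinary Like button; clicking it publishes a Like for $target.
    $target = 'http://www.britneyspears.com/';  // NOT the page the user is on

    $src = 'http://www.facebook.com/plugins/like.php'
         . '?href=' . urlencode($target)
         . '&layout=standard&show_faces=true&width=450&height=80';

    echo '<iframe src="' . htmlspecialchars($src) . '"'
       . ' scrolling="no" frameborder="0"'
       . ' style="border:none; width:450px; height:80px"></iframe>';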

An argument in Facebook's favor is that no self-respecting webmaster would want to deceive their visitors! True, the motivation to deceive isn't very strong for legitimate sites. But if I'm an enterprising spammer, I can set up content farms posing as humble websites and use those "Like" buttons to sell, say, teeth-whitening formulas to my visitors' friends. Or, if I run a warez / pirated-movie site, I can trick you with overlays, opacity tricks, and other spam techniques, and sell your click on an "innocent" movie-review page to a porn site, similar to what is done with CAPTCHAs. I'm going to call this new form of spam Newsfeed Spam.

This is scary because anyone who falls victim to it will immediately become wary of social-networking buttons, and may even stay away from a "Share on Twitter" button because "bad things have happened in the past with these newfangled things"!

I don't have a good solution to this problem; this sort of spam would be hard to detect or police, since Facebook never sees the parent page.

• One weak solution is to use the iframe request's HTTP_REFERER to prohibit cross-domain Likes (see the sketch after this list). I'm not sure how reliable this is; it depends on the browsers' Referer policies.

• Yet another solution is to give the user information about the target of the Like, e.g.:

  • Shown in the initial text, i.e. "and 2,025 others like this" becomes "and 2,025 others like 'Britney Spears'..." The downside is that this can't be shown in the compact form of the button.
  • Shown upon clicking, i.e. "You just liked BritneySpears.com"
  • (my favorite) Shown on mouseover: the button expands to show the domain, "Click to Like britneyspears.com/...."
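
For the HTTP_REFERER idea in the first bullet, here's a rough sketch of what the check might look like on Facebook's end. This is purely hypothetical; I have no idea what like.php actually does internally:

    <?php
    // Hypothetical server-side check inside the Like endpoint: refuse the
    // Like when the page embedding the iframe (the Referer) is on a
    // different domain than the href being Liked.
    $href    = isset($_GET['href']) ? $_GET['href'] : '';
    $referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

    $href_host    = parse_url($href, PHP_URL_HOST);
    $referer_host = parse_url($referer, PHP_URL_HOST);

    // The weakness: browsers, proxies, and privacy tools often strip the
    // Referer header, and an empty value can't be rejected outright without
    // breaking legitimate embeds. A real check would also have to normalize
    // "www." prefixes and subdomains.
    if ($referer_host && $href_host && $referer_host !== $href_host) {
        header('HTTP/1.1 403 Forbidden');
        exit('Cross-domain Like blocked.');
    }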

This problem is an interesting mix of privacy and usability; would love to see a good solution!

Google Search's Speed-based Ranking, Baking and Frying

I'm looking for confirmation of the details, and corroboration, from other Drupal developers. Comments are welcome here. PHBs need not worry: your Drupal site is just fine.

This post is about an inherent problem with Google's recently announced "speed as a ranking feature" and how it interacts with content-management systems like Drupal and WordPress. For an auto-generated website, Google is often the first and only visitor to a lot of pages. Since Drupal spends a lot of time on the first render of a page, Google will likely see this delay. This is due both to a problem with how Drupal generates pages and to how Google measures speed.

Google recently announced that, as part of its quest to make the web a faster place, it will penalize slow websites in its rankings:

today we’re including a new signal in our search ranking algorithms: site speed. Site speed reflects how quickly a website responds to web requests.

Since Google's nice enough to provide Webmaster Tools, I looked up how my site was doing, and got this disappointing set of numbers:

[Screenshot: Webmaster Tools reporting an average page-load time of about 3 seconds]

I'm aware 3 seconds is too long. Other Drupal folks have reported averages of ~600 ms, and my own measurements put my site under 1 second on average. The gap is probably because I occasionally have funky experiments running on parts of the site that issue expensive queries. Still, some of the other results were surprising:

Investigating further, it looks like there are 3 problems:

[Screenshot: Page Speed suggestions flagging DNS lookups, multiple CSS files, and missing gzip compression]

DNS issues & multiple CSS files: Google Analytics is on a huge number of websites, so I expect its DNS entry to already be prefetched for most visitors. The CSS isn't an issue either, since the two files target different client media types (print / screen).

GZip compression: Now this is very odd. I'm pretty sure I have gzip compression enabled in Drupal (Admin > Performance > Compression), so why is Google reporting a lack of compression? To check, I ran some tests and discovered that since Google usually sees a page before it's cached, it gets a non-gzipped version. This happens because of the way Drupal's page cache behaves, and it is fixable. Ordinarily this is a small problem, since an uncached page is rendered only for its first visitor. But since Google is the first visitor to a majority of the pages on a less popular site, it concludes that the entire site is uncompressed. I've started a bug report for the uncached-page gzip problem.
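
Here's a simplified sketch of the behavior as I understand it. The function and variable names are from Drupal 6 core, but the control flow is heavily compressed, and the fix in the else-branch is my hypothetical suggestion, not what core does today:

    <?php
    // Simplified view of Drupal's page delivery for anonymous users.
    // ($base_root and request_uri() are Drupal globals/helpers.)
    $cache = cache_get($base_root . request_uri(), 'cache_page');
    if ($cache) {
        // Cache HIT: serve the stored copy, which page_set_cache() already
        // gzencode()d when it was written. Visitors 2..N get compression.
        drupal_page_cache_header($cache);
    }
    else {
        // Cache MISS: the page is rendered from scratch, and a gzipped copy
        // is stored for the NEXT visitor -- but this first response itself
        // goes out uncompressed. Googlebot mostly lands in this branch.
        //
        // Hypothetical fix: compress the first response too, e.g.:
        if (variable_get('page_compression', TRUE) && extension_loaded('zlib')) {
            ob_start('ob_gzhandler');
        }
        // ... normal bootstrap and rendering; page_set_cache() runs at the
        // end of the request and writes the gzipped copy into cache_page ...
    }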

A flawed metric: The other problem is that Drupal (and WordPress, etc.) use a "fry" model: pages are generated on the fly, per request. Movable Type and friends, on the other hand, "bake" their pages beforehand, so nothing served up goes through the CMS. Caching in fry-based systems is typically done on first render: the first visit to a page is generated from scratch and written to the database/filesystem, and every subsequent visitor sees a render from the cache.
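
A minimal, generic sketch of that first-render caching pattern, using a file-based cache instead of Drupal's database tables:

    <?php
    // "Fry" model with first-render caching: visitor 1 pays for the full
    // render AND the cache write; visitors 2..N get a cheap cache read.
    function serve_page($path) {
        $cache_file = '/tmp/page-cache/' . md5($path);

        // Visitors 2..N: cheap read, snappy response.
        if (is_file($cache_file)) {
            return file_get_contents($cache_file);
        }

        // Visitor 1 (often Googlebot): render from scratch...
        $html = render_from_scratch($path);

        // ...plus an extra write before the request finishes.
        if (!is_dir(dirname($cache_file))) {
            mkdir(dirname($cache_file), 0777, TRUE);
        }
        file_put_contents($cache_file, $html);
        return $html;
    }

    // Stand-in for the expensive part: queries, node loads, theming, etc.
    function render_from_scratch($path) {
        usleep(500000);  // pretend this took half a second
        return "<html><body>Rendered: " . htmlspecialchars($path) . "</body></html>";
    }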

Since Googlebot is usually the first (and only) visitor to many pages on a small site, the average crawl hits a large number of pages where Drupal is writing things to the cache for the next visitor. This means every page Googlebot visits costs a write to the database. AFAIK Drupal runs page_set_cache() after rendering the entire page, so the user experience is still snappy; but I'm assuming Google counts time to connection close, not time to the closing </html> tag, resulting in a bad rendering-time evaluation.
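
If my assumption holds, the tail end of a first-visit request looks roughly like this (simplified from my reading of Drupal 6's drupal_page_footer() / page_set_cache(); $base_root and request_uri() are Drupal's):

    <?php
    // End-of-request ordering on a cache miss, greatly simplified.
    $url  = $base_root . request_uri();    // Drupal's cache key for the page
    $html = ob_get_contents();             // the fully rendered page
    ob_end_flush();                        // bytes hit the wire; a human's
                                           // browser can start rendering now
    cache_set($url, gzencode($html), 'cache_page');  // DB write AFTER flush...
    // ...but the connection only closes once the script finishes, so a
    // metric that stops the clock at connection close still charges the
    // page for this cache write.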

This means that Google's Site Speed is not representative of the average user (i.e. the second, third, fourth, etc. visitors, who read from the cache); it only represents the absolute worst case for the website, which is hardly a fair metric. (Note that this is based on my speculation about what Site Speed means, going by the existing documentation.)