
Reputation Misrepresentation, Trail Paranoia and other side effects of Liking the World

[Image: traffic spike]

A few months ago, I wrote up some quick observations about Facebook’s then just-launched “Like” button, pitching “Newsfeed Spam” as a problem exacerbated by the new Like Buttons. The post went “viral”, so to speak, bouncing off Techmeme, ReadWriteWeb / NYTimes, even German news websites. Obviously this is nothing compared to “real” traffic on the Internet, but it was fun to watch the link spread. This is meant to be a follow-up to that post, based on thoughts I’ve had since.

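In this post, I'll be writing about five "issues" with the Like button, followed by four "solutions" to these issues. Since this is a slightly long post, here's an outline:

Big Deal!
Issues with the Like Button
1) Reputation Misrepresentation
2) Browse Trail Inference
3) Newsfeed Spam
4) "Likejacking"
5) Like Switching
"Solutions"
1) Facebook's approach
2) Secure Likes
3) A browser-based approach
4) My Current Solution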


Big Deal!


[Image: Facebook stats]

The Facebook Like Button has been a huge success. With over 3 billion buttons served, and major players such as IMDB and CNN signing up to integrate the button (and other social plugins) into their websites, the chance of encountering a Facebook Like button while browsing the web is quite high, if not certain. Many folks have questioned whether this is a big deal -- IFRAME and JavaScript-based widgets have been around for a long time (shameless self-plug: Blogsnob used a JavaScript-based widget to cross-pollinate blogs across the internet as early as 8 years ago). Using the social concept of showing familiar faces to readers isn't new either; MyBlogLog has been doing it for a while. Then why is this silly little button such an issue?

The answer is persistent user engagement. With 500 million users, 50% of whom log into Facebook on any given day, you're looking at an audience of 250 million users. If you're logged into Facebook while browsing any website with a social plugin, that logged-in session is used. Now if you're like me, you'll probably have "remember me" checked at login, which means you're always logged into Facebook. What this means is that on any given day, Facebook has the opportunity to reach 250 million people throughout their web browsing experience, not just when they're on Facebook.com[1].

So clearly, from a company's perspective, this is important. It is a pretty big deal! But why is this something Facebook users need to be educated about? Onwards to the next section!

Issues with the Like Button


Readers should note the use of the word "Issues", as opposed to "Security vulnerability", "Privacy Leak", "Design Flaw", "Cruel Price of Technology", or "Horrible Transgression Against Humankind". Each issue has its own kind of impact on the user; you're welcome to decide which is which!


To better understand the issues with the Like button, let's understand what the Like button provides:
1) It provides a count of the number of people who currently "Like" something.
2) It provides a list of people you know who have liked said object, with profile pictures.
3) It provides the ability to click the button and instantaneously "Like" something, triggering an update on your newsfeed.
All of this is done using an embedded IFRAME -- a little Facebook page within the main page that displays the button.

In the next few paragraphs, we'll see some implications of this button on the web.

Reputation Misrepresentation


The concept of reputation misrepresentation is quite simple:
a not-so-popular website can use another website's reputation to make the site seem more reputed or established to the user.

Here's a quick diagram to explain it:

[Diagram: reputation misrepresentation]

Simply put, as of now, any website (e.g. a web store) can claim to be popular (especially with your friends) to gain your trust: the page embedding the button chooses which URL the button counts Likes for, and that URL doesn't have to be its own. Since Facebook doesn't check referrer information, it can't really do anything about this either. A possible solution is to include verifying information inside the Like button, which ruins the simplicity of it all.
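To make the mechanics concrete, here is a minimal Python sketch of how an embedding page could point a Like IFRAME at somebody else's URL. The plugin endpoint and its `href` parameter are written from memory of Facebook's plugin documentation and should be treated as assumptions; the store URL and both helper functions are made up for illustration.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed plugin endpoint; check Facebook's documentation before relying on it.
LIKE_ENDPOINT = "https://www.facebook.com/plugins/like.php"


def like_iframe_src(liked_url):
    """Build the src of a Like IFRAME that targets liked_url."""
    return LIKE_ENDPOINT + "?" + urlencode({"href": liked_url})


def liked_host(iframe_src):
    """Return the host that the button actually shows Likes for."""
    href = parse_qs(urlparse(iframe_src).query)["href"][0]
    return urlparse(href).hostname or ""


# A hypothetical small store embeds a button that points at a popular site.
host_page = "http://tiny-web-store.example/checkout"
src = like_iframe_src("http://www.imdb.com/")
print(urlparse(host_page).hostname, "shows Likes for", liked_host(src))
```

Nothing in the IFRAME's address ties the liked URL to the page that embeds it, which is the whole problem.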

Browse Trail Inference


This one is a more paranoid concept, but I've noticed that people don't realize it until I spell it out for them:
Facebook is indirectly collecting your entire browsing history for all websites that have Facebook widgets. You don’t have to click any Like buttons; just visiting sites like IMDB.com or CNN.com or BritneySpears.com will enable this.

Here's how it works:

[Diagram: browse trail]

Here, our favorite user Jane is logged into Facebook, and visits 2 pages on IMDB.com, checks the news on CNN, and then heads to Yelp to figure out where to eat. Interestingly enough, Facebook records all this information, and can tie it to her Facebook profile, and can thus come up with inferences like "Jane likes Romantic Movies, International News and Thai Food -- let's show her some ads for romantic getaways to Bali!"

(Even worse, if Jane unwittingly visits a nefarious website which coincidentally happens to have the Like button, Facebook gets to know about that too!)

Most modern browsers send the parent document's URL to Facebook in the HTTP Referer header (HTTP_REFERER on the server side) when they load the Like IFRAME, which allows Facebook to implicitly record a fraction of your browsing history. Since this information is much more voluminous than your explicit "Likes", a lot more can be data-mined from it, which can then be used for "Good" (i.e. adding value to Facebook) or "Evil" (i.e. Ads! Market data!).
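As a rough illustration of how little work this takes on the server side, here is a toy Python sketch that groups widget requests by logged-in user and reads the visited site off the Referer. The request format, the field names and the sample URLs are all hypothetical; I have no knowledge of what Facebook's logging actually looks like.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_trails(widget_requests):
    """Group the Referer host of each widget request by logged-in user."""
    trails = defaultdict(list)
    for request in widget_requests:
        user = request.get("session_user")  # None when the visitor is logged out
        referer = request.get("referer")    # URL of the page embedding the widget
        if user and referer:
            trails[user].append(urlparse(referer).hostname)
    return dict(trails)


# Hypothetical requests for the Like IFRAME, as a widget server might see them.
requests_seen = [
    {"session_user": "jane", "referer": "http://www.imdb.com/title/tt0000001/"},
    {"session_user": "jane", "referer": "http://www.imdb.com/title/tt0000002/"},
    {"session_user": "jane", "referer": "http://www.cnn.com/world/"},
    {"session_user": "jane", "referer": "http://www.yelp.com/search?find_desc=thai"},
    {"session_user": None, "referer": "http://www.cnn.com/"},
]
print(build_trails(requests_seen))
```

Note that Jane never clicked anything here; loading the pages was enough.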

What I like about this is that it is an ingenious system for tracking users' browsing behavior. Currently, companies like Google, Yahoo and Microsoft (Bing/Live/MSN) have to convince you to install a browser toolbar which has this minuscule clause in its agreement that you share back ALL your browsing history, which can be used to better understand the Web (and make more money, etc. etc.). Since Facebook is getting all websites to install this, it gets the job done without getting you to install a toolbar! I'll be discussing how I deal with this in the last section, "My Current Solution".

Newsfeed Spam


In a previous post, I demonstrated how users could be tricked into "Liking" things they didn't intend to, leading to spam in their friends' newsfeeds. A month later, security firm Sophos reported an example of this, where users were tricked into virally spreading a trojan through Facebook Likes. The same thing could easily be initiated by Like buttons across the web, where you can be tricked into liking arbitrary things.

Again, this issue has the same root cause as Reputation Misrepresentation: since all the Like button shows you is a user count, pictures and the button itself, there really is no way to know what you're liking. A solution to this is to use a bookmarklet in your browser, which is under your control.

"Likejacking"


This interesting demo by Eric Kerr demonstrates how to trick unwitting users into clicking arbitrary Like buttons. It works by making a transparent Like button that follows the user's mouse cursor. Since the user is bound to click on the page at some point, they end up clicking the Like button instead.

Like Switching


[Diagram: Like switching]

Like switching is an alternative take on Likejacking -- the difference is that the user is first explicitly shown a Like button with a prestigious like count and familiar friends. When the user reaches out to click on it, the Like button is swapped out for a different one, triggered by an onmouseover event from the rectangle around the button.

"Solutions"

Given these issues, let's discuss some solutions, responses and fixes. Note the use of quotes -- many people can argue that nothing is broken, so we don't need solutions! Regardless, one piece of good news is that the W3C is aware of the extensive use of IFRAMEs on the web, and has introduced a new "sandbox" attribute for IFRAMEs. This will lead to more fine-grained control of social widgets. For example, if we can then set our browsers to force "sandbox" settings for all Facebook IFRAMEs, we can avoid handing over our browsing history to Facebook.


Facebook's approach


While I don't expect companies to rationalize every design decision with their users, I am glad that some Facebook engineers are reaching out via online discussions. Clearly this is not representative of the whole company, but here's a snippet:
Also, in case it wasn't clear, as soon as we identify a domain or url to be bad, it's impossible to reach it via any click on facebook, so even if something becomes bad after people have liked it, we still retroactively protect users.

I like this approach because it fits in well with the rest of the security infrastructure that large companies have: the moment a URL is deemed insecure anywhere on the site, all future users are protected from that website. However, this approach doesn't solve problems with user trust -- it's relying on the fact that Facebook has flagged every evil website in the world before you chanced upon it -- something I wouldn't bet my peace of mind on. It's as if the police told you "We will pursue serial killers only after the first murder!" Would you sleep better knowing that? In essence, this approach is great when you're looking at it from the side of protecting 500 million users. But as one of the 500 million, it kinda leaves you out in the dark!


Secure Likes

As we mentioned in the Reputation Misrepresentation section, another interesting improvement would be to include some indication of the URL that is being "Liked" inside the button itself. An option is to display the URL as a tooltip when the user hovers their cursor over the button, especially if it disagrees with the parent frame's URL. Obviously placing the whole URL would make the button large and ugly. A possible compromise is to include the favicon (the icon that shows up for each site in your browser) right inside the Like button. The user can simply check if the browser icon is the same as the one on the Like button to make sure it's safe. This way, if a website wants to (mis)use BritneySpears.com's Like Button, it will be forced to use BritneySpears.com's favicon too! Here's a mockup of what "Secure Like" would look like for IMDB:
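The tooltip variant of this idea boils down to a host comparison. Here is a minimal Python sketch of that check, under the assumption that "disagrees with the parent frame's URL" means "is on a different site"; the function names, the `www.` normalisation and the tooltip wording are all mine, not part of any real Like button.

```python
from urllib.parse import urlparse


def _site(url):
    """Lower-cased host of url, ignoring a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def secure_like_tooltip(parent_url, liked_url):
    """Return a warning when the liked URL is on a different site than the page."""
    if _site(parent_url) == _site(liked_url):
        return None  # same site, nothing to warn about
    return "This button likes " + liked_url + ", not the page you are on"


print(secure_like_tooltip("http://www.imdb.com/title/x", "http://imdb.com/"))
print(secure_like_tooltip("http://shop.example/", "http://www.britneyspears.com/"))
```

A real version would need a better notion of "same site" than stripping `www.`, but the shape of the check would likely stay the same.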

[Mockup: Secure Like]


A browser-based approach



This approach, best exemplified by "Social Web" browser Flock and recently acknowledged by folks at Mozilla, makes you log into the browser, not a web site. All user-sensitive actions (such as "Liking" a page) have to go through the browser, making it inherently more secure.

My Current Solution


[Image: my dock]

At this point, I guess it's best to conclude with my own solution for dealing with all these issues. It's simple: I run Google and Facebook services in their own browsers, separate from my general web surfing. As you can see from the picture of my dock, my Gmail and Facebook are separate from my Chrome browser. That way, I appear logged out[2] to Google Search and Facebook Likes when I surf the web or search for things. On a Mac, you can do this using Fluid.app; on Windows you can do this using Mozilla Prism.

And that brings us to the end of this rather long-winded discussion about such a simple "Like" button! Comments are welcome. Until the next post -- Surf safe, and Surf Smart!

 

 

Footnotes:
[1] To my knowledge, there is only one other company that has this level of persistent engagement: Google's GMail remembers logins more aggressively than Facebook. When you're logged into Gmail, you're also logged into Google Search, which means they log your search history as a recognized user. This is usually a good thing for the user, since Google then has a chance to personalize your search. Google actually takes it a step further and personalizes even for non-logged in users.

[2] Yes, they can still get me by my IP, but that's unlikely when I'm usually behind firewalls.

 

Cite this post!:


@article{reputationmisrepresentation,
title={{Reputation Misrepresentation, Trail Paranoia and other side effects of Liking the World}},
author={Nandi, A.},
year={2010},
journal={{Arnab's World}}
}

Google Search's Speed-based Ranking, Baking and Frying

I am looking for other Drupal developers to confirm or corroborate the details. Comments are welcome here. PHBs need not worry, your Drupal site is just fine.

This post is about an inherent problem with Google’s recently announced “Speed-as-a-ranking-feature” when applied to content-management systems like Drupal and WordPress. For an auto-generated website, Google is often the first and only visitor to a lot of pages. Since Drupal spends a lot of time on the first render of a page, Google will likely see this delay. This is due both to how Drupal generates pages and to Google’s metric.

Google recently announced that as a part of its quest to make the web a faster place, it will penalize slow websites in its ranking:

today we’re including a new signal in our search ranking algorithms: site speed. Site speed reflects how quickly a website responds to web requests.

Since Google’s nice enough to provide webmaster tools, I looked up how my site was doing, and got this disappointing set of numbers:

[Screenshot: Google Webmaster Tools numbers]

I’m aware 3 seconds is too long. Other Drupal folks have reported ~600ms averages. My current site does under 1 second on average based on my measurements. This is probably because I occasionally have some funky experiments going on in some parts of the site that run expensive queries. Still, some other results were surprising:

Investigating further, it looks like there are 3 problems:

[Screenshot: the reported problems]

DNS issues & Multiple CSS: Since Google Analytics is on a large number of websites, I’m expecting their DNS to be prefetched. CSS is not an issue since the 2 files are client media specific (print / screen).

GZip Compression: Now this is very odd. I’m pretty sure I have gzip compression enabled in Drupal (Admin > Performance > Compression). Why is Google reporting lack of compression? To check, I ran some tests, and discovered that since Google usually sees the page before it’s cached, it’s getting a non-gzipped version. This happens due to the way Drupal’s cache behaves, and is fixable. Ordinarily, this is a small problem, since uncached pages are rendered for only the first visitor. But since Google is the first visitor to a majority of the pages in a less popular site, it thinks the entire site is uncompressed. I’ve started a bug report for the uncached page gzip problem.
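Here is a toy Python model of the behaviour described above: the first request for a page is rendered and returned uncompressed, and only the copy written to cache is gzipped. This is a sketch of the symptom as I observed it, not Drupal's actual code; `ToyPageCache`, `render` and the path are hypothetical.

```python
import gzip


class ToyPageCache:
    """Toy model: compression only happens on the cached path."""

    def __init__(self):
        self._cache = {}

    def serve(self, path, render):
        if path in self._cache:
            # Cache hit: the stored copy is already gzipped.
            return self._cache[path], {"Content-Encoding": "gzip"}
        body = render(path).encode("utf-8")
        self._cache[path] = gzip.compress(body)  # stored for the next visitor
        return body, {}                          # first visitor gets raw bytes


def render(path):
    """Hypothetical stand-in for the CMS building a page from scratch."""
    return "<html>" + path * 200 + "</html>"


cache = ToyPageCache()
first_body, first_headers = cache.serve("/node/1", render)    # e.g. the crawler
second_body, second_headers = cache.serve("/node/1", render)  # the next visitor
print(first_headers, len(first_body))
print(second_headers, len(second_body))
```

If the crawler is always the first visitor, it only ever exercises the uncompressed branch.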

A flawed metric: The other problem is that Drupal (and WordPress etc.) use a fry model: pages are generated on the fly per request. On the other hand, Movable Type, etc., bake their pages beforehand, so anything served up doesn’t go through the CMS. Caching in fry-based systems is typically done on first render, i.e. the first visit to a page is generated from scratch and written to the database/filesystem, and any successive visitor to that page sees a render from the cache.

Since the Googlebot is usually the first (and only) visitor to many pages in a small site, the average crawl would hit a large number of pages where Drupal is writing things to cache for the next visitor. This means every page Googlebot visits costs a write to the database. While afaik Drupal runs page_set_cache after rendering the entire page and hence the user experience is snappy, I’m assuming Google counts time to connection close and not the closing </html> tag, resulting in a bad rendering time evaluation.

This means that Google’s Site Speed is not representative of the average user (i.e. the second, third, fourth etc. visitors that read from the cache); it only represents the absolute worst-case situation for the website, which is hardly a fair metric. (Note that this is based on my speculation of what Site Speed means, based on the existing documentation.)
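To put rough numbers on the argument, here is a small Python simulation of a fry-model CMS where a crawler is the first visitor to every page and human readers arrive afterwards. All three timing constants are invented for illustration and are not measurements of Drupal; `FryCms` is a hypothetical name.

```python
class FryCms:
    """Toy fry-model CMS: render on first request, then serve from cache."""

    RENDER_MS = 900       # assumed cost of building a page from scratch
    CACHE_WRITE_MS = 300  # assumed cost of writing the rendered page to cache
    CACHE_READ_MS = 50    # assumed cost of serving a cached page

    def __init__(self):
        self._cached = set()

    def request_ms(self, path):
        if path in self._cached:
            return self.CACHE_READ_MS
        self._cached.add(path)
        return self.RENDER_MS + self.CACHE_WRITE_MS


def mean(values):
    return sum(values) / len(values)


cms = FryCms()
pages = ["/node/%d" % i for i in range(100)]
# The crawler is the first visitor to every page, so it always pays for the render.
crawler_times = [cms.request_ms(p) for p in pages]
# Three later readers per page all hit the cache.
reader_times = [cms.request_ms(p) for p in pages for _ in range(3)]
print(mean(crawler_times), mean(reader_times))
```

With these made-up costs the crawler's average is the full render-plus-write time for every page, while readers only ever see the cache-read time, so a metric built from the crawler's visits says very little about what readers experience.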

iPad Keyboard Layout WTF

You blew away $500 on the craps table at Vegas flirting with the waitress in the low cut dress. I bought an iPad. So let’s call it even and stop judging, shall we?

Since everybody and their mother is writing an iPad review, I’ve decided to do the Internet a favor and not contribute to the hypefest. Instead, let’s talk about an interesting “design bug” in the keyboard layouts.

Due to the touch screen nature of the device, the iPad takes a leaf from the iPhone and implements multiple keyboard layouts, depending on the application context. I’m calling four of these layouts “Email Mode”, “URL Mode”, “Text Mode” and “Special Character Mode”. Here’s a side-by-side of the first three modes:

This morning, my roommate Meg pointed out something interesting in the Text Mode layout: the iPad places the question mark (“?”) character as a shift-modifier for the comma (“,”). Now, there’s this “?123” button that when pressed, shows you special characters and number keys. But when you press it, the “?” key disappears! Where did it go?

Closer inspection shows that it has moved to the center of the keyboard. This is odd, you’d think: The question mark has always been right next to the shift key since the beginning of time. Further, this bizarre disappearing act when switching modes is unintuitive. Why would someone make such a design decision? Let’s take a look at iPhone’s Special Character Mode for an answer:

Notice how the “?” character on the iPhone is at the center, unlike classic keyboard layouts, where it’s to the left of the shift (which in turn has been replaced by the backspace key). The iPad is clearly trying to maintain consistency with its iPhone heritage. However, since it is a very different beast, it also tries to change things up a bit and borrow from its big-boy-computer heritage, ending up with strange design oddities like these. Which raises the question: should a tablet be designed as a larger phone, a smaller computer, a bit of both, or just something completely different?

While this is a fairly minor quirk, it is representative of many oddities in the design of the interface across the board. Despite Apple’s willingness to throw the past out and redesign UIs, the need for consistency with its own family of products often creates ugly contradictions.