Human Interest

Friend-based throttling in Facebook News Feeds

This dialog in my Facebook feed options seemed interesting:

[Screenshot, 2010-08-13: the Facebook Live Feed options dialog]

Notice how it asks how many friends I want my Live Feed to draw from. The default seems to be 250 friends. This means that when you click “Recent Posts”, you get recent posts from only your top 250 friends; all other friends are ignored.

Obviously this is a problem only if you have more than 250 friends. I’ve heard the average is 150, but I’m sure plenty of people are affected. This option caught my eye for two reasons:

From a technical perspective, news feeds are massive publish-subscribe systems. You subscribe to your friends’ posts, and each new post is published to your feed. The 250-friend limit sets up a convenient soft limit for the system, reducing the load on Facebook’s servers. Twitter doesn’t have such limits, and I imagine this is one reason its servers get overloaded. It’s a smart design from this perspective, but I wish Facebook were more transparent about the limit!
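The soft limit can be made concrete with a minimal Python sketch of the read side of such a feed. It assumes a hypothetical per-friend “friendness” score, since Facebook’s real ranking isn’t public. `Post`, `recent_posts`, `friend_scores` and `max_friends` are illustrative names, not Facebook APIs. Only posts from the top `max_friends` friends are ever considered, so the work per feed request is bounded by the limit rather than by the size of your friend list.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    timestamp: int  # e.g. seconds since the epoch
    text: str


def recent_posts(posts_by_friend, friend_scores, max_friends=250, limit=20):
    """Return the `limit` newest posts, drawn only from the top `max_friends` friends.

    `friend_scores` maps friend -> hypothetical "friendness" score (higher ranks first).
    Friends outside the top `max_friends` are silently ignored, like the feed option.
    """
    # Rank friends by score; ties are broken by name so the result is deterministic.
    top_friends = sorted(friend_scores, key=lambda f: (-friend_scores[f], f))[:max_friends]
    # Fan in posts from the subscribed subset only.
    candidates = (post for f in top_friends for post in posts_by_friend.get(f, []))
    # Keep the newest `limit` posts (newest first) without sorting everything.
    return heapq.nlargest(limit, candidates, key=lambda p: p.timestamp)
```

Suppose `max_friends=2` and the friends are scored alice > bob > carol. Carol’s posts then never appear, even when they are the newest. Raising the limit (as I did, to 1000) brings them back.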

From a social perspective, I think this is a very primitive way to throttle friends. My understanding of the Feed was that “Top Posts” ranked recent posts to give me a high-level view of my feed, while “Recent Posts” gave me access to everything. That belief seems to be incorrect. When I increased this number to 1000 (i.e., to include ALL my friends), I suddenly started seeing updates from many friends I had totally forgotten about or lost touch with. Since I don’t see updates from them, I don’t interact with them on Facebook, which leads to a self-reinforcing “poor get poorer” effect. I am assuming there’s some “friendness” ranking going on here; if so, friends in my bottom 50 will never make it into my top 250. A self-reinforcing ranking function is risky, especially when the stability of the ranking depends on human input. I wonder if the Feed team has done anything smart to introduce “compensators” based on interactions with bottom-50 friends, similar to the random reset in PageRank. The difficulty is that, unlike hyperlink edges, we’re dealing with a vocabulary of “Likes” and other social cues that are not well understood. This could be an excellent subject for a machine learning / information retrieval paper or two.
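What might such a compensator look like? The sketch below is one possibility, offered purely as an assumption and not as anything Facebook is known to do. It reserves a few feed slots for friends sampled uniformly at random from below the cutoff. This is loosely analogous to PageRank’s random reset, which gives every page a baseline chance of being visited. `feed_friends` and `explore_slots` are hypothetical names.

```python
import random


def feed_friends(friend_scores, max_friends=250, explore_slots=25, rng=None):
    """Pick which friends feed into "Recent Posts".

    The top (max_friends - explore_slots) friends by score are always included.
    The remaining slots are filled by sampling uniformly, without replacement,
    from everyone below that cutoff (a random-reset-style compensator).
    """
    rng = rng or random.Random()
    explore_slots = min(explore_slots, max_friends)
    ranked = sorted(friend_scores, key=lambda f: (-friend_scores[f], f))
    n_exploit = max_friends - explore_slots
    exploit = ranked[:n_exploit]
    rest = ranked[n_exploit:]
    # If fewer friends remain than there are slots, take all of them.
    explore = rng.sample(rest, min(explore_slots, len(rest)))
    return exploit + explore
```

With `explore_slots=0` this collapses back to the plain top-N cutoff. The size of the reserved slice trades feed relevance against giving forgotten friends a chance to earn new interactions.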

update: Horseman of the Interwebs Hung Truong points out Dunbar’s Number:

Dunbar’s number is a theoretical cognitive limit to the number of people with whom one can maintain stable social relationships. These are relationships in which an individual knows who each person is, and how each person relates to every other person. Proponents assert that numbers larger than this generally require more restrictive rules, laws, and enforced norms to maintain a stable, cohesive group. No precise value has been proposed for Dunbar’s number. It lies between 100 and 230, but a commonly detected value is 150.

This puts Facebook’s default threshold in a great place. However, Dunbar’s number is meant for offline relationships; the equivalent number for ephemeral, online “feed”-style relationships could arguably be much higher. It appears Dunbar has been working on this, and I’m looking forward to a publication from his group soon.

"Move Fast, Break Trust?"

This week’s blog post is written by fellow PhD candidate Nicholas Gorski, who came across yet another bug in Facebook’s privacy controls during the latest rollout. The post grew out of a discussion about the motto “Move Fast, Break Stuff”: it sounds fun to an engineer, but is this attitude appropriate when it comes to your relationships with your friends and family? An explicit clarification for the engineers at Facebook: this post is intended to provoke thought about attitudes toward privacy models, not to make any claims about coding abilities or the inevitability of bugs. —arnab

 

Mark Zuckerberg’s motto for Facebook, now used as a company differentiator in engineering recruiting pitches, is “move fast, break stuff.” As previously reported, Facebook certainly broke things in the changes pushed out Tuesday evening. By previewing the effects of your privacy settings, you were briefly able to see your profile as if you were logged in to a friend’s account. This let you view your friends’ live chats as they took place, as well as look at pending friend requests.

Tuesday’s changes apparently broke another privacy setting, too. By now, everyone is aware that Facebook offers privacy settings for the personal information in your profile. This includes items such as your Bio, description, Interested In and Looking For, and Religious and Political Views. However, Tuesday’s changes appear to expose this information to everyone in your network, regardless of your privacy settings and regardless of whether they are your friends.

[Screenshots taken 2010-05-06]

Try it out for yourself. First, set the privacy settings for some of your personal information to exclude certain friends who are in your network, and then preview your profile as one of them. If the privacy breach hasn’t been fixed yet, your friend will still be able to see your personal information, even though your privacy settings say they shouldn’t. As mentioned above, this extends beyond your friends: anyone in your network may be able to view your personal information (and it may even extend beyond your network).

[Screenshots taken 2010-05-06]

(Note: the privacy leak may have since been fixed… although an awful lot of people now have public quotations on their profiles.)

Unfortunately, it’s unlikely that this bug will get the attention it deserves. Facebook presents a privacy policy to its users, but the site is broken in a way that ignores that policy. When Google rolled out Buzz, it was lambasted in the press for defaulting to a public privacy policy for your contacts, if you had opted in to creating a public profile. In this case, Facebook let you set an explicit privacy policy, but then exposed that information anyway.

How could this seemingly minor privacy leak hurt anyone, you might ask? The canonical example of the danger of Buzz’s public contacts was the case of the female blogger with an abusive ex-husband. No harm actually befell this security-conscious blogger, but it certainly could have. In the case of Facebook’s privacy breach, the information made public was only profile information relating to your biography, religion and romantic preferences. Given the sheer number of Facebook users, how many people’s sexual preferences could have been inadvertently outed? How many people could have had embarrassing biographical information exposed to their parents, people in their network, or potential employers? The privacy safeguards are there for a reason, after all.

One might be inclined to write it off as a mistake, perhaps a bug in a PHP script written by a junior software engineer (hard to believe, given the reported talent of Facebook’s employees). But Facebook’s motto and its current agenda make it clear that the privacy leaks that came to light this week are more than that. They are a product of corporate indifference to privacy. Indeed, Facebook’s corporate strategy for monetizing the site depends on making as much of your information public as it can. The EFF has repeatedly sounded alarms about the erosion of privacy on Facebook, but is it too late?

Much of the information that was once personal and guarded by privacy settings has now migrated to the public portion of the site, and has been standardized in order to facilitate companies using your personal information to tie in to their marketing and advertising campaigns. The books that you like, the music that you listen to, your favorite movies: all of these are valuable data that companies will pay Facebook for, in aggregate. It will allow them to target you more specifically. When you expose this information publicly, though, are you really aware of how it will be used – not just today, but tomorrow? Information will persist forever in Facebook’s databases, long after you delete it from your profile.

In the meantime, Facebook’s corporate attitude of playing fast and loose with your profile information makes it likely that future privacy leaks will occur — that is, if any of your profile information remains private for much longer.

Visualizations for Navigation: Experiments on my blog

This is a meta post describing two features on this blog that I don’t think I’ve documented before. Apologies for the navel-gazing; I hope there’s enough useful information here to make it worth reading.

Most folks read my blog through the RSS feed, but those who peruse the web version get to see many different navigational aids that help them get around the website. Since the blog runs on Drupal, I get to deploy all sorts of fun stuff. One example is the Similar Entries module, which uses MySQL’s FULLTEXT similarity to show possibly related posts[1]. This lets you jump around the website reading posts similar to each other, which is especially useful for readers who arrive from a search engine results page. For example, they may come in looking for Magic Bus for the iPhone, but since they’re probably iPhone users, they may also be interested in the amusing DIY iPhone Speakers post.
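MySQL’s FULLTEXT search ranks matches with its own relevance weighting, which isn’t reproduced here. As a self-contained stand-in, the sketch below uses a different and simpler technique: plain bag-of-words cosine similarity between posts. `similar_entries` and the toy post bodies are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_entries(posts, target, k=5):
    """Return up to k titles most similar to `target`, skipping zero-overlap posts."""
    vectors = {title: Counter(tokenize(body)) for title, body in posts.items()}
    scored = [(cosine(vectors[target], vec), title)
              for title, vec in vectors.items() if title != target]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [title for score, title in scored[:k] if score > 0]
```

A reader who lands on an iPhone post gets pointed at other posts sharing its vocabulary, which is the serendipity the module is after.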

The Timeline Footer

However, given that this blog has amassed about a thousand posts over seven years, it is hard to give the reader a concise “overview” of that much information. Serendipitous browsing can only go so far. Since this is a personal blog, the chronological aspect of the posts is interesting in its own right. Many blogs have a “calendar archive” for this, but I find them unappealing; they occupy too much screen space for the amount of information they deliver. My answer is a chronological histogram, which shows the frequency of posts over time:

Each bar represents the number of posts I published that month, from August 2002 until now[2]. Moving your mouse over a bar tells you which month it is. This visualization presents many interesting bits of information. On a personal note, it clearly reflects many stages of my life. June 2005 was a great month for my blog: it had the highest number of posts, possibly because I had just moved to Bangalore, a city with an active blogging community. There are noticeable dips that reflect extended periods of travel and bigger projects.

Behind the scenes, this is all done by a simple SELECT COUNT(*) FROM nodes GROUP BY month style query. Because the counts have high variance, some smoothing is applied; for my usage, Height = log base 4 (frequency) gave pretty good results. This goes into a PHP block, which is then displayed in the footer of every blog page. The Drupal PHP snippets section is a great place to start for things like this. Note that the chart is pure HTML / CSS; there is no Javascript involved[3].
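The original is a PHP block inside Drupal. What follows is a self-contained Python sketch of the same pipeline, assuming the post dates are already loaded, so an in-memory count stands in for the database query. The CSS class names and the `px_per_unit` scale are hypothetical. A stylesheet would lay the `.bar` divs out side by side and align them at the bottom.

```python
import math
from collections import Counter
from datetime import date
from html import escape


def monthly_counts(post_dates):
    """Count posts per (year, month), like SELECT COUNT(*) ... GROUP BY month."""
    return Counter((d.year, d.month) for d in post_dates)


def month_range(start, end):
    """Yield every (year, month) from start to end inclusive, so empty months get a bar."""
    year, month = start
    while (year, month) <= end:
        yield year, month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def bar_height(count, px_per_unit=10):
    """Height = log base 4 of the count, scaled to pixels; empty months get 0."""
    return 0 if count == 0 else round(math.log(count, 4) * px_per_unit)


def timeline_html(post_dates, px_per_unit=10):
    """Render the histogram as plain HTML divs; the title attribute is the hover text."""
    counts = monthly_counts(post_dates)
    if not counts:
        return '<div class="timeline"></div>'
    bars = []
    for year, month in month_range(min(counts), max(counts)):
        n = counts.get((year, month), 0)
        label = escape(f"{date(year, month, 1):%B %Y}: {n} posts")
        bars.append(f'<div class="bar" title="{label}" '
                    f'style="height:{bar_height(n, px_per_unit)}px"></div>')
    return '<div class="timeline">' + "".join(bars) + "</div>"
```

A straight log base 4 has one quirk: a month with a single post gets height 0, the same as an empty month. Using log base 4 of (count + 1) would keep those months visible, at the cost of slightly shifting the scale.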

The Dot Header

Many of my posts are manually categorized using Drupal’s excellent taxonomy system. A traditional approach is to create sections, so that the user can easily browse through all my Poems or my nerdy posts. The problem is that this blog contains notes and links to things that I think are “interesting”, a classification that has constantly evolved as my interests have changed over the past decade. Not only is it hard for me to box myself into a fixed set of categories, but maintaining the evolution of these categories across 7+ years is not something I want to deal with every day.

This is where tags and automatic term extraction come in. As you can see in the top footer of the blog main page, each dot is a topic, automatically extracted from all posts on the website. I list the top 60 topics in alphabetical order, where each topic is also a valid taxonomy term. The aesthetics are inspired by the RaphaelJS dots demo, but just like the previous visualization, it is done in pure CSS + HTML. The size and color of each dot are based on the number of items that contain that term. Hovering over a dot gives you its label and count, and clicking it takes you to an index of posts with that term. This gives me a concise and maintainable way to tell readers what kinds of things I write about. It also addresses a problem a lot of my readers have: they either care only about the tech-related posts (click on the biggest purple dot!), or only about the non-tech posts (look for the “poetry” dot in the last row!).

This visualization works by first automatically extracting terms from each post. This is done using the OpenCalais module (I previously used Yahoo!’s Term Extractor, but switched since it seems Yahoo!’s extractor is scheduled to be decommissioned soon). The visualization is updated constantly using a cached GROUP BY block similar to the previous visualization, this time grouped on the taxonomy term. This lets me add new posts as often as I like: tags are generated automatically and reflected in the visualization without me having to do anything.
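Here is a minimal Python sketch of the dot header. It assumes the terms have already been extracted and leaves out both the OpenCalais step and the caching. It counts posts per term, keeps the 60 most frequent, lists them alphabetically, and maps each count onto one of a few size/color buckets. The CSS class names and the `/tags/` URL scheme are hypothetical placeholders, not Drupal’s actual paths.

```python
from collections import Counter
from html import escape
from urllib.parse import quote

# Hypothetical CSS classes, smallest/lightest to largest/darkest; the stylesheet
# would give each one its own size and color.
SIZE_CLASSES = ["dot-s", "dot-m", "dot-l", "dot-xl"]


def term_counts(post_terms):
    """Count how many posts carry each term (the GROUP BY on the taxonomy term)."""
    return Counter(term for terms in post_terms for term in set(terms))


def dot_header(post_terms, top_n=60):
    """Render the top_n most frequent terms as dots, listed alphabetically."""
    counts = term_counts(post_terms)
    # Choose the most frequent terms first (ties broken alphabetically)...
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    if not top:
        return ""
    max_count = top[0][1]
    dots = []
    # ...then display the chosen terms in alphabetical order.
    for term, n in sorted(top, key=lambda kv: kv[0].lower()):
        bucket = (n - 1) * len(SIZE_CLASSES) // max_count  # always 0 .. len - 1
        title = escape(f"{term} ({n})")  # hover text: label and count
        href = "/tags/" + quote(term)  # hypothetical term-index URL
        dots.append(f'<a class="dot {SIZE_CLASSES[bucket]}" href="{href}" title="{title}"></a>')
    return '<div class="dots">' + "".join(dots) + "</div>"
```

The buckets are sized relative to the most frequent term. As a result, the biggest dot stays the same size no matter how many posts the blog accumulates.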

So that’s it: two simple graphical ways to represent content. I know the two visualizations aren’t the best thing since sliced bread and probably won’t bring about World Peace, but they’re an attempt to encourage discoverability of content on the site. Comments are welcome!


Footnotes:

[1] I actually created that module (and the CAPTCHA module) over four years ago; they’ve been maintained and overhauled by other good folks since.

[2] Arnab’s World is older than that (possibly 1997, hence the childish name!), but that’s the oldest blog post I could recover.

[3] I have nothing against Javascript; it’s just that CSS tends to be easier to manage and is usually more responsive. Also, the generated HTML is probably not valid and is SUPER inefficient + ugly. Hopefully I will have time to clean this up sometime in the future.

obama, mccain and the paris hilton angle

A real, official advert by John McCain calling Obama a “celebrity”:

The response by Paris Hilton:


reaching out

  • Russell Davies painted his laptop to work as a blackboard. I think the acrylic casing of the iBook makes an excellent whiteboard too.
  • Friend and mentor Cong Yu just got an honorable mention in the SIGMOD Dissertation Award:
    …Two other nominees receive Honorable Mention recognizing their outstanding work on theoretical foundations and development of algorithms with great impact on important practical problems: Cong Yu, for his dissertation on “Managing Complex Databases in a Schema Management Framework” at the University of Michigan, and, Nilesh Dalvi, for his dissertation on “Managing Uncertainty Using Probabilistic Databases” at the University of Washington.

    It’s interesting to see the hiring trends: the award was won by now-MSR researcher Ariel Fuxman, while Nilesh and Cong are both Yahoo! Researchers.

  • Natalie Du Toit’s “wonderful story of courage, determination, discipline, hopes and dreams”, becoming the first amputee to qualify for the Olympics.
  • It’s not every day that the White House asks you to become a Vampire Slayer.

bags, balls and boyfriends

Links today brought to you by Red Bull™, my abusive friend in a can pushing me through a rather crazy day.

  • Lego Schoolbag: If I were a 10-year-old girl, I would give away my younger brother for this one.
  • Every expression in this picture is priceless. I like how our hero has resigned to prayer.
  • A hilarious sketch from Snuff Box, starring Matt Berry, who also appears in the britcom The IT Crowd.
  • Adobe is opening up the SWF and FLV formats with the Open Screen Project. (No sir, this is not about Single White Females or Fine Looking Virgins.) Flash has been sort of open for a while, with projects like SWFTools and Gnash, but this takes things to a whole new level, with a slew of bigwig corporate backers. Flash and FLV have been, in my opinion, the critical enablers of the online video revolution, and this is definitely a great step forward. I’m curious to know what Microsoft’s Silverlight team is thinking, as well as the folks at Sun (who just opened up all of Java). And of course, let’s not forget the Android folks, who have a very pretty stack; tacking on some Flash magic would be a very big deal. Considering the significant overlap between supporters of the Adobe effort and the Google effort, this is going to be fun to watch.

April Fools!

This April Fools’ Day, certain perpetrators took it upon themselves to magically turn YY’s desk into a silvery workspace. Everything, from the LCD screen to the pens, books, and even slippers, was “foiled”. Here are some before and after pics:


april fools in december?

From a /. comment:

What have we seen recently?

I’m not sure what the odds are of all these impossibilities happening on the same day. Something’s up, I tell ya.


stupid people everywhere

It’s interesting how you get to see so many relationships start in Winter, take shape around the end of it, flourish around the beginning of spring, and then spectacularly fall on their face and come crashing to a halt in a few weeks. Just when the weather is most perfect — sunny skies, cool breeze, fresh leaves on trees. What a waste, depressed people walking around with that “this-world-sucks” glaze in their eyes, in such misery that God must be wondering why he’s wasting something as beautiful as Spring on stupid people like these. It’s all about timing, folks! Get hitched in winter, have the most awesome spring of your life, get all bitchy and irritated in the sweltering hot summer, which would give you a great mindset to scream and yell and break up, celebrate the single-ness in Fall, and then spend all of the dull, gloomy pre-winter months being cynical and conniving, planning the ex-significant-other’s murder, while at the same time looking for a replacement. Lather rinse repeat. Not a dull day in your life anymore!

note to self: set up the “Top 40 Breakup songs of all time” playlist. A lot of people could use it right now.


goodesktop 3

If I were you, I would stay away from Google Desktop 3. The EFF’s worries are a little far-fetched, but not implausible at all. To be very honest, this, combined with personalized search, Google Talk archiving, and GMail Talk, is really worrying me. It’s like incubating a dinosaur egg. It might be a brontosaurus, but it might also be a T-Rex.
