machine learning

Friend-based throttling in Facebook News Feeds

This dialog in my Facebook feed options seemed interesting:

[Screenshot of the feed options dialog, taken 2010-08-13 at 4:16 AM]

Notice how it asks how many friends I want my Live Feed to draw from. The default seems to be 250 friends. This means that when you click “Recent Posts”, you get recent posts from only your top 250 friends; all other friends are ignored.

Obviously this is a problem only if you have more than 250 friends. I’ve heard the average is 150, but I’m sure there are a lot of people who are affected by this. This option caught my eye for two reasons:

From a technical perspective, news feeds are massive publish-subscribe systems. You subscribe to your friends’ posts, and each new post is published to your feed. The 250-friend limit sets up a convenient soft limit for the system, reducing the stress on Facebook’s servers. Twitter doesn’t have such limits, and I can imagine this is one reason why its servers get overloaded. It’s a smart design from this perspective, but I wish Facebook were more transparent about the limit!
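To make the throttling idea concrete, here is a minimal sketch of a pull-style feed that only reads from the top-ranked friends. It is not Facebook's actual implementation: the names `Post`, `recent_posts`, and `affinity`, and the idea of a single per-friend affinity score, are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str
    timestamp: int  # larger means more recent
    text: str


def recent_posts(posts_by_friend, affinity, max_friends=250, limit=50):
    """Return the newest posts, reading only from the top-ranked friends."""
    # The throttle: keep only the max_friends friends with the highest
    # (hypothetical) affinity score. Everyone else is never read at all.
    top_friends = sorted(affinity, key=affinity.get, reverse=True)[:max_friends]
    # Fan-in: gather posts from the selected friends only.
    merged = [post for friend in top_friends
              for post in posts_by_friend.get(friend, [])]
    merged.sort(key=lambda post: post.timestamp, reverse=True)
    return merged[:limit]


posts = {
    "alice": [Post("alice", 3, "lunch")],
    "bob": [Post("bob", 5, "new job")],
    "carol": [Post("carol", 9, "moved to Berlin")],
}
affinity = {"alice": 0.9, "bob": 0.6, "carol": 0.1}

throttled = recent_posts(posts, affinity, max_friends=2)
everyone = recent_posts(posts, affinity, max_friends=1000)
```

With `max_friends=2`, carol's post never shows up even though it is the newest one, because she ranks last. Capping the number of friends bounds the work per feed request, which is presumably the appeal on the server side.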

From a social perspective, I think this is a very primitive way to throttle friends. My understanding of the Feed was that “Top Posts” ranked recent posts so that I had a high-level view of my feed, and “Recent Posts” gave me access to everything. It seems this belief is incorrect. When I increased this number to 1000 (i.e. include ALL my friends), I suddenly started seeing updates from many friends I had totally forgotten about or lost touch with. Since I don’t see updates from them, I don’t interact with them on Facebook, leading to a self-reinforcing “poor get poorer” effect. I am assuming there’s some “Friendness” ranking going on here. If so, friends in my bottom 50 will never make it into my top 250 friends on Facebook. The use of a self-reinforcing ranking function is risky, especially when the stability of the ranking depends on human input. I wonder if the Feed team has done anything smart to introduce “compensators” based on interactions with bottom-50 friends, similar to the random reset in PageRank. The issue here is that unlike hyperlink edges, we’re dealing with a vocabulary of “Likes” and other social cues which are not well understood. It seems like this could be an excellent subject for a machine learning / information retrieval paper or two.
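The “poor get poorer” loop and a PageRank-style compensator can be sketched as a toy simulation. Everything here is an assumption for illustration: the function names, the `reset_prob` parameter, and especially the interaction model, in which every friend who is shown earns exactly one interaction per round. It says nothing about how Facebook actually ranks friends.

```python
import random


def visible_friends(scores, top_k, reset_prob, rng):
    """Pick the friends shown this round: top_k by score, with random resets."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    shown, hidden = ranked[:top_k], ranked[top_k:]
    for slot in range(len(shown)):
        # With probability reset_prob, swap this slot with a random hidden
        # friend, loosely analogous to the random jump in PageRank.
        if hidden and rng.random() < reset_prob:
            pick = rng.randrange(len(hidden))
            shown[slot], hidden[pick] = hidden[pick], shown[slot]
    return shown


def simulate(initial_scores, top_k, reset_prob, rounds, seed=0):
    """Run the feedback loop and count how often each friend is shown."""
    rng = random.Random(seed)
    scores = dict(initial_scores)
    times_shown = {friend: 0 for friend in scores}
    for _ in range(rounds):
        for friend in visible_friends(scores, top_k, reset_prob, rng):
            times_shown[friend] += 1
            # Assumption: being shown always earns one interaction.
            scores[friend] += 1.0
    return scores, times_shown


initial = {f"friend{i}": float(10 - i) for i in range(10)}
_, shown_without_reset = simulate(initial, top_k=3, reset_prob=0.0, rounds=100)
_, shown_with_reset = simulate(initial, top_k=3, reset_prob=0.2, rounds=500)
```

With `reset_prob=0.0`, only the initial top three are ever shown, so their lead grows every round and nobody else can catch up. Any nonzero `reset_prob` gives low-ranked friends an occasional slot, which is the kind of compensator the paragraph above is wondering about.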

update: Horseman of the Interwebs Hung Truong points out Dunbar’s Number:

Dunbar’s number is a theoretical cognitive limit to the number of people with whom one can maintain stable social relationships. These are relationships in which an individual knows who each person is, and how each person relates to every other person. Proponents assert that numbers larger than this generally require more restrictive rules, laws, and enforced norms to maintain a stable, cohesive group. No precise value has been proposed for Dunbar’s number. It lies between 100 and 230, but a commonly detected value is 150.

This puts Facebook’s default threshold at a great place. However, Dunbar’s number is meant for offline relationships; the Dunbar number for ephemeral, online “feed”-style relationships could arguably be much higher. It appears Dunbar has been working on this, and I’m looking forward to a publication from his group soon.

on research blogging

The Machine Learning weblog wonders if it’s worth it.


attack of the clones

We had the Daypop 40, and Blogdex, and then PopDex came along. You thought it was over, and people would be content with all this, right?

Wrong. Welcome to BlogPulse - Automated Trend Discovery for Weblogs:

BlogPulse Key Phrases and BlogPulse Key People are mined daily from new entries in over 40,000 weblogs using machine learning algorithms and natural language processing techniques. BlogPulse Top Links are the most popular links appearing in weblogs today.

And guess what? This is a project by Intelliseek, a company with investors like Nokia, Lycos, Ford and Chrysalis. [note to self: start a blog company, that's where the money really is.]