php

"Move Fast, Break Trust?"

This week’s blog post is written by fellow PhD Candidate Nicholas Gorski, who came across yet another bug in Facebook’s privacy during the latest rollout. The post germinates from a discussion about how the motto “Move Fast, Break Stuff” sounds fun for an engineer, but is this attitude apt when it comes to your relationships with your friends and family? As an explicit clarification to the engineers at Facebook: This post is intended to incite thought about attitudes towards privacy models, and not make any claims about coding abilities or the inevitability of bugs. —arnab

 

Mark Zuckerberg’s motto for Facebook, now used as a company differentiator in engineering recruiting pitches, is “move fast, break stuff.” As previously reported, Facebook certainly broke things in changes pushed out Tuesday evening: by previewing the effects of your privacy settings, you were briefly able to see your profile as if you were logged in to a friend’s account, which enabled you to view your friends’ live chats as they were taking place, as well as look at pending friend requests.

Tuesday’s changes apparently also broke another privacy setting, though. By now, everyone is aware that Facebook exposes privacy settings for personal information in your profile. This includes items such as your Bio, description, Interested In and Looking For, and Religious and Political Views. However, Tuesday’s changes appear to expose this information to everyone in your network regardless of your privacy settings and even whether or not they are your friend.


Try it out for yourself. First, set the privacy settings for some of your personal information to exclude certain friends of yours that are in your network, and then preview your profile as them. If the privacy breach hasn’t been fixed yet, your friend will still be able to see your personal information even though they shouldn’t be able to according to your privacy settings. As we mentioned, this extends beyond your friends: anyone in your network may be able to view your personal information (it may even extend beyond your network).


(Note: the privacy leak may have since been fixed… although an awful lot of people now have public quotations on their profiles.)

Unfortunately, it’s unlikely that this bug is going to get the attention that it deserves. Facebook lets its users set a privacy policy, but the site is broken in a way that ignores this policy. Upon rolling out Buzz, Google was lambasted in the press for defaulting to a public privacy policy for your contacts – if you opted in to creating a public profile. In this case, Facebook let you set an explicit privacy policy, but then exposed that information anyway.

How could this seemingly minor privacy leak hurt anyone, you might ask? The canonical example of the danger of Buzz’s public contacts was the case of the female blogger with an abusive ex-husband. No harm actually befell this security-conscious blogger, but it certainly could have. In the case of Facebook’s privacy breach, the information that was made public was only profile information relating to your biography, religion and romantic preferences. Given the masses of Facebook users, how many people’s sexual preferences could have been inadvertently outed? How many people could have had potentially embarrassing biography information exposed to their parents, people in their network, or potential employers? The privacy safeguards are there for a reason, after all.

One might be inclined to write it off as a mistake, potentially a bug in a PHP script written by a junior software engineer (something hard to believe, given the reported talent of their employees). But Facebook’s motto and their current agenda make it clear that the privacy leaks that have come to light this week are more than that. They are a product of corporate indifference to privacy; indeed, Facebook’s corporate strategy for monetizing their site depends on making as much of your information public as they can. The EFF has repeatedly sounded alarms about the erosion of privacy on Facebook, but is it too late?

Much of the information that was once personal and guarded by privacy settings has now migrated to the public portion of the site, and has been standardized in order to facilitate companies using your personal information to tie in to their marketing and advertising campaigns. The books that you like, the music that you listen to, your favorite movies: all of these are valuable data that companies will pay Facebook for, in aggregate. It will allow them to target you more specifically. When you expose this information publicly, though, are you really aware of how it will be used – not just today, but tomorrow? Information will persist forever in Facebook’s databases, long after you delete it from your profile.

In the meantime, Facebook’s corporate attitude of playing fast and loose with your profile information makes it likely that future privacy leaks will occur — that is, if any of your profile information remains private for much longer.

Visualizations for Navigation: Experiments on my blog

This is a meta post describing two features on this blog that I don’t think I’ve documented before. Apologies for the navel-gazing; I hope there’s enough useful information here to make it worth reading.

Most folks read my blog through the RSS feed, but those who peruse the web version get to see many different navigational aids that help them find their way around the website. Since the blog runs on Drupal, I get to deploy all sorts of fun stuff. One example is the Similar Entries module, which uses MySQL’s FULLTEXT similarity to show possibly related posts [1]. This allows you to jump around the website reading posts similar to each other, which is especially useful for readers who come in from a search engine result page. For example, they may come in looking for Magic Bus for the iPhone, but given that they’re probably iPhone users, they may be interested in the amusing DIY iPhone Speakers post.

The Timeline Footer

However, given that this blog has amassed about a thousand posts over seven years now, it becomes hard to expose an “overview” of that much information to the reader in a concise manner. Serendipitous browsing can only go so far. Since this is a personal blog, it is interesting to appreciate the chronological aspect of posts. Many blogs have a “calendar archive” to do this, but somehow I find them unappealing; they occupy too much screen space for the amount of information they deliver. My answer to this is a chronological histogram, which shows the frequency of posts over time:

Each bar represents the number of blog posts I posted that month, starting from August 2002 until now [2]. Moving your mouse over each bar tells you which month it is. This visualization presents many interesting bits of information. On a personal note, it clearly represents many stages of my life. June of 2005 was a great month for my blog: it had the highest number of posts, possibly related to the fact that I had just moved to Bangalore, a city with an active blogging community. There are noticeable dips that reflect extended periods of travel and bigger projects.

In the background, this is all done by a simple SELECT COUNT(*) FROM nodes GROUP BY month type query. Some smoothing is applied to the counts because of their high variance; for my usage, Height = Log base 4 (frequency) gave me pretty good results. This goes into a PHP block, which is then displayed at the footer of every blog page. The Drupal PHP snippets section is a great place to start to do things like this. Note that the chart is pure HTML / CSS; there is no Javascript involved [3].
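To make the counting and smoothing concrete, here is a minimal sketch of the idea in Python rather than the PHP block the blog actually runs. The node table, its created column of Unix timestamps, the 10-pixel scale and the one-pixel minimum height are all assumptions made for illustration; only the monthly GROUP BY count and the log base 4 height come from the description above.

```python
import math
import sqlite3


def monthly_counts(conn):
    """Return [(month, post_count), ...] in chronological order.

    Assumes a hypothetical `node` table whose `created` column holds
    Unix timestamps; the real schema may differ.
    """
    query = (
        "SELECT strftime('%Y-%m', created, 'unixepoch') AS month, COUNT(*) "
        "FROM node GROUP BY month ORDER BY month"
    )
    return conn.execute(query).fetchall()


def bar_height(count, scale_px=10, min_px=1):
    """Height = log base 4 of the count, scaled to pixels.

    scale_px and min_px are assumed values; min_px keeps a month with a
    single post visible, since log4(1) is 0.
    """
    if count <= 0:
        return 0
    # log4(x) equals log2(x) / 2
    return max(min_px, round(scale_px * math.log2(count) / 2))


def render_histogram(counts):
    """Render the bars as plain HTML with inline CSS, no JavaScript."""
    bars = "".join(
        '<div class="bar" title="%s: %d posts" style="height:%dpx"></div>'
        % (month, count, bar_height(count))
        for month, count in counts
    )
    return '<div class="timeline">%s</div>' % bars
```

With this scaling, a month with 16 posts is only twice as tall as a month with 4, which is what keeps the occasional very busy month from dwarfing everything else.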

The Dot Header

Many of my posts are manually categorized using Drupal’s excellent taxonomy system. A traditional solution to this is to create sections, so that the user can easily browse through all my Poems or my nerdy posts. The problem is that this blog contains notes and links to things that I think are “interesting”, a classification that has constantly evolved as my interests have changed over the past decade. Not only is it hard for me to box myself into a fixed set of categories, maintaining the evolution of these categories across 7+ years is not something I want to deal with every day.

This is where tags and automatic term extraction come in. As you can see in the top footer of the blog mainpage, each dot is a topic, automatically extracted from all posts on the website. I list the top 60 topics in alphabetical order, where each topic is also a valid taxonomy term. The aesthetics are inspired by the RaphaelJS dots demo, but just like the previous visualization, it is done using pure CSS + HTML. The size and color of each dot are based on the number of items that contain that term. Hovering over a dot gives you its label and count; clicking it takes you to an index of posts with that term. This gives me a concise and maintainable way to tell the user what kinds of things I write about. It also addresses a problem that a lot of my readers have: they either care only about the tech-related posts (click on the biggest purple dot!), or only about the non-tech posts (look for the “poetry” dot in the last row!).

This visualization works by first automatically extracting terms from each post. This is done using the OpenCalais module (I used to use Yahoo’s Term Extractor, but switched since it seems Yahoo!’s extractor is scheduled to be decommissioned soon). The visualization is updated constantly using a cached GROUP BY block similar to the previous visualization, this time grouped on the taxonomy term. This lets me add new posts as often as I like; tags are automatically generated and reflected in the visualization without me having to do anything.
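The selection and sizing step can be sketched roughly as follows, again in Python for illustration rather than the actual Drupal block. The linear size scale, the pixel bounds and the markup are assumptions; the description only says that the top 60 topics are listed alphabetically and that size and color depend on the count. Color is left out to keep the sketch short.

```python
from html import escape


def top_topics(term_counts, limit=60):
    """Pick the `limit` most used terms, then list them alphabetically."""
    by_count = sorted(term_counts.items(), key=lambda item: (-item[1], item[0]))
    return sorted(by_count[:limit])


def dot_size(count, max_count, min_px=6, max_px=30):
    """Scale a count linearly into a pixel diameter (bounds are assumed)."""
    return min_px + (max_px - min_px) * count // max_count


def render_dots(term_counts, limit=60):
    """Render one pure HTML/CSS dot per topic; the title gives label and count."""
    topics = top_topics(term_counts, limit)
    if not topics:
        return ""
    biggest = max(count for _, count in topics)
    dots = []
    for term, count in topics:
        size = dot_size(count, biggest)
        dots.append(
            '<span class="dot" title="%s (%d)" style="width:%dpx;height:%dpx"></span>'
            % (escape(term), count, size, size)
        )
    return "".join(dots)
```

Because the input is just a mapping from term to count, it can come straight from a cached GROUP BY result, so new posts change the dots without any manual categorization.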

So that’s it, two simple graphical ways to represent content. I know that the two visualizations aren’t the best thing since sliced bread and probably won’t solve World Peace, but they’re an attempt to encourage discoverability of content on the site. Comments are welcome!


Footnotes:

1 I actually created that module (and the CAPTCHA module) over four years ago; they’ve been maintained and overhauled by other good folks since.

2 Arnab’s World is older than that (possibly 1997 — hence the childish name!), but that’s the oldest blog post I could recover.

3 I have nothing against Javascript, it’s just that CSS tends to be easier to manage and usually more responsive. Also, the HTML generated is probably not valid and is SUPER inefficient + ugly. Hopefully I will have time to clean this up sometime in the future.

y!Vmail - voice mail for your Yahoo! Mail

Yesterday Dan, Pradeep and I presented “y!Vmail: voicemail for your Yahoo! Mail” at the Yahoo! University Hack Day Contest, winning the award for the 2nd best Hack! (jump to the demo video)


Our team with judges Paul Tarjan and Rasmus Lerdorf

The adventure started when I heard about Yahoo!’s Hack U event:

Join Yahoo! web experts including Rasmus Lerdorf, the creator of PHP, for a week of learning, hacking and fun! You’ll hear interesting tech talks, hacking tips and lessons, and get hands-on coding workshops where you’ll work with cutting-edge technology. The week’s events will culminate with our University Hack Day competition—a day-long festival of coding, camaraderie, demos, awards, food, music and jollity (it’s a real word, look it up).

Years ago when I was in my teens, I was an avid participant on the school / college tech fest circuit. Almost every major institution in and around Delhi would organize annual technical festivals, hosting programming contests and software demo competitions. This was where I got a chance to showcase my creations and meet other hackers. Winning these events became a good way for me to pay off those telephone bills — web development in the dial-up age was an expensive hobby!

I decided to enter the Hack Day contest just for fun; it had been a while since I participated in one of these. It wasn’t about winning this time; I just wanted to do the whole “idea to execution to demo” thing with a group of friends, and spend hours screaming at each other over STUPID hard-to-find bugs that are actually staring you in the face, high-fiving every hour as a feature milestone was scratched off the todo-list. The reward: to be able to stand in front of a group of people and say “Hey guys, look what I made!” (If it’s hard to appreciate what this feels like, this video might help.)


Yahoo! gave away a bunch of t-shirts; this was on one of them

3 days before the Hack Day, I had an idea about building a phone-based interface for email. The idea was simple enough to build in a day, but fun enough to make an enjoyable demo. The only problem: I was already in the midst of a “hack” month of my own; VLDB was due 3 hours before the start of the Hack Day, and I had already been sacrificing sleep for LaTeX and Python for more than a week. There was no way I was going to be able to do this alone. Enter fellow grad students Dan and Pradeep. I told them about the contest and my idea. While they are both expert hackers, I totally forgot about the fact that people in Operating Systems research don’t really do a lot of Web Programming: “PHP….? I’ve never…” said Dan. I pointed them to the Yahoo Developer Network site and returned to my research paper writing madness. Hopefully by Friday evening, I would have a web-savvy hack team.

On Friday, I took a quick nap after my paper deadline, and walked over to the Hack Fest area to meet my team (who had become PHP and telephony wizards by now) and load up on caffeine and sugar that the Yahoo! folks had set up for us.


They even had my favorite candy!

We split the work into two parts; Dan would build the phone interface while Pradeep and I would figure out the email and contacts API to write an email client backend. 7 hours later, we had the first version of our product up and running. We could call in and read emails. Happy with our progress, we decided that it would be wiser to go home and show up early next day. We ended up wasting a few hours the next morning worrying about the presentation: the lecture hall had spotty cellphone coverage, a deal-killer for a phone demo! Pradeep made a breakthrough here, discovering that an obscure panel on the wall was actually a secret speakerphone. Having resolved demo issues, we resumed coding and plugged in the remaining features: navigating through emails, email summarization, and email prioritization. The friendly timestamps feature (“4 minutes ago”) was stolen from my blog’s code (i.e. the Status header of this blog).
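As an aside, the friendly timestamps feature is small enough to sketch in a few lines. This is an illustrative Python version, not the code borrowed from the blog; the function name, the "just now" wording and the choice of units are assumptions.

```python
def friendly_timestamp(then, now):
    """Turn two Unix timestamps into a phrase such as "4 minutes ago"."""
    seconds = int(now - then)
    if seconds < 60:
        return "just now"
    # Try the largest unit first so that 2 days is not reported as 48 hours.
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size:
            amount = seconds // size
            plural = "" if amount == 1 else "s"
            return "%d %s%s ago" % (amount, unit, plural)


print(friendly_timestamp(0, 240))  # prints "4 minutes ago"
```

For a phone interface this matters more than it does on screen: "4 minutes ago" is far easier to listen to than a full date and time read aloud.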

Around 3:30pm on Saturday, we updated our hackday entry:

y!Vmail

by Arnab Nandi, Daniel Peek, Pradeep Padala

“Not everyone has a computer, but everyone has a phone.”

This hack allows people to access their Yahoo! mail through a 1-800 number, using ANY touch-tone phone.
Press 0 to open, * and # to navigate, 7 to delete. We figure out which emails are important, and read them first. We summarize long emails so that you don’t have to listen to all of it. If you want to talk to the person, just press 5 and we’ll connect you.

APIs used: BBAuth, OpenMail, Contacts API, Term Extraction API

Hack presentations started at 4:00pm on Saturday. I started with a 20-second PowerPoint pitch, followed by a rather entertaining demo. Using the lecture hall’s speakerphone, we called our service. Entering the correct PIN logged me in, and an entire roomful of people was now hearing the words “Welcome to y!Vmail. You have 5 new emails…”


Me pushing numbers on the phone


Here’s a short video walkthrough of our app:

More details at http://yvmail.info

A few minutes after the presentation ended, the prizes were announced. We ranked second. The winning hack was Brandon Kwaselow’s “Points of WOE”, a native iPhone app that allowed browsing and creation of placemarks on Yahoo! Maps. Congratulations, Brandon!

Overall, this was a very exciting and enjoyable event; I had a rocking good time hanging out with the Yahoo! folks and getting a cool project out the door with around 15 hours of work. I end with some lessons, acquired over years of doing demo contests:

  • Be creative, but avoid feature creep.
  • Split up into sub-teams, but make sure you’re pair programming most of the time.
  • Get Version 0 done Super Super Early. Then polish, polish, polish.
  • Reuse (with attribution) as much code as you can.
  • Take lots of breaks, make friends, and have fun.

Image credits: Rasmus, Erik
Shout outs: Folks at Twilio for making the coolest telephony API in the universe!

Getting django-auth-openid to work with Google Accounts

update: This blog post is meant for older versions of django-authopenid. The latest version available at PyPI has implemented a fix similar to this one and hence works out of the box; you won't need this fix.
Thanks to Mike Huynh for pointing this out!

I've been playing with Django over the past few days, and it's been an interesting ride. For a person who really likes PHP's shared-nothing, file-based system model (I'm mostly a Drupal guy), Django comes across as overengineered at first, but I'm beginning to see why it's done that way.

I was trying to get single-signon working, and settled on django-authopenid over the other django openid libraries, django-openid, django-openid-auth and django-oauth. It was easy to use and understand, and wasn't seven million lines of code.

My intention was to use the OpenID extension to get the user's email address during the sign on process. However, it doesn't seem to work with Google's OpenID implementation, because Google uses the Attribute Exchange (ax) extension instead of the Simple Registration (sreg) OpenID extension that is implemented in the library. A quick hack to django-authopenid's views.py makes it work:


51c51
- from openid.extensions import sreg
---
+ from openid.extensions import ax
94c82
-         sreg_request=None):
---
+         ext_request=None):
113,114c101,102
- if sreg_request:
-     auth_request.addExtension(sreg_request)
---
+ if ext_request:
+     auth_request.addExtension(ext_request)
195,210c172,185
- sreg_req = sreg.SRegRequest(optional=['nickname', 'email'])
- redirect_to = "%s%s?%s" % (
-     get_url_host(request),
-     reverse('user_complete_signin'),
-     urllib.urlencode({'next':next})
- )
-
- return ask_openid(request,
-     form_signin.cleaned_data['openid_url'],
-     redirect_to,
-     on_failure=signin_failure,
-     sreg_request=sreg_req)
---
+ ax_req = ax.FetchRequest()
+ ax_req.add(ax.AttrInfo('http://schema.openid.net/contact/email', alias='email', required=True))
+ redirect_to = "%s%s?%s" % (
+     get_url_host(request),
+     reverse('user_complete_signin'),
+     urllib.urlencode({'next':next})
+ )
+
+ return ask_openid(request,
+     form_signin.cleaned_data['openid_url'],
+     redirect_to,
+     on_failure=signin_failure,
+     ext_request=ax_req)

Obviously this is a very cursory edit. I'm too lazy to improve and submit this as a patch, so readers are encouraged to submit it to all relevant projects!


multisite drupal: the importance of the sequence

Recent versions of Drupal have the oh-so-cool feature that allows you to host many websites off a single Drupal codebase. The coolest part about this is that you can share some tables across multiple websites, which means you can do things like have a single username/password table across all the websites. This can easily be done, as specified in the settings.php comments:

 * $db_prefix = array(
 *   'default'   => 'main_',
 *   'users'     => 'shared_',
 *   'sessions'  => 'shared_',
 *   'role'      => 'shared_',
 *   'authmap'   => 'shared_',
 *   'sequences' => 'shared_',
 * );

Now here’s an important thing to note: The first table you have to share is the sequences table. This is the table that handles all the id counters, so if you don’t share this one, something like this can happen:

[you shared only the users table]

1. User 1 signs up on Site A, gets user id#1
2. User 2 signs up on Site A, gets user id#2
3. User 3 signs up on Site B, gets user id#….? The correct answer is not 3!

This happens because you didn’t share sequences. Site B uses its own sequence generator and produces a duplicate user id… which the user table would not accept, and this would go on till the Site B sequence catches up with the Site A sequence, and then things would be normal. The code quality in user.module helps protect the user table from data corruption, but you will have many signups disappear into thin air with a setup like this. Hence, all you need to do is share the sequences table along with the users… and you’re all set!
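The failure mode is easy to reproduce in a toy model. The sketch below is plain Python, not Drupal code: the Site class, the users_uid counter name and the dictionaries standing in for tables are all made up for illustration. It only mimics the behaviour described above, where each site takes the next id from its own sequences table and the shared user table refuses a duplicate id.

```python
class Site:
    """A toy site: `users` and `sequences` are dicts standing in for tables."""

    def __init__(self, name, users, sequences):
        self.name = name
        self.users = users          # shared or private "users" table
        self.sequences = sequences  # shared or private "sequences" table

    def sign_up(self, username):
        """Take the next id from the sequence; return None if it is taken."""
        next_id = self.sequences.get("users_uid", 0) + 1
        self.sequences["users_uid"] = next_id
        if next_id in self.users:
            return None  # duplicate id rejected, so the signup is lost
        self.users[next_id] = username
        return next_id


# Only the users table is shared: Site B starts counting from 1 again.
users = {}
site_a = Site("A", users, {})
site_b = Site("B", users, {})
print(site_a.sign_up("user1"), site_a.sign_up("user2"))  # prints "1 2"
print(site_b.sign_up("user3"))  # prints "None"

# Sharing sequences as well: both sites draw from the same counter.
users, sequences = {}, {}
site_a = Site("A", users, sequences)
site_b = Site("B", users, sequences)
print(site_a.sign_up("user1"), site_a.sign_up("user2"), site_b.sign_up("user3"))  # prints "1 2 3"
```

In the first setup, Site B keeps losing signups until its private counter passes the highest id handed out by Site A, which matches the "catches up" behaviour described above.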

Btw, hello Planet people!


Drupal downtime

It’s a freak coincidence: Drupal, Drupaldocs and Drupaldevs are all down right now. Note that Drupal.org is down for scheduled maintenance, and has not been hacked or taken down or anything. They’re changing the power lines at the server room or something, and it’s just taking a little longer than expected.

Drupal versions <= 4.6.1 (and a bunch of other PHP apps) have a security problem which makes them vulnerable to code injection, which means bad people can do bad things to your website. To solve this, all you need to do is go to drupal.org and download the latest patched version. Since the site is down at the moment, here’s a temporary fix:

1. delete xmlrpc.inc in the includes directory.
2. upload a blank file in its place.

This should keep you safe from attacks, but will disable the weblogs.com, etc. “ping” notification, and the blogapi. You can later update the files when drupal.org is back up.


Writing a simple guestbook script using DBX

Personal homepages often contain a small section called a guestbook - a place where people can come and write about how they liked the website, or just let the owner know that they've visited the page. And mostly, these pages are the ones with very little traffic.


Tag Soup

Here's a list of all the tags (categories, labels, whatever you call them) used at arnab.org:

captchas and racism

From #drupal:
arnab: and the fact that captchas are, well, stupid
chx: yes, visual captchas are stupid
chx: I think the textual ones are better
chx: if you REALLY want some captcha then something textual
UnConeD: Welcome to my site! To register, please answer the following captcha!
UnConeD: What is the 312455th digit of Pi, in base 42?
chx: UnConeD: LOL
arnab: heh
arnab: exactly my point.
chx: rather “what is the eleventh letter in this sentence?”
UnConeD: well
UnConeD: that sort of stuff is easily cracked with regexps
UnConeD: some guy once made a math expression captcha in text form
UnConeD: in a patch to the module
chx: yes yes
UnConeD: i followed up the issue with a PHP script to break his code ;)
chx: I liked that one
chx: Well I think I can rather easily make a textual captcha you won’t be able to script
arnab: chx: make one, I’ll crack it :D
UnConeD: you are no match for my dangerous RegExping skills
***UnConeD casts Capturing Parentheses (opponent’s movement reduced by 50%)
chx: UnConeD: beware, I’ll grep the CIA World Facts book and ask questions based on that and you can eat your regexps.
UnConeD: err
UnConeD: but grep is itself regexp based :P
chx: I mean, I’ll compile a huge list of facts based on World Facts wikipedia whatever
arnab: chx: I have an indexed, parsed dump of Wikipedia on my HDD, will break your CIA thingy in 2 minutes with it
chx: and questions like “Is Ghana in Africa?”
UnConeD: i’ll hire an indian fellow with an encyclopedia
arnab: UnConeD: I AM an Indian fellow with an encyclopedia
arnab: rofl
UnConeD: ;)

hmm

MSNSearch tells the world that I'm the seventh most important singlegirl around:

7. arnab's world :: weblog
... Sex&SingleGirl. I am a neurotic sex goddess ...
arnab.org/blog

Others disagree. Talk about a twisted sense of perception.
