Microsoft Research's Data-related Launches

Microsoft Research is making a bunch of cool data-analysis-related launches at the upcoming Faculty Summit.

First, there’s the academic release of Dryad and DryadLINQ:

Dryad is a high-performance, general-purpose, distributed-computing engine that simplifies the task of implementing distributed applications on clusters of computers running a Windows® operating system. DryadLINQ enables developers to implement Dryad applications in managed code by using an extended version of the LINQ programming model and API. The academic release of Dryad and DryadLINQ provides the software necessary to develop DryadLINQ applications and to run them on a Windows HPC Server 2008 cluster. The academic release includes documentation and code samples.
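The key idea in DryadLINQ is that you write a declarative, LINQ-style query and the runtime partitions its stages across the cluster for you. As a rough analogy only (this is Python, not DryadLINQ’s actual C# API, and the data here is made up), a word count written as a single chained query looks like this:

```python
# Rough analogy (NOT DryadLINQ's real API): express the computation as a
# declarative pipeline instead of explicit per-machine loops; a runtime like
# Dryad can then schedule each stage across cluster nodes.
from collections import Counter

lines = ["the quick brown fox", "the lazy dog", "the fox"]  # toy input

# Declarative pipeline: split each line -> flatten -> count occurrences.
counts = Counter(word for line in lines for word in line.split())
print(counts.most_common(2))  # [('the', 3), ('fox', 2)]
```

The programmer states *what* to compute; where each stage runs is the engine’s problem, which is exactly the division of labor Dryad/DryadLINQ aims for.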

They also launched Project Trident, a workflow workbench, which is available for download:

Project Trident: A Scientific Workflow Workbench is a set of tools—based on the Windows Workflow Foundation—for creating and running data analysis workflows. It addresses scientists’ need for a flexible and powerful way to analyze large and diverse datasets, and share their results. Trident Management Studio provides graphical tools for running, managing, and sharing workflows. It manages the Trident Registry, schedules workflow jobs, and monitors local or remote workflow execution. For large data sets, Trident can run multiple workflows in parallel on a Windows HPC Server 2008 cluster. Trident provides a framework to add runtime services and comes with services such as provenance and workflow monitoring. The Trident security model supports users and roles that allow scientists to control access rights to their workflows.

Then there’s GrayWulf:

GrayWulf builds on the work of Jim Gray, a Microsoft Research scientist and pioneer in database and transaction processing research. It also pays homage to Beowulf, the original computer cluster developed at NASA using “off-the-shelf” computer hardware.

how many computers does google have?

One of the first things I did outside of work at Google was to find out how many computers the company has. It’s a fairly secret number; it’s not a topic that people in the Googz like to talk about.

It took me a week to piece together the answer, and a few months to come to terms with my discovery. It’s hard to talk to people outside of the big G about the kind of stuff they pull off there, and I’m not talking about making ball pits out of directors’ offices.

I can finally talk about this, now that the information is explicitly public, published in an article by MapReduce Gods Jeff Dean and Sanjay Ghemawat (bloggy synopsis here). In the paper, they talk of 11,081 machine-years of computation used in September 2007 alone, for a subset of their MapReduce work. That’s 132,972 machine-months of CPU used in one month. Assuming all the computers were running at 100% capacity, without failure, without any break for the entire month, that’s over a hundred and thirty thousand machines’ worth of computing used in September Oh Seven.
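The back-of-the-envelope conversion from the paper’s figure to a machine count is simple enough to check directly (the only assumption, as above, is 100% utilization for the whole month):

```python
# Convert the paper's 11,081 machine-years of September-2007 MapReduce
# computation into an implied machine count.
machine_years = 11_081
machine_months = machine_years * 12
print(machine_months)  # 132972 machine-months of CPU in one month

# At 100% utilization with no failures, machine-months consumed in a
# single month equals the number of machines running.
machines = machine_months
print(f"~{machines:,} machines")  # ~132,972 machines
```

Since real clusters never run at 100% utilization, this is a lower bound on the fleet behind that workload.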

In other words, the paper accounts for about a hundred and thirty thousand computers at Google.

But does that account for ALL the computers at Google?

To find out, go ask a Google employee to violate his NDA today!

For your information, this may not be the right number; it should be obvious why. For example, they never said anything about not using hamsters. Hamsters are 10x faster than computers, which would mean they could just have 10,000 hamsters and it would be fine.

Microsoft commercializes Surface Computing

All this stuff has been around for a while, but MS is taking the bold step to commercialize it. The new product is called Microsoft Surface. Pretty slick:

The videos on the MS Surface homepage are also worth watching, though they do have an overzealously awesome attitude.

Note though that none of this is new technology — it’s just that a mainstream software company has decided to convert established ideas into a mainstream product. Here’s a set of videos of other surface computing projects:

The BumpTop Project at UToronto does file management using a surface.

TouchLight by MSR

MultiTouch Displays by Jeff Han, and his spin-off Perceptive Pixel

Frustrated Total Internal Reflection — the technology that powers most of these interfaces

Reactable from Universitat Pompeu Fabra. MS totally stole their “shapes” idea.

And last, but definitely not least, is my entry to this game :)

Last winter I took a class with Prof. Michael Rodemer called “Interactivity and Behaviour”. I tried to build a tabletop that reacts to where you touch it, changing lights and modifying the music that it plays. The video is a little lame, but it was fun to build the damn thing!

computing history browser

A neat proposal on the Ubuntu wiki about a Computing History Browser.