Archive

October 28th, 2013

osu::hackathon(2013)

 

Hackathon starts Nov 2 at 3:00pm at 18th Ave. Library


October 22nd

Announcing the OSU Hackathon!

Update: The 2013 Hackathon was an amazing success, with over 100 students participating! More details are in our report about it. The event was also covered by The Dispatch and The Lantern.


August 16th

High-end Computing Systems Group Faculty

 

Systems Seminar Fridays at 3:30pm in DL 480

Gagan Agrawal
High Performance, Data-Intensive, and Cloud Computing, Middleware/Compilers, Data/Web Mining

Srini Parthasarathy
Data Mining, Parallel & Distributed Computing Systems, Network Science at Scale (Social and Biological Networks)

Arnab Nandi
Database Systems, Large-scale Analytics, Data Interaction

P. (Saday) Sadayappan
Performance Optimization, Compilers/Runtime Systems

Christopher Stewart
Power management, Data-intensive services, Empirical studies

Spyros Blanas
Database Systems, Main Memory Architectures

Radu Teodorescu
Computer Architecture, Energy-efficient computing, Reliability

Xiaodong Zhang
Distributed Systems, Memory Systems (also Networking)

D. K. Panda
Architecture, Networking, Programming Models, Accelerators, Cloud Computing, File systems and Storage, and Power-Aware Designs, Big Data

Feng Qin
Operating Systems, System Security and Dependability, Cloud Computing, and Mobile Computing


So I Suck At 24: Automating Card Games Using OpenCV and Python

Disclaimer: My area of research is databases. This post is about computer vision, something that I am not an expert at. Please consider this an amateur for-fun-only post!
The card recognition code, training deck image and test images are available on Github.

A friend introduced me to the game 24, a simple game you can play with a deck of playing cards. Over the course of playing the game, I realized that either my friend was really good at this game, or that I wasn't!

Given how simple the game was, I figured that I could write a program that was good at it. Scoping the problem out, it was a perfect weekend hack project, and a good break from all the grant writing, research and administrivia of my job. Also, I have always been fascinated by all the cool things my colleagues in Computer Vision do, so it was a good way to learn about some of the fundamentals.

The Game

Here's a description of the game, from the Wikipedia article:

The 24 Game is an arithmetical card game in which the object is to find a way to manipulate four integers so that the end result is 24. Addition, subtraction, multiplication, or division, and sometimes other operations, may be used to make four digits from one to nine equal 24. For example, for a card with the numbers 4, 7, 8, 8, a possible solution is (7-(8/8))*4=24.

The game is fun to play 1-on-1, where the first person to come up with a valid expression gets to keep the cards. Once the deck is exhausted, the person with the most cards is declared the winner.

An Automatic Solution

My vision for an automated version of the game was simple. Players sit across a table on which the cards are laid out. My program would take a picture of the cards and recognize them. It would then generate a valid expression that yielded 24, and project the answer onto the table. Before going into the details, here's a video of how the system would work:

Parts of the Problem

Let's think through the system and break it down into parts. The heart of the problem is to search for a valid expression that yields 24. To get the digits, we have to look at the cards using a camera and identify that they are cards. We then need to process each card and recognize its digit. Finally, once we have the expression, we project it onto the table surface using a projector.

Searching for Expressions:

Before getting into the computer vision and image processing bits, let's get the math out of the way. Consider the following set of cards. The digits here are: 3,7,8,9.

For these 4 digits, the search space (i.e. the set of all the possible candidates) can be enumerated as:

  • All the permutations of the digits (e.g. [3,7,8,9], [9,8,7,3] and so on)
  • All possible operators insertions for each permutation (e.g. [3 + 7 + 8 + 9], [3 * 7 - 8 + 9])
  • All possible parenthesizations of each operator configuration (e.g. 3 * [ 7 - 8 + 9 ] )

As it turns out, the number of possible candidates is small (for the curious, the number of parenthesizations is given by the Catalan numbers) and hence tractable to brute force. Optimizations such as memoization (e.g. [3 * (7 + 8 + 9)] and [3 / (7 + 8 + 9)] both contain the same subexpression (7 + 8 + 9), which can be computed once and cached) and taking operator precedence into account (e.g. 3 * [ 7 - 8 + 9 ] does not need nested parentheses, since - and + are evaluated left to right at the same precedence level) help reduce the search space, speeding up the computation.

Note that not all digits generate valid 24 expressions, so there is a chance we'll have to say "Expression impossible".
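
To make this concrete, here is a minimal brute-force solver. This is only a sketch of the search described above, not the solver used in the video; all names here are made up for illustration:

  from __future__ import division          # so that 8/3 is a fraction, not integer division
  from itertools import permutations, product

  def solve24(digits, target=24):
    # Enumerate every permutation of the digits, every choice of three operators,
    # and each of the five parenthesizations of four operands (Catalan number C_3 = 5).
    templates = [
      "(({a}{o1}{b}){o2}{c}){o3}{d}",
      "({a}{o1}({b}{o2}{c})){o3}{d}",
      "{a}{o1}(({b}{o2}{c}){o3}{d})",
      "{a}{o1}({b}{o2}({c}{o3}{d}))",
      "({a}{o1}{b}){o2}({c}{o3}{d})",
    ]
    for a, b, c, d in permutations(digits):
      for o1, o2, o3 in product("+-*/", repeat=3):
        for t in templates:
          expr = t.format(a=a, b=b, c=c, d=d, o1=o1, o2=o2, o3=o3)
          try:
            if abs(eval(expr) - target) < 1e-6:
              return expr
          except ZeroDivisionError:
            pass
    return None   # no expression exists for these digits

  print solve24([3, 7, 8, 9])   # prints one valid expression, or None if impossible

Memoizing repeated subexpressions and pruning by precedence, as described above, would shrink this search further, but even the naive loop finishes instantly for four digits.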

Identifying and Registering Cards

Now that we have the solution, let's turn to the fun part -- seeing the cards! As we can see in the image above, we have a dark background for a tabletop, and the cards are laid out in a 2x2 grid. They are, however, not quite vertically aligned, and the picture itself is taken at an angle, introducing perspective distortion. Thus, we first need to perform image segmentation to pick out each card by itself, and then image registration to line up the picture of each card with a flat, rectangular representation of a card.

The Python API for OpenCV is very easy to use for these tasks, and there are a ton of informative blog posts that help. We first read the picture and preprocess it (grayscale, blur, and threshold) to get rid of artifacts so that we can focus on the real stuff:


  import cv2
  import numpy as np

  im = cv2.imread(filename)                    # filename: path to the photo of the table
  gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
  blur = cv2.GaussianBlur(gray, (1,1), 1000)
  flag, thresh = cv2.threshold(blur, 120, 255, cv2.THRESH_BINARY)

We then find all contours in the image. These can be edges of the cards themselves, or contours of the figures and letters in the cards. Thus, we look for the four contours that span the most area -- which have to be the four cards themselves.


  # find all contours, then keep the largest ones by area -- the cards
  contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
  contours = sorted(contours, key=cv2.contourArea, reverse=True)[:numcards]

This takes the original image and identifies the top 4 contours (aka the four cards), which we can then cycle through one by one. Note the grey lines in the following image -- they are the contours identified, which include the symbols on the cards themselves.

Now, we need to register each card into a rectangular representation. To do this, we identify the corners of each card by approximating a polygon from its contour (which is simply a vector of points) and finding the minimum-area bounding rectangle:

  
  for i in range(numcards):
    card = contours[i]
    peri = cv2.arcLength(card, True)                  # perimeter of the contour
    approx = cv2.approxPolyDP(card, 0.02*peri, True)  # approximate the contour with a 4-corner polygon
    rect = cv2.minAreaRect(card)                      # minimum-area (rotated) bounding rectangle
    r = cv2.cv.BoxPoints(rect)                        # its four corner points

Then, for each card, we perform a perspective transform to map the image of the card onto a flat, rectangular representation, so that it can be easily recognized:


  # map the card's four corners (approx) onto a 450x450 square
  h = np.array([ [0,0],[449,0],[449,449],[0,449] ], np.float32)
  transform = cv2.getPerspectiveTransform(approx, h)   # approx must be four float32 points, ordered to match h
  warp = cv2.warpPerspective(im, transform, (450,450))

Recognizing Cards

Now that we have a registered representation of the card, the next step is to recognize it. For this, we use a training deck -- to tell our code what each card looks like in the first place.

We pipe this image through the same code as above, creating registered representations of each card, and then labeling each card by hand. In my code, I created a simple tab-separated-values file that stored the card information in order of how the registered cards were listed by the code:

55  *  *
54  8  S
53  K  S
52  Q  S

Note that "*" is used to represent Joker / other cards, which were also in the deck.

Now that we know what each card looks like, the next step is to match it up with the incoming candidate card. Both are lined up / registered, and both have been through the same preprocessing. There are many robust algorithms for this problem, such as SIFT and SURF: we could recognize each digit / letter using character recognition, or even recognize the suit symbols on each card and count them to get the digit.

The simplest option here, however, is to simply ask -- how different is the picture of the incoming card from each of the training cards? We can simply write this as:


  def preprocess(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5,5), 2)
    thresh = cv2.adaptiveThreshold(blur, 255, 1, 1, 11, 1)   # highlight symbols and inscriptions
    blur_thresh = cv2.GaussianBlur(thresh, (5,5), 5)         # blur to tolerate slight misalignment
    return blur_thresh

  diff = cv2.absdiff(preprocess(img1), preprocess(img2))     # pixel-wise difference of the two cards
  diff = cv2.GaussianBlur(diff, (5,5), 5)
  flag, diff = cv2.threshold(diff, 200, 255, cv2.THRESH_BINARY)
  print np.sum(diff)                                         # total intensity of differing pixels

The steps in the above lines of code are: first, transform the image to grayscale and blur it to get rid of artifacts that arise from lighting and camera noise. Then, perform adaptive thresholding -- i.e. highlight differences in black / white, which articulates the symbols and inscriptions in the image. To make up for slight differences in alignment, we blur again, both before and after taking the absolute difference of one image from the other, then threshold the difference and sum the intensity of the pixels that remain.

Here is the card 8 of spades being compared against Ace of Hearts. The right third of the image is the result. As you can see, the subtraction leads to a lot of artifacts:

In contrast, here's what happens when we compare the registered picture of 8 of spades from our perspective image above, against the picture of 8 of spades from the training deck. As you can see, the result is almost fully black, which means the difference is minimal!

Thus, recognizing each card becomes a simple process of comparing the incoming card against each card in the training deck, and going with the one with the minimum difference. As noted before, there are many more complex and robust approaches (including ones that involve the use of classifiers). However, since this is a fun project and I only had a few hours to write code, we're going to go with the version that takes the least amount of time!
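
As a rough sketch, the minimum-difference lookup might look like the following, assuming training is a dictionary mapping card labels to their registered training images and reusing the preprocess function above (the actual find_closest_card in the repository may be organized differently):

  def card_diff(img1, img2):
    # total intensity of differing pixels between two registered card images
    diff = cv2.absdiff(preprocess(img1), preprocess(img2))
    diff = cv2.GaussianBlur(diff, (5,5), 5)
    flag, diff = cv2.threshold(diff, 200, 255, cv2.THRESH_BINARY)
    return np.sum(diff)

  def find_closest_card(training, candidate):
    # pick the training card whose image differs least from the candidate card
    return min(training.items(), key=lambda pair: card_diff(pair[1], candidate))[0]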

One thing to note is that the card identification process could be reused for any card game -- not just this one.

Putting it Together

Given all the parts of the system, we can now grab pictures from the webcam and identify the cards in each frame. We then compute the solution for the numbers, and project it back onto the table using the projector connected to the computer.

  training = img.get_training(training_labels_filename, training_image_filename, num_training_cards, avoid_cards)
  webcam = cv2.VideoCapture(0)
  cv2.namedWindow("preview")
  if webcam.isOpened():   # try to get the first frame
      rval, im = webcam.read()
  else:
      rval = False
  while rval:
      rval, im = webcam.read()
      try:
          cards = [img.find_closest_card(training, c)[0] for c in img.getCards(im, num_cards)]
          cards = ['1' if c == 'A' else c for c in cards]   # treat aces as the digit 1
          display(g24.solve(cards))                         # solve and project the answer
      except Exception:
          pass   # skip frames where the cards cannot be segmented or recognized

And there you have it, an end-to-end, automated player for the game 24! If you would like to try running the code yourself, the card recognition code, training deck image and test images are available on Github.


December 29th, 2012

My Year in Cities, 2012

Continuing the long-running tradition, here’s my list of cities for 2012:

New Job: I’ve meant to write so, so many blog posts about this, but this job is intense enough that there’s been absolutely no time to catalog it. Short version of the story: Landed on a new island, on track to building my little empire.

  • Columbus, OH — Very pleasantly surprised by how nice this city is. The cultural scene is impressive, the area is flat and great for biking and running, and Ohio State is immense and brimming with possibilities.

Conferences, Panels, Random Life Things:

  • Washington, D.C. — I’d never been to D.C. before, but this year I ended up visiting thrice, for all sorts of reasons.
  • Istanbul, Turkey — VLDB 2012. What a wonderful city. Spending an afternoon ferrying back and forth between Europe and Asia is pretty awesome.
  • Ann Arbor, MI — hard to not drive here a few times. It’s good to be able to visit and participate in panels as “Distinguished Alumni”.
  • Atlanta, GA — visiting a friend at the very impressive Shepherd Center. Sadly, didn’t get to actually spend a lot of time touring the place. Seemed like a warm and welcoming city though. Some other time, maybe.
  • Pittsburgh, PA — Visited Carnegie Mellon. Very pretty campus. The drive from Columbus is very pleasant too.
  • Reykjavik, Iceland — “Surreal” is a good word to use here. So amazing.

8 Cities, 3 continents. Not quite the count for 2011, but then again, a lot more going on. Looking forward to 2013!


May 27th

Introducing the CUBE operator for Apache Pig

Guest post by Prasanth Jayachandran, who has been working on implementing CUBE support for Pig as part of the large-scale distributed cubing effort.

Update: As per Dmitriy’s tweet:
…the naive implementation is in. The scalable count distinct impl is pending 0.11 branching, will go into 0.12.

The next version of Apache Pig will support the CUBE operator (patch available here). The CUBE operator represents grouped aggregations on all possible combinations of the input dimensions, for a given input measure.

This patch adds syntactic sugar to the already existing built-in CubeDimensions UDF. With this new addition, aggregations across multiple dimensions can be easily represented using the CUBE operator. The following example illustrates CUBE operator usage in Apache Pig:

Basic Usage:
inp = LOAD '/pig/data/salesdata' USING PigStorage() AS (product:chararray, location:chararray, year:int, sales:long);
cubed_inp = CUBE inp BY (product, location, year);
out = FOREACH cubed_inp GENERATE FLATTEN(group) AS (product, location, year), COUNT(cube) AS total, SUM(cube.sales) AS sales;

Sample output:
For a sample input tuple (ipod, miami, 2012, 200000), the above query generates all combinations of the tuple:
(ipod, miami, 2012, 1, 200000)
(ipod, NULL, NULL, 1, 200000)
(NULL, miami, NULL, 1, 200000)
(NULL, NULL, 2012, 1, 200000)
(ipod, miami, NULL, 1, 200000)
(ipod, NULL, 2012, 1, 200000)
(NULL, miami, 2012, 1, 200000)
(NULL, NULL, NULL, 1, 200000)

Output Schema for CUBE operator:
grunt> describe cubed_inp;
cubed_inp: {group: (dimensions::product: chararray, dimensions::location: chararray, dimensions::year: int), cube: {(dimensions::product: chararray, dimensions::location: chararray, dimensions::year: int,sales: long)}}

Note the second column in the cubed_inp relation, the 'cube' field, which is a bag of all tuples that belong to the corresponding 'group'. Also note that the measure attribute 'sales' in the load statement is pushed down to the CUBE statement so that it can be referenced later when computing aggregates on the measure (in this case, SUM).

Upcoming Enhancements:
The current implementation is equivalent to the naive implementation in MRCube. Following are the core features that I am planning to implement as a part of the Google Summer of Code 2012 program:

  • Optimize naive implementation
  • Support for hierarchical dimension
  • Support for ROLLUP/GROUPING SETS operation similar to SQL/Oracle server
  • Distributed cubing for holistic measures

All these features should be available by the end of this summer. Keep an eye on PIG-2167 (and all its sub-tasks) for more updates!


May 22nd

Skimmer: Rapid Scrolling of Relational Query Results

This paper was presented at SIGMOD 2012. [pdf] [slides]

On the occasion of SIGMOD 2012, I thought I'd write up a short post about a new project, Skimmer.

When considering challenges of ad-hoc, end-user interaction with databases, user actions can be broadly categorized into three groups: (1) explicit, articulate querying of the database, (2) searching through the database, and (3) browsing through the database. Prior work in the area of database usability has recognized (2: Searching) and (3: Browsing) as being significant challenges. My dissertation work attempted to solve (2: Searching) using a combination of autocompletion and qunits. The Skimmer project looks at (3: Browsing). In our SIGMOD 2012 paper, we introduce a method that makes it easier for users to browse large datasets. Here's the abstract and the intuition behind it, followed by a cool demo video of a new implementation of Skimmer:

A relational database often yields a large set of tuples as the result of a query. Users browse this result set to find the information they require. If the result set is large, there may be many pages of data to browse. Since results comprise tuples of alphanumeric values that have few visual markers, it is hard to browse the data quickly, even if it is sorted.

In this paper, we describe the design of a system for browsing relational data by scrolling through it at a high speed. Rather than showing the user a fast-changing blur, the system presents the user with a small number of representative tuples. Representative tuples are selected to provide a “good impression” of the query result. We show that the information loss to the user is limited, even at high scrolling speeds, and that our algorithms can pick good representatives fast enough to provide for real-time, high-speed scrolling over large datasets.

The intuition behind picking representative tuples is simple: since pages will be consumed in succession, we present a sampled overview of the current page, but bias the samples to avoid (re)showing information already provided on prior pages. Thus, we promote tuples that increase the information provided to the user compared to the prior pages, i.e. minimize the information loss. The paper evaluates this idea against varying scrolling speed, page size, number of attributes and information quality.
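
To make the bias concrete, here is a toy Python sketch of the idea. This is not the algorithm from the paper; the scoring and names below are purely illustrative:

  def pick_representatives(page, seen_values, k=3):
    # score each tuple by how many of its attribute values have NOT already been
    # shown on prior pages, and keep the k most novel tuples
    def novelty(tup):
      return sum(1 for value in tup if value not in seen_values)
    reps = sorted(page, key=novelty, reverse=True)[:k]
    for tup in reps:
      seen_values.update(tup)   # remember what the user has now seen
    return reps

The real system also has to do this fast enough to keep up with high-speed scrolling over large datasets, which is where the algorithms in the paper come in.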

The video above demonstrates the new version of Skimmer we've been working on at Ohio State. Similar to the original Skimmer system, it initially only shows you representative tuples from a page, loading and rendering the rest of the data in a lazy fashion. Non-rendered tuples are shown as grey bands, for clarity.

This project falls under the "Surfacing of Insights" section of the roadmap laid out in the Guided Interaction VLDB 2011 vision paper, and is only the first step in what I hope is an engaging conversation in the area of large-scale data browsing.

And of course, if you are a current student at Ohio State looking to work on cool projects such as this one, I am looking for students to join my research group. Drop me a line!


February 8th

Calling all hackers

I put up this post on Twitter / a few places in the University:

“The data team is looking for hackers. 614-f14cfb5abaa0ee055e1222018cc1ec4f”

A few hints:

* If 614-f14… doesn’t work for you, also try 614-ba18b632839188a00b6f0e3784a14bd9. Both have the same desired outcome. (ba18… is what I’d initially posted and is harder to solve for, imho)
* I later tweeted: “Python solution fits in a tweet; takes 5 min to write, <5 to run.”
* 614 is the Columbus phone area code.

Looking forward to a few people cracking this code!

Update: Here’s the solution in both Python and SQL:

Python:
from hashlib import md5
[x for x in range(10**7) if md5("%0.3d-%0.4d" % (x/10000, x%10000)).hexdigest() == 'f14cfb5abaa0ee055e1222018cc1ec4f']

SQL (assumes a table "n" containing a single column n, with tuples 0 through 9):
SELECT *, MD5(CONCAT(n1.n, n2.n, n3.n, '-', n4.n, n5.n, n6.n, n7.n)) AS x
FROM n AS n1, n AS n2, n AS n3, n AS n4, n AS n5, n AS n6, n AS n7
HAVING x = 'f14cfb5abaa0ee055e1222018cc1ec4f'


December 19th, 2011

my year in cities, 2011

To continue the tradition from past years, here’s a list of cities I have been to this year:

End of an era: 5.5 years in the making, the thesis is submitted, bags are packed, memories are tucked away into a special part of my heart. It’s a little crazy to think about, but I’ve called A2 “home” for just about as long as any other city I’ve ever lived in!

  • Ann Arbor, Michigan

Job hunt: The end of a Ph.D. brings with it an epic multi-month fly-everywhere-and-talk-about-yourself ordeal. Scheduling this was quite the traveling-salesman setup. (Turns out, we didn’t have an utterly optimal solution — I flew the DTW->PHX->SJC route, I think, 4 times this year!)

  • Bay Area (Palo Alto, Sunnyvale, Santa Clara, San Jose and San Francisco)
  • Columbus, Ohio
  • Redmond, Washington: I come back here a few months later for a friend’s wedding
  • Madison, Wisconsin
  • Florham Park, New Jersey
  • New York City, New York

One last epic summer internship:

  • Mountain View, California

My first conference travel as a non-student:

  • Seattle, Washington

Going home: back to India after many years!:

  • Delhi
  • Mumbai: though Delhi is technically “home”, 2 weeks in Delhi, and I just had to leave for Mumbai. Maybe it’s the waves at Juhu beach. Maybe it’s the bustle of the trains. Or maybe it’s the huge list of seafood restaurants I just had to eat through!
  • Kolkata: Like an old, wizened grandma, it doesn’t matter HOW many years you come back after, Kolkata is always the same.

Southeast-Asia vacation with parents:

  • Singapore
  • Putrajaya & Kuala Lumpur, Malaysia
  • Genting, Malaysia
  • Penang, Malaysia
  • Phuket, Thailand
  • Krabi Island, Thailand

Seven days in Rajasthan: Longest I’ve been away from my laptop!

  • Udaipur
  • Jodhpur
  • Jaisalmer
  • Jaipur

Next up:

  • Columbus, Ohio!

21 cities, 5 countries, 2 continents — looking forward to 2012!


August 31st

Guided Interaction: Rethinking the Query-Result Paradigm

Just got done with my presentation at VLDB 2011's Challenges and Visions Track! The paper is available here, as are the talk slides. Abstract and browsable slides are below!

Abstract: Decades of research, coupled with continuous increases in computing power, have enabled highly efficient execution of queries on large databases. For many databases, far more time is spent by users formulating queries than by the system evaluating them. It stands to reason that, looking at the overall query experience we provide users, we should pay attention to how we can assist users in the holistic process of obtaining the information they desire from the database, and not just the constituent activity of efficiently generating results given a complete precise query.

In this talk, we examine the conventional query-result paradigm employed by databases and demonstrate challenges encountered when following this paradigm. We recognize that the process of query specification itself is a major stumbling block. With current computational abilities, we are at a point where we can make use of the data in the database to aid in this process. To this end, we propose a new paradigm, guided interaction, to solve the noted challenges, by using interaction to guide the user through the query formulation, query execution and result examination processes.

There are significant engineering challenges to constructing the system we envision, and the technological building blocks to address these challenges exist today.
