Skimmer: Rapid Scrolling of Relational Query Results

This paper was presented at SIGMOD 2012. [pdf] [slides]

On occasion of SIGMOD 2012, I thought I'd write up a short post about a new project, Skimmer.

When considering challenges of ad-hoc, end-user interaction with databases, user actions can be broadly categorized into three groups: (1) explicit, articulate querying of the database, (2) searching through the database, and (3) browsing through the database. Prior work in the area of database usability has recognized (2: Searching) and (3: Browsing) as being significant challenges. My dissertation work attempted to solve (2: Searching) using a combination of autocompletion and qunits. The Skimmer project looks at (3: Browsing). In our SIGMOD 2012 paper, we introduce a method that makes it easier for users to browse large datasets. Here's the abstract and the intuition behind it, followed by a cool demo video of a new implementation of Skimmer:

A relational database often yields a large set of tuples as the result of a query. Users browse this result set to find the information they require. If the result set is large, there may be many pages of data to browse. Since results comprise tuples of alphanumeric values that have few visual markers, it is hard to browse the data quickly, even if it is sorted.

In this paper, we describe the design of a system for browsing relational data by scrolling through it at a high speed. Rather than showing the user a fast-changing blur, the system presents the user with a small number of representative tuples. Representative tuples are selected to provide a “good impression” of the query result. We show that the information loss to the user is limited, even at high scrolling speeds, and that our algorithms can pick good representatives fast enough to provide for real-time, high-speed scrolling over large datasets.

The intuition behind picking representative tuples is simple: Since pages will be consumed in succession, we represent a sampled overview of the current page, but it is a good idea to bias the samples such that we avoid (re)showing information provided in the prior pages. Thus, we promote tuples that increase the information provided to the user compared to the prior pages: i.e. minimize the information loss. The paper evaluates this idea against varying scrolling speed, page size, number of attributes and information quality.

The video above demonstrates the new version of Skimmer we've been working on at Ohio State. Similar to the original Skimmer system, it initially only shows you representative tuples from a page, loading and rendering in the rest of the data in a lazy fashion. Non-rendered tuples are shown as grey bands, for clarity.

This project falls under the "Surfacing of Insights" section of the roadmap laid out in the Guided Interaction VLDB 2011 vision paper, and is only the first step in what I hope is an engaging conversation in the area of large-scale data browsing.

And of course, if you are a current student at Ohio State looking to work on cool projects such as this one, I am looking for students to join my research group. Drop me a line!

written on 23 May 2012 - 3:09am|

blog

About the author:

Arnab Nandi is an Assistant Professor in the Department of Computer Science and Engineering at The Ohio State University. You can read more about him here.

Before and After

Calling all hackers

Introducing the CUBE operator for Apache Pig

August 2002 : 9 posts September 2002 : 16 posts October 2002 : 7 posts November 2002 : 21 posts December 2002 : 25 posts January 2003 : 8 posts February 2003 : 11 posts March 2003 : 7 posts April 2003 : 21 posts May 2003 : 14 posts June 2003 : 15 posts July 2003 : 4 posts August 2003 : 16 posts September 2003 : 25 posts October 2003 : 15 posts November 2003 : 24 posts December 2003 : 17 posts January 2004 : 6 posts February 2004 : 8 posts March 2004 : 6 posts April 2004 : 5 posts May 2004 : 29 posts June 2004 : 3 posts July 2004 : 17 posts August 2004 : 19 posts September 2004 : 3 posts October 2004 : 4 posts December 2004 : 1 posts February 2005 : 14 posts March 2005 : 17 posts April 2005 : 8 posts May 2005 : 27 posts June 2005 : 73 posts July 2005 : 44 posts August 2005 : 13 posts September 2005 : 3 posts October 2005 : 9 posts November 2005 : 20 posts December 2005 : 6 posts January 2006 : 25 posts February 2006 : 23 posts March 2006 : 36 posts April 2006 : 35 posts May 2006 : 7 posts June 2006 : 22 posts July 2006 : 20 posts August 2006 : 27 posts September 2006 : 15 posts October 2006 : 6 posts November 2006 : 19 posts December 2006 : 3 posts January 2007 : 4 posts February 2007 : 1 posts March 2007 : 3 posts May 2007 : 5 posts June 2007 : 2 posts July 2007 : 1 posts August 2007 : 13 posts September 2007 : 1 posts October 2007 : 21 posts November 2007 : 7 posts December 2007 : 9 posts January 2008 : 4 posts February 2008 : 13 posts March 2008 : 14 posts April 2008 : 11 posts May 2008 : 12 posts June 2008 : 12 posts July 2008 : 5 posts August 2008 : 10 posts September 2008 : 11 posts October 2008 : 10 posts November 2008 : 8 posts December 2008 : 4 posts January 2009 : 6 posts February 2009 : 13 posts March 2009 : 7 posts April 2009 : 7 posts May 2009 : 2 posts June 2009 : 3 posts July 2009 : 4 posts August 2009 : 4 posts September 2009 : 6 posts October 2009 : 4 posts November 2009 : 7 posts December 2009 : 10 posts January 2010 : 3 posts February 2010 : 2 posts April 2010 : 5 posts May 2010 : 1 posts July 2010 : 4 posts August 2010 : 3 posts September 2010 : 4 posts October 2010 : 1 posts November 2010 : 2 posts December 2010 : 3 posts June 2011 : 1 posts August 2011 : 1 posts November 2011 : 1 posts December 2011 : 1 posts February 2012 : 1 posts May 2012 : 2 posts December 2012 : 1 posts June 2013 : 1 posts August 2013 : 1 posts October 2013 : 2 posts September 2014 : 1 posts November 2014 : 1 posts November 2015 : 2 posts January 2016 : 1 posts January 2017 : 1 posts April 2017 : 2 posts