We recently returned from VLDB 2014 in Hangzhou, China where we presented our recent contributions to the GestureDB and DICE projects. VLDB is one of the top conferences in database research and we were very excited to meet so many DB researchers, discuss different projects, and listen to interesting talks. The organizers had also planned an evening lights show in WestLake that was incredible.
Image courtesy Mitch Altman
Lilong presented his full paper on Gestural Query Specification. In this paper, we ask, “Can gestures be used to articulate queries on a database?” We describe various components of our multitouch iPad-based prototype (that we presented as a demo in VDLB 2013) — the classifier, gestural language and architecture for query specification. We discuss our usability studies and show that gestural querying is indeed easy to use and more intuitive when compared to traditional database query interfaces.
Niranjan presented a demo, Combining User Interaction, Speculative Query Execution and Sampling in the DICE System. This was work on our interactive analytics system, an overview of which was was published in ICDE 2014. At VLDB we demonstrated the user interface aspects of DICE, and how the design of the entire architecture was inspired by the interaction of the user and the system.
Keynotes and Panels
In addition to the in-memory database keynote by Hasso Plattner (co-founder of SAP) and Volker Markl’s keynote about declarative querying, there was an interesting keynote by our grand-advisor Prof. H.V. Jagadish at the Data4U workshop. Prof. Jagadish provided numerous insights in the database usability area. The key idea was the fact that query processing time is mainly divided into query authoring time and query execution time. For decades, researchers have spent countless hours working on query execution side of things. It looks like a good idea to spend more efforts now on the query authoring aspect. In addition to Data4U, there were two sessions for database usability, reinforcing the above observation.
Another interesting joint keynote was by Dr. Shivkumar Venkatraman (VP Engineering Ads and Commerce at Google) who presented the changes the ads backend at Google went through as it progressed from a company with a revenue of a few hundred millions to the behemoth of today and by Prof. Divy Agrawal, who presented his views on treating datacenters as computers.
There was an interesting panel moderated by Dr. C. Mohan of IBM on the Role of Databases in the era of Big Data. Are programming languages and distributed systems luminaries winning the battle against the DB community over the future of data? The panel included Michael Carey, Surajit Chaudhari, Ashish Gupta, Wolfgang Lehner, Chris Re, Gera Shegalov. It was a fascinating panel punctuated at the end by inputs from Divy.
Some talks that we liked
There were several great papers being presented at VLDB and we tried to attend as many of the presentations as possible. Here is a small subset of some of the papers we enjoyed:
- Toward Computational Fact-Checking — The paper demonstrates how parameterized queries can be used for fact checking. An example from the paper:
A TV ad in the 2010 elections claimed that Jim Marshall, a Democratic incumbent from Georgia “voted the same as Republican leaders 65 percent of the time. This comparison was made with Republican Leader John Boehner over the votes in 2010. If we look at the history since 2007, however, the number would have been only 56 percent, which is not very high considering the fact that even the Democratic Whip, Jim Clyburn, voted 44 percent of the time with Boehner during that period. Basically, many votes in Congress are not as controversial as the public would think!
The authors describe their fact-checking framework that works by “perturbing” the parameterized query formulations of the claims in interesting ways.
- Exemplar Queries: Give me an Example of What You Need: In this paper, when a user inputs a query in search engine, it considers the query an example instead of a query and tries to return similar examples to the user. The paper describes an implementation of this for graph-modeled data and provide a nice evaluation using Freebase.
- The Case for Data Visualization Management Systems This is a very interesting vision paper that was in the same session as our “Gestural Query Specification” paper. in this paper, the author tries to combine the visualization with DBMS together and utilize the existing DBMS to serve visualization, with examples of techniques from imMens, Tableau, nanocubes, bigvis, M4 and DICE, amongst others.
- A Sampling Algebra for Aggregate Estimation: This paper extends Generalized Uniform Sampling to allow for sampling operators to commute with selections and joins, which is a very powerful contribution. Since it is not currently possible to obtain a random sample of a join by sampling both the relations, the paper discussed techniques to obtain the sample variance even without having a random sample. We are excited about doing similar work in DICE to further optimize our query plans.
- ClusterJoin: A Similarity Joins Framework using MapReduce: The paper addresses this challenge by partitioning the space using random samples as partition centers to reduce the probable pairs. This method first rules out and prunes impossible candidate pairs without performing actual comparisons by using bisector-based reasoning, and proposes a load balancing scheme to avoid “the curse of the last reducer”. The paper has some interesting evaluations over spatial and document data.
The conference was a lot of fun and a great learning experience. We are now working on our next contributions to DICE and GestureDB, and hope to present them at upcoming venues.