BaconSnake to core: Pig 0.8 released!

My contribution to Python UDFs in Pig has finally released as part of the new and shiny 0.8 release! I’ve been meaning to blog about how this came about when I had time, but Dmitriy saved me the work, so I’ll just quote him instead:

This is the outcome of PIG-928; it was quite a pleasure to watch this develop over time — while most Pig tickets wind up getting worked on by at most one or two people, this turned into a collaboration of quite a few developers, many of them new to the project — Kishore Gopalakrishna’s patch was the initial conversation starter, which was then hacked on or merged into similar work by Woody Anderson, Arnab Nandi, Julien Le Dem, Ashutosh Chauhan and Aniket Mokashi (Aniket deserves an extra shout-out for patiently working to incorporate everyone’s feedback and pushing the patch through the last mile).

Yay Open Source! Documentation is available on the Pig website, but here’s a one-line example of what you can now do:

register 'test.py' using jython as myfuncs;

Notice that we’re using a python script instead of a jar. All python functions in test.py become available as UDFs to Pig!

Going forward, there are two things I didn’t get to do, which I’d love any of my dear readers to pick up on. First, we’re still not at par with SCOPE’s ease-of-use since UDFs can’t be inlined yet (a la BaconSnake). This is currently open as a separate Apache JIRA issue, would be great to see patches here!

The second feature that I really wanted to make part of this patch submission, and didn’t have time for, was Java source UDF support as a JaninoScriptEngine. Currently, Pig users have to bundle their UDF code as jars, which quickly becomes painful. The goal of the ScriptEngine architecture is to let multiple languages be implemented and plugged in, so I really hope someone takes the time to look at the Janino project, which will allow Java source UDFs with zero performance penalty. This would also make it the first dynamically compiled query system over Hadoop (as far as I know), and opens up a world of query optimization work to explore.

About the author:

Arnab Nandi is an Assistant Professor in the Department of Computer Science and Engineering at The Ohio State University. You can read more about him here.


August 2002 : 9 posts September 2002 : 16 posts October 2002 : 7 posts November 2002 : 21 posts December 2002 : 25 posts January 2003 : 8 posts February 2003 : 11 posts March 2003 : 7 posts April 2003 : 21 posts May 2003 : 14 posts June 2003 : 15 posts July 2003 : 4 posts August 2003 : 16 posts September 2003 : 25 posts October 2003 : 15 posts November 2003 : 24 posts December 2003 : 17 posts January 2004 : 6 posts February 2004 : 8 posts March 2004 : 6 posts April 2004 : 5 posts May 2004 : 29 posts June 2004 : 3 posts July 2004 : 17 posts August 2004 : 19 posts September 2004 : 3 posts October 2004 : 4 posts December 2004 : 1 posts February 2005 : 14 posts March 2005 : 17 posts April 2005 : 8 posts May 2005 : 27 posts June 2005 : 73 posts July 2005 : 44 posts August 2005 : 13 posts September 2005 : 3 posts October 2005 : 9 posts November 2005 : 20 posts December 2005 : 6 posts January 2006 : 25 posts February 2006 : 23 posts March 2006 : 36 posts April 2006 : 35 posts May 2006 : 7 posts June 2006 : 22 posts July 2006 : 20 posts August 2006 : 27 posts September 2006 : 15 posts October 2006 : 6 posts November 2006 : 19 posts December 2006 : 3 posts January 2007 : 4 posts February 2007 : 1 posts March 2007 : 3 posts May 2007 : 5 posts June 2007 : 2 posts July 2007 : 1 posts August 2007 : 13 posts September 2007 : 1 posts October 2007 : 21 posts November 2007 : 7 posts December 2007 : 9 posts January 2008 : 4 posts February 2008 : 13 posts March 2008 : 14 posts April 2008 : 11 posts May 2008 : 12 posts June 2008 : 12 posts July 2008 : 5 posts August 2008 : 10 posts September 2008 : 11 posts October 2008 : 10 posts November 2008 : 8 posts December 2008 : 4 posts January 2009 : 6 posts February 2009 : 13 posts March 2009 : 7 posts April 2009 : 7 posts May 2009 : 2 posts June 2009 : 3 posts July 2009 : 4 posts August 2009 : 4 posts September 2009 : 6 posts October 2009 : 4 posts November 2009 : 7 posts December 2009 : 10 posts January 2010 : 3 posts February 2010 : 2 posts April 2010 : 5 posts May 2010 : 1 posts July 2010 : 4 posts August 2010 : 3 posts September 2010 : 4 posts October 2010 : 1 posts November 2010 : 2 posts December 2010 : 3 posts June 2011 : 1 posts August 2011 : 1 posts November 2011 : 1 posts December 2011 : 1 posts February 2012 : 1 posts May 2012 : 2 posts December 2012 : 1 posts June 2013 : 1 posts August 2013 : 1 posts October 2013 : 2 posts September 2014 : 1 posts November 2014 : 1 posts November 2015 : 2 posts January 2016 : 1 posts January 2017 : 1 posts April 2017 : 2 posts