Wednesday, March 27, 2013

Data Visualization with Nodebox

For PyData 2013, I put together a talk on using Nodebox OpenGL for data visualization. My goal was to expose the data science audience to a flexible tool similar to Processing, but one that lets you write in Python and use Python data libraries. (The Java-esqueness of Processing has always put me off reaching for it when I'm working, despite a general fondness for it. I still own every single (English) book on Processing, AFAIK.)


ETA: Web video of my talk is here on the PyData Vimeo site.

My talk was generally well-received, although I think I flummoxed the stats graphics people a little bit; they probably weren't expecting something so "sketchy" from me. Hey, I love those other tools too, and use Matplotlib (and d3 too!) regularly.

A few quick comments on the Nodebox ecosystem: The current focus of the team in Leuven is on Nodebox 3, a block-diagram visual programming tool, not the two variants I talked about (Nodebox 1 and Nodebox OpenGL). I think NB3 veers away from usefulness for the data science crowd that might benefit from a Python alternative to Processing. If the enormous success of the Java-based Processing is anything to go by, I'm not crazy in thinking a Python tool like it should be huge! After all, it's cuddly Python! So at the end of my talk, someone actually asked me why he should have sat there for 45 minutes if I was not talking about thriving open source code with a huge community behind it. My response was, more or less, "It's already super useful, which I hope I showed, and more people could be working on it than just the original authors." That's how open source works, right? (By the way: That guy apologized to me later, but I didn't take it badly when he said it.)

A couple more comments on my slides: My own data experiments in the deck weren't incredibly successful, largely due to issues with the database I used. I wanted to explore Shane Bergsma's gender-of-nouns database collected off Google News, and what I found was that it thinks everything is really "male." Cuz most news articles are about men, probably. (Also, it proved less useful on older Gutenberg books, because old-fashioned vernacular nouns like "momma" don't appear in the db. So out went Pride and Prejudice and out came my credit card for Kindle books.) Hence, all my fiction gender plots look kind of like these, with heavy weights towards male and neutral nouns:
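For anyone curious what this kind of tally looks like in Python, here is a minimal sketch, not the code from my talk. The lookup table, its made-up counts, and the function names `dominant_gender` and `tally_genders` are all hypothetical stand-ins; the real database has its own format and far more entries. It labels each noun by its most frequent gender and counts missing nouns (like "momma") as unknown.

```python
from collections import Counter

# Hypothetical toy lookup: noun -> (male, female, neutral) counts.
# These numbers are invented for illustration, not taken from the real database.
TOY_GENDER_COUNTS = {
    "doctor": (80, 15, 5),
    "captain": (90, 5, 5),
    "mother": (1, 95, 4),
    "table": (2, 1, 97),
}

LABELS = ("male", "female", "neutral")


def dominant_gender(noun, lookup=TOY_GENDER_COUNTS):
    """Return the label with the highest count, or None if the noun is missing."""
    counts = lookup.get(noun.lower())
    if counts is None:
        return None
    return LABELS[counts.index(max(counts))]


def tally_genders(nouns, lookup=TOY_GENDER_COUNTS):
    """Count how many nouns fall under each label; missing nouns count as 'unknown'."""
    tally = Counter()
    for noun in nouns:
        label = dominant_gender(noun, lookup)
        tally[label if label is not None else "unknown"] += 1
    return tally


if __name__ == "__main__":
    sample = ["doctor", "captain", "table", "mother", "momma"]
    print(tally_genders(sample))
```

Tracking the "unknown" bucket separately is the useful bit: it shows how much of a given book the lookup simply can't see, which was exactly my problem with the older texts.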



The pdf of my slides is here and the code zip file is here. Do check my appendices: I figured out a bunch of issues related to paths in Nodebox 1, running NB 1 from the command line, and the like.

A couple nice post-conference mentions: Jake Vanderplas's take on Matplotlib history and visualization in Python, which has some interesting comments. I spent a while talking to Ben Lorica (@bigdata) at PyData, and he nicely mentioned Nodebox in his well-RT'ed article on how Python Data Tools Just Keep Getting Better.

Also, before the conference, I was interviewed for a podcast about data vis skills. I didn't advertise this very broadly because of a few mistakes in the initial post (one in particular that claimed I hated d3, which is certainly not true at all -- I said it had a learning curve, you can listen yourself!).