Sunday, March 20, 2011

PyCon 2011 - Data, Men, and Me

In the past couple of years, I've switched from sending myself to research conferences (like CHI) to more down-and-dirty developer conferences. I'm looking for skills development and tools I can use day-to-day. This spring I went to PyCon in Atlanta, since I've been using Python more and more for data analysis problems. (Complete talk videos are here on

The initial draw was the tutorials. I aimed for cloud data and machine learning. Olivier Grisel's tutorial Applied Machine Learning in Python with scikit-learn was a definite high point of the conference for me. His talk on text analysis was very good as well (slides here, and video here). His French accent was very nice, but I kept mishearing "scikit-learn" as "psychic learn." :-) I also really enjoyed the talk on Genetic Algorithms by Eric Floehr, a fellow who seems to do weather prediction consulting. His slides and a bunch of other interesting supporting material (including code) are up on his site.

There were a lot of talks on data, big data, cloud data, and scaling Python (to handle big data and data problems). Other examples: A talk on Pypes by Eric Gaumer included a good reminder that big data problems existed in the search engine space long before other kinds of big data became "hot" to work on. Pypes is a quasi-visual toolkit for doing data processing, inspired by Yahoo Pipes. (The gist being that since a lot of data handling involves discrete steps to clean and transform, you can put these steps into little modules that let you see the big picture of what's going on with your data munging.)
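To make the "little modules" idea concrete, here is a minimal sketch in plain Python. It does not use the Pypes API at all; the step names and the run_pipeline helper are hypothetical, made up purely to illustrate chaining small clean-and-transform steps.

```python
from functools import reduce


# Each step is a small, single-purpose function: records in, records out.
# (All names here are illustrative, not part of Pypes.)
def strip_whitespace(records):
    """Trim surrounding whitespace from every record."""
    return (r.strip() for r in records)


def drop_empty(records):
    """Discard records that are empty strings."""
    return (r for r in records if r)


def lowercase(records):
    """Normalize case so later comparisons are consistent."""
    return (r.lower() for r in records)


def run_pipeline(records, steps):
    """Feed the records through each step, in order."""
    return reduce(lambda data, step: step(data), steps, records)


# The whole munging process reads as one list of steps.
STEPS = [strip_whitespace, drop_empty, lowercase]

raw = ["  Atlanta ", "", "PyCon  ", "   "]
cleaned = list(run_pipeline(raw, STEPS))
print(cleaned)
```

Because the steps live in one ordered list, reordering or swapping a step is a one-line change, which is roughly the "big picture" benefit a visual toolkit is after.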

Hilary Mason's excellent keynote made a lot of us data geeks happy; she called for programming languages to evolve closer to the data problems, and to be less cryptic in their support for the multithreading and map-reduce strategies needed these days. (I loved her "WTF?" comment on her multithreading code example.) Yelp's "mrjob" library for the cloud might answer some of her issues, but I missed that talk for some reason!

Another well-tweeted talk on big data was C. Titus Brown's "Handling Ridiculous Amounts of Data with Probabilistic Data Structures." Slides here; they probably need the video alongside them to be fully understood, or at least they do for me (yes, I missed this one too).
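For anyone else who missed it and wonders what a "probabilistic data structure" looks like, a Bloom filter is one common example. I'm assuming it is representative of the genre, not claiming it is what the talk actually covered. The sketch below is a toy: the class, its parameters, and the sizes are all made up for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: can report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustrative defaults, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per "bit", for readability

    def _positions(self, item):
        # Derive several positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "probably seen"; False means "definitely not seen".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for word in ["python", "data", "pycon"]:
    seen.add(word)

print("python" in seen)
```

The trade is a fixed, small amount of memory in exchange for occasional false positives, which is what makes this family of structures attractive when the data is too big to store outright.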

Not all talks were excellent, of course. My linguistics degrees got grouchy during one on the linguistics of Twitter -- or maybe it was my geeky side asking "what can I do with this?" Some talks were nice surprises, too, which is kind of the point of going to conferences! Based on lunch table happenstance, I ended up going to a Blender API talk by Chris Allan Webber, a subject about which I knew zilch. Blender is apparently beefing up its API for external calls and automation; as a visualization person, I'm interested in tools that I can "drive" with data as input. I have big hopes for the evolution of and Nodebox2, two pythonic visualization options, but I am not sure they're there yet for me as a data vis person.

My sad female nerd note: I was one of 3 women in the Machine Learning tutorial. Out of perhaps 40? I later heard via Twitter a guess that there were only 8% women at the conference as a whole, based on t-shirt orders. I loved Hilary's talk, but was a bit bummed out by the Dropbox keynote that featured the social network of "friends of Arash" who started that company -- yeah, all men.

A final comment for any UX folks reading this: This would've been a great audience for a talk on UI design in open source, or UI design for Python UIs. There were a lot of companies presenting: Dropbox did their "we use Python" talk; Evite apparently has rewritten their entire Java backend in Python; Threadless, a sponsor, is all Python... One of the reasons for Python's growth at these companies is the ease of writing things fast in it; the "prototype and iterate" philosophy showed up over and over in various presentations as a real strength of the language. As a light coder myself, I couldn't agree more. I was there as a data-oriented geek, but I saw UX opportunity everywhere, for the right kinds of UX folks.