Saturday, November 16, 2013

Data Vis Consulting: Advice for Newbies

Every time I give a talk or introduce myself at a conference, someone gets really interested in what I do. When I think I’ve scored a potential client, it turns out they just want to know if they could do what I do. Some folks are direct: One woman at a Python Data Science conference said, “So, you know that job ad list you run for data-vis jobs? How do I get one of those jobs?”

Backing up a tick, in case you didn’t know, I curate a low-traffic list for job ads in data-vis (short for “data visualization”). I think she knew about it from me on Twitter. If you don’t follow me on Twitter, you might think about it: I share a lot of links related to data science and visualization as @arnicas.

This post is about getting yourself to a place to get those jobs, plus money and client issues. I’m not the poster child for successful consulting, but I have been doing it a few years now and I am living off it. I work as an individual contributor, writing code for data analysis and creating interactive visualizations. To balance out my own skew on things, I asked for input from some fairly famous names, mostly folks I know via Twitter (see People Resources).

Note: I’m not talking in this post about skills needed or training resources--there are books, MOOCs, and plenty of ways to find that stuff by now; see especially Andy Kirk’s site Visualising Data.

The Kinds of Data Vis Work Out There

In my experience, there are a few key types of work and deliverables for freelancers in data visualization:

  • "Cool data set" visualization: Client wants someone to explore a data set and produce a static or interactive graphic they can feature as a PR move, as part of a news story or article, or for an internal business report. This work is probably most well-known, since it’s the core of the most famous/artistic vis work.
  • Dashboards: A lot of organizations want analytics (or consumer) dashboards reporting multiple key metrics in an attractive and useful display. They seek help determining those metrics and the best way to present them. (I’m putting these between the “cool data set” and “tools” because as a problem, they combine aspects of both. Make sure you read the Red Flag section for this work!)
  • Tool building: Client is building tools for users (internal or external) to view data of some kind, which means they and you are not starting from a specific data set, but an idealized version of one; and you are helping design/create infrastructure for data exploration.
  • Teaching: Teach principles of design, visualization, how to produce graphics and interactives, basic or advanced stats; usually in workshops.

Work roles range from part-time as-they-need-it work to sub-contracting for big names or projects, contract-to-hire gigs, single-project work, and retainer work for regular needs. These jobs often entail some mix of design and development. I myself maintain a mix of these roles and work types so that I keep busy. (However, I charge slightly differently depending on whether I’m doing primarily design or development. I consider UX design to be “harder” and more painful due to the people-politics involved.)

Usually print work doesn’t pay well, but can be excellent for PR and the portfolio. Most of the “famous” artistic vis folks do some percentage of print work and win awards for it, even the ones who also do interactive work.

In general, a lot of us (in my People Resources) started doing freelancing in other areas, before getting more focused on the data visualization field. I was a UX designer for many years at (too) many companies before I went independent. Through some lucky breaks, I was able to do more and more data-related projects in my UX work, until I switched entirely to data work a few years ago. Others from whom I solicited input said they started in general web design or development contract work before moving into 100% data work.

Client (Mis)Understandings About What We Do

I’m hearing more often from start-ups and funded research groups (in universities and companies) that they have plenty of back-end data analytics people, but are in need of front-end folks for data in particular. Front-end people have always been in short supply, and ones who can do good work in the latest visualization tools are even scarcer. But beware, “front-end” for a data shop or an un-savvy product manager may mean any or all of: UI designer, website builder (all of it, from scratch, including login modules, preferences, menus, etc), interactive or static data visualization builder (e.g., d3, or just ordinary charts/graphs). Also, sometimes database janitor and stats person, depending on who else is on staff.

I try to explain that I’m no longer a broad UX designer for site architectures/workflows, that I’m working in the data area only now. Here’s a chart from Alberto Cairo’s book The Functional Art that broadly captures the data design specialization, although it doesn’t try to capture skills, overlaps, and tools at all:

Given the excitement over big data's promises and analytics-driven business goals, the popularity of infographics, and the excellence of today's interactive data journalism (at places like The New York Times and The Guardian), data visualization is hot for consulting right now. Unfortunately, there can be a lot of noise amidst the signal from legitimate clients. Sometimes I can't tell if I'm reading a spam broadcast or a real email. To wit, today: "Mr. [Redacted] is requesting contact information for anyone/company with experience in store cluster analysis, at a reasonable price."

Kim Rees says to beware of the client saying, “We’re really excited to do some datavis! But we’re not sure how to get started.” “These people have no idea what datavis is. Conversations will be confusing, nebulous, and full of far more questions than answers. Tell them to get back to you when they have a project in mind that involves data. Or give them a budget just to explore ideas with them. The only deliverable of that phase will be: Project Idea write-up no longer than one page.” Likewise, coming up with a visualization appropriate for the data and users IS the job, and doing pre-contract work to determine what you will build is not workable for a sustainable consulting business.

"These people have no idea what datavis is."
(Kim Rees)

Tiffany Farrant-Gonzalez notes that “lots of clients are attracted to heavily visual infographics that have become popular, and it’s sometimes hard work to educate them about good visualization practices.” She says they sometimes want you to “simply make their data ‘look cool’ or ‘more interesting’ without really understanding what this means or the process involved.” Even in development jobs, I have to explain that there is a data exploration, analysis, and design phase BEFORE the building starts -- just as in other design spaces.

Moritz, who most frequently works on what I'd class as "cool data set vis" projects, tells me he usually requests a data sample and some answers to a few questions clarifying the context and basic motivation of the project before starting:

  • Why are we doing this?
  • What are you hoping to achieve?
  • Who are we targeting?
  • How is the end product going to be used?
  • How are we publishing?
  • What data do we have available?
  • Which other existing materials should we take into account?
  • Which constraints do we have?
  • Who is responsible for what?
  • Who else is doing something similar?

For Moritz, answers to these questions help him understand why the client thinks a data visualization is important, and also help define success criteria for the project. He says, “Often, both the client and I realize that half of these questions cannot be answered yet, but that's fine, as long as we make sure to answer them along the way.”

Moritz shares a workflow diagram he uses with clients to illustrate the process and iterative stages of the work:

Moritz Stefaner noted in his excellent interview on FILWD that you need to educate a client to move along with you, so they see the value and thought process, the pros and cons of various design approaches. All design involves tradeoffs, and you need to illuminate these to help the client evolve their own thinking about what’s important to show. Designers of work other than data vis need to do this as well, of course. Remember this impacts your billable hours: producing presentations or documentation materials around your work is time-consuming.

For work that is closer to tool-building, I would also suggest these kinds of questions, at least for solo consultants like me:

  • Do they want you to build it all? Try to get some notion of what "all" means for them.
  • Do they have others who can do the generic site code around the visualization piece? Building a whole site to host a vis project is usually non-trivial work!
  • Do they have a designer on staff now who does CSS/visual design (useful if you aren’t superb at this; or problematic if you are and they’re not, or they aren't clueful about the visuals in visualization)
  • Do you get to touch any of the data yourself (because you need to understand it to build something smart)? Who has the data, how can you get to it? (SQL, API call, samples available as CSV...)
  • Do you actually get any say over the presentation and design, or are you a code monkey in their eyes?
  • Are they looking for work "like" anything in particular? (They usually have an inspiration, whether something in the paper or a tool they use or a competitor.)
  • Do they need anything as fancy as d3.js or advanced charts, or would Highcharts and/or a general JavaScript person be good enough for them?

Early on in your consulting days, you may need to take more jobs that involve some unpleasantness or non-specialist work, but as you get more successful, you can be more choosy.

Getting Started: Do the Work

Suppose you aren't even at the point of talking to clients about doing data vis work, and you're wondering how to transition into it.

As Bill Shander says, you have to “Do the work.” Scott Murray suggests, “Find data stories that are interesting to you, and create them.” (If you have trouble finding data stories you’re interested in, rethink this as a career path, perhaps?) Get a fun data set, analyze it, do some visualization, post about it on a blog. Create the kinds of things you’d like to get paid for. There’s remarkable correlation of opinion across the folks I asked for input: Do projects, even (or especially) for free, that set an example of what you can do and would like to do. Then publicize them on Twitter, on your blog, in a presentation or talk (such as at a local tech Meetup or conference).

“Make, make, make, make.” (Jer Thorp)

Entering visualization contests is another good way to get some experience and attention, although the bar can be quite high for winning. Visualising.Org has some nice challenges with significant prize money. Bill Shander’s entry in a recent contest was picked up in reporting coverage of the contest and got him client leads. Jan Willem Tulp also praises contests, and offers: “A nice side-effect is that you're actually practicing creating data visualizations for a fictional client. Additionally, you’re already building on your portfolio this way!”

Jer Thorp says, “Make, make, make, make. Reduce the preciousness of your work so that you can make more of it and get further faster.” Amen.

The Required Portfolio

Having a portfolio is critical. You always need something you can show, that you yourself made, because every single client will look for evidence that you can do the work. Note that, unfortunately, a lot of jobs don’t produce work that can be shown in public (whether for NDA reasons, or because it was for an internal tool or demo). I myself have a hidden portfolio that I produce on request, because I’m still tuning my self-presentation and collecting items that can be made public.

Anna Powell-Smith says, “The most effective thing people can do to get hired is to create good projects by themselves. Clients love to see that you can both come up with a good idea, and execute it. And if it's all your own code, they know exactly how good you are.” Put aside long weekends or the Christmas break!

"Be selective in constructing your portfolio." (Everyone)

Jeff Clark says his website projects have gotten him valuable input and forced him to think about a personal work brand. “I think almost every project I've done for pay has started with someone seeing some work I've done on my website and they have contacted me through email.”

Dominik Baur suggests doing visually interesting projects that appeal at first sight, to get you a second, closer glance. The power of the visual can help, and might get you PR from other folks (on Pinterest, for example).

Both Moritz and Jan Willem advise being very selective in your portfolio choices. Most vis folks don’t put all their work in their portfolio, and tune it regularly. Jan Willem says, “Make sure that you show the work that you want to do more of. Don't show everything, don't show that you are also good at many other things, unless you want to get work in that direction as well. It might therefore be better to show only 3 really good projects in your portfolio that represent the kind of projects you would like to do rather than showing everything you've done so far.”

Tiffany echoed this, “Take out work that isn’t work you want to be producing, or suggesting you can produce (especially if it was joint work with others). … It's tempting to list out all of your skills (no matter how strong you are at them) and display all of your previous work on your site or resume, but it really helps if you narrow down to the core services that you want to provide, and hopefully this will help you get the work best suited to you.”

General Self-Promotion: Twitter, Tutorials, Teaching

As Jan Willem Tulp says, “People have to know you exist, that you do data visualization, and that you’re good.” It’s critical to have a web site with your portfolio, a presence on LinkedIn, and possibly a blog too. Being active on Twitter can help too. Jeff Clark suggests curating good work “in public” such as on Twitter, Pinterest, or a blog, and mixing in your own work occasionally.

I do get work via Twitter connections, but I also put a lot of work into Twitter. I get a lot of professional value out of it. Twitter is where I find out what other people think is good work (I save links to delicious and Pinterest), listen to arguments/discussions among experts, hear about good blog posts, find out about conferences where I can meet people who are doing good work and learn new things.

Kim Rees also values Twitter for network connections. She says one way to get her notice is to follow her on Twitter (she reads all follower bios which admittedly not everyone does, ahem), say interesting things about data, visualization or design, and post an insightful comment on one of the Periscopic blog posts. Also, she loves to get paper mail presents. (Hint!)

"Prepping talks takes time, usually unpaid." (Me, after a lot)

Give talks in which you show your work and tell people you do consulting. However, take care: prepping talks takes time, which is usually unpaid work. Make sure you talk in places that can benefit you, and try to keep track of “leads” after each one, to better assess which audiences are good for your business. Don’t forget that giving talks at conferences is also about the networking, though; the benefit of a drink at the bar with someone is often as high as the value of the talk you give (and costs less). Post your slides later, with full contact details (and your website link) in them!

Writing tutorials and teaching can be a good way to get business; Jim Vallandingham got several contract jobs from online tutorials he did, including on the popular site. They can also help you make your own knowledge more concrete: Teaching is a great way to learn. A couple of popular D3 sites and self-published books have been started by people learning as they produce materials that they take payment or donation for (see, e.g., this interesting post by D3 Noob about sales and PR effect of his book). Teaching workshops can also lead to consulting follow-ups, as Andy Kirk notes.

When you do gigs in person, always carry a lot of cards. My business cards say what I do, not just my business name and email. I like to think having a fuller business card will help people remember later why they’ve got it.

Red Flags, or Gigs to Think Twice About

Gigs with no data should be avoided. Yet, they are surprisingly common. Sometimes the client has none yet, or they can't get it to you for various reasons that are themselves red flags. Why is this bad? Because you can’t produce a good design without data investigation first, and it’s a mistake to start without. One of my clients had me drawing fake dashboards in Illustrator for a couple months before we mutually parted ways. It stopped feeling "creative" to be "making it up" pretty damn fast.

Another client had problems getting me both the real data and any design input. When the design input finally came, it consisted of a mockup that had been created without any data investigatory work at all. When I looked at the real data, I discovered that a large percentage of one data field was full of garbage, and of course the design had to change when we realized the field was unusable. This data investigation is a crucial step in the design process that can’t be skipped. Ideally you are involved in both the data investigation and design stages.
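A quick check like the following is the kind of data investigation meant here. It is only a minimal sketch using Python's standard library: the `year` field, the inline sample rows, and the `looks_like_year` validity rule are all made up for illustration, and a real project would define "garbage" from what the client's data is supposed to contain.

```python
import csv
import io


def field_quality(rows, field, is_valid):
    """Return the fraction of rows whose `field` value fails `is_valid`."""
    total = 0
    bad = 0
    for row in rows:
        total += 1
        # Treat a missing or empty value as an empty string before validating.
        value = (row.get(field) or "").strip()
        if not is_valid(value):
            bad += 1
    return bad / total if total else 0.0


def looks_like_year(value):
    """Hypothetical validity rule: exactly four digits."""
    return value.isdigit() and len(value) == 4


# Hypothetical sample standing in for a client's CSV export.
sample = io.StringIO("id,year\n1,2011\n2,n/a\n3,\n4,2013\n")
rows = list(csv.DictReader(sample))

share_bad = field_quality(rows, "year", looks_like_year)
print(f"{share_bad:.0%} of 'year' values look unusable")
```

Running something like this on every field before any mockup work starts makes it obvious early which fields a design can safely depend on.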

Tool development is often hard because you may be responsible for finding or developing your own test data sets, which takes solid time. I ultimately had to let one project go for a start-up that was taking more time than estimated; it was the perfect storm of debugging and improving someone's very difficult algorithm, plugging into a complicated dev environment (a weekend lost to Git merge despair), and data set collection/creation/testing. I still have nightmares of #Fail from this one.

"I still have nightmares of #Fail from this one." (Me, with regrets)

Other gigs to be wary of: Debugging other people’s code. Just don’t take them. You’ll spend a huge amount of time that isn’t visible as “producing” something, and it will be frustrating to you and the client.

Anna Powell-Smith says to talk carefully with clients who want “something amazing” but aren't more specific. “It's so dependent on how interesting your data is.” She points people to this awesome Quora thread about data analysis on the OK Cupid blog:

“OkCupid's blog worked because we had sexy data. [And] we had Christian Rudder writing the blog. … His posts were great because he's such an amazing writer, not because he's awesome at math. (He's certainly the best writer I know.) The posts each took 4-8 weeks of full-time work for him to write. Plus another 2-4 weeks of dedicated programming time from someone else on the team. It's easy to look at an OkTrends post, with all its simple graphs and casual writing style and think someone just threw it together, but it probably had 50 serious revisions. And we threw out a lot of research that didn't turn into good posts. Your start-up probably can't afford to do this. It shouldn't waste like 10 man weeks of effort/focus/money on writing a blog post.”

--Chris Coyne on Quora: “How Important Was Blogging to OKCupid’s Success”

Which brings me to another red flag client type: the start-up that wants to hire someone to do some “viral” datavis posts for their blog, but hasn’t read Chris Coyne’s post and doesn’t know how much work goes into a good data dive and report. I did a “sample post” for one once with no brief on content and spent about 3 times as long as they were really paying for, and they still didn’t think it was punchy enough. (In my defense, I was given a data set of developer questions, not dating preferences.)

Some clients want eye-candy in the “cool” category, either data art or lots of bubbles with animations. I'm not saying don't do these jobs, just be sure you and they know what they want! One client wanted both a salesy "eye-candy" cool piece and a serious dashboard tool for concrete internal business problems; in practice this turned into two projects, unsurprisingly. (No, I didn't work on both.)

Kim Rees suggests that start-ups attracted to data visualization should be avoided, unless they want to make you a co-founding partner. Visualization isn’t usually an add-on; it’s a core fundamental. These can be similar to the “make something cool” clients, and their pivots and lack of money usually make for dangerous work relations.

"Dashboard design can be a pit of political misery." (Me)

A special red flag callout for dashboard design jobs: Like a lot of fundamental UX work, dashboard design can be a pit of political misery. Your role often ends up as an analytics counselor for a company that usually hasn’t settled on simple key metrics, which need to be determined before you can produce an attractive and useful design. You iterate quickly on ugly mockup after ugly mockup, trying to help them get internal agreement on their business goals and how to measure them via design artifacts. Highly stressful for you and the client stakeholders! (I tend to charge more for these political wrangling jobs, based on sad experience.) I should clarify that I do like dashboard work, but I now structure the project timing and money to take the politics and analytics discussions into account.

Wes Grubbs says bluntly, “Don't get stuck doing shit you don't enjoy.” Moritz advises avoiding boring or painful jobs for low pay. It might sound obvious, but we’ve all had them. Figure in a “pain coefficient” in calculating your rate for a job (range, upper limit, etc). Calculate your rate by estimating time and value to you -- learning something new, liking the client or the data subject, possibility of portfolio material at the end of it.

“Don't get stuck doing shit you don't enjoy.” (Wes Grubbs)

Wes Grubbs suggests that your contract terms allow you to drop a job if you get uncomfortable with the data or client requests.

Real Billable Time

Moritz Stefaner tracks time carefully and reports that one year he found himself having done only 18 hours billable work a week, averaged out. As he says, consulting requires a lot of administration (business calls/email, book-keeping), PR work, learning, and keeping up with tools and the industry.
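To see why that number matters for pricing, here is a small arithmetic sketch. Only the 18 billable hours per week comes from Moritz's report; the income target and the number of working weeks are invented for illustration, and the function name is hypothetical.

```python
def required_hourly_rate(target_income, billable_hours_per_week, working_weeks):
    """Hourly rate needed to reach `target_income` from billable hours alone."""
    billable_hours = billable_hours_per_week * working_weeks
    if billable_hours <= 0:
        raise ValueError("billable hours must be positive")
    return target_income / billable_hours


# Assumed figures for illustration only: a 90,000 income target and
# 46 working weeks a year (the rest is vacation, illness, holidays).
TARGET_INCOME = 90_000
WORKING_WEEKS = 46

rate_at_18 = required_hourly_rate(TARGET_INCOME, 18, WORKING_WEEKS)
rate_at_40 = required_hourly_rate(TARGET_INCOME, 40, WORKING_WEEKS)

print(f"Rate needed at 18 billable hours/week: {rate_at_18:.2f}")
print(f"Rate needed at 40 billable hours/week: {rate_at_40:.2f}")
```

With the same income target, the rate needed at 18 billable hours a week is more than double the rate that a naive 40-hour assumption suggests.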

Please be aware: Freelance sites like Odesk show a world of work being done by kids without mortgages and health insurance, or folks in countries with lower costs of living. The rates offered and accepted on those sites aren’t realistic views of what this work normally pays and what a consultant requires to stay in business long-term.

I prefer to work hourly because project pricing almost always under-represents the true time to successful completion, especially when development is involved. When I track time on project pieces, I find that up to 50% of a dev project often goes to just plugging into someone else’s dev process and environment, and it’s not unheard of for 30-40% of the time to be taken up with design alternatives and analysis stages. This doesn’t leave a lot of time for the core development work!

Red flag for project coordination: As Jim Vallandingham notes from one painful project, if there’s a separate person doing the data and a separate person doing the design and you’re doing the coding, it’s hard to sync up. This is time-consuming and potentially impacts quality and pain-coefficient for the work. Also, when you work remotely and do a lot of handing-off, you lose the opportunity to hear client discussions and critique in person. You’re at greater risk of being given a possibly foolish directive rather than being part of the process of coming up with a new design solution.

As Jérôme Cukier notes: “There is always a problem with the data and multiple feedback loops that can increase the time up to tenfold.” Build this into your timing and estimates, realistically.

Cash Flow and Some Nitty Gritty

Get a financial adviser, or advice from someone who knows about money and small businesses. Both for taxes and for planning savings, you need professional input. I had to take an evening accounting class early on, because I had no idea what to do, even in QuickBooks “SimpleStart.” My tax accountant helps me understand when I’m likely to be writing off too much (conference travel, software, machines, books…) and raising red flags at the IRS.

Bill your clients with Net-15 terms (don’t let them talk you out of it; I did, and ended up with a bureaucratic client owing me $17K for a couple months before they processed the paperwork). “Net-15” means payment is due 15 days after you submit an invoice. (Ideally, you attach a clause saying they will be penalized for delay after the due date, but I have no idea how you realistically enforce this.)
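The Net-15 rule is simple date arithmetic. A minimal sketch using Python's standard `datetime` module follows; the function names and the example dates are hypothetical, and it counts calendar days rather than business days.

```python
from datetime import date, timedelta


def due_date(invoice_date, net_days=15):
    """Date on which payment is due under Net-N terms (calendar days)."""
    return invoice_date + timedelta(days=net_days)


def days_overdue(invoice_date, today, net_days=15):
    """Number of days past the due date, or 0 if not yet overdue."""
    return max(0, (today - due_date(invoice_date, net_days)).days)


# Hypothetical invoice submitted on 1 November 2013.
invoice_sent = date(2013, 11, 1)
print("Payment due:", due_date(invoice_sent))
print("Days overdue on 1 December:", days_overdue(invoice_sent, date(2013, 12, 1)))
```

Tracking `days_overdue` per invoice is one easy way to notice early which clients are stretching your pay cycle.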

From a billing perspective, I’ve had better luck being paid on time by mid-sized or small companies; giant corporations with paperwork complexity can delay your pay cycle by months, and this is serious to a cash-flow-driven business like consulting. Always bill sooner rather than later. Some people tell me they require part payment in advance, to mitigate these possible delays. I don't (yet) have the balls for this myself.

"Taking a part-time long-term gig will help tremendously with cash flow." (Me)

Taking a part-time long-term gig will help tremendously with cash flow and reduce your anxiety over finding work in dry periods. To do this, try remaining with a client on retainer or hourly basis for a couple days a week after doing significant project work for them; or start with one client for a long-term period and scale back later. I have a part-time retainer job with a local university, and although the pay is lower than most client work, the flexibility in hours, as well as the chance to learn new things on the job, make it a reliable gem for me. Teaching workshops is another way to pick up income, especially if you re-use materials. (If you don’t re-use, it’s a lousy way to pick up income. Trust me.)

Other mythical options for cash flow are passive income sources from writing a best-selling app, or a best-selling book, or maybe winning the lottery. If you write an app, remember you have to do support and field customer requests (cf. Moritz’s interview on FILWD). I’m not really serious about the book: I have no great evidence that one can count on royalties or gigs coming from them, and the work required is enormous.

A grim reminder from Jérôme Cukier: Don’t count on the “sure things” in your inbox. Any one of them can fall through and leave you with nothing incoming.

"Don't count on 'sure things' in your inbox." (Jérôme Cukier)

Keep an eye on your bank balance at all times: It’s possible to stay busy working on talks, contests, your web site, favors for friends, learning new tools -- and forget about the need for billable hours entirely. Especially when you’re self-motivated, and not just employer-motivated, and you’ve got zillions of “personal projects” you want to be doing as well!

A Few Pros and Cons of Consulting

Several friends of mine have quit consulting and taken full-time jobs. The uncertainty of the income and the need to do business development, book-keeping, and administration wear some people down. Also, the loneliness can be bad; or just the lack of team and occasional inability to see a project all the way through, if you're a cog in a bigger effort.

Once I delivered some super code to a start-up, they paid me, and I never heard from them again. That’s normal, for a consultant. But later I went looking for an example in that code I’d sent them, and realized I had forgotten to attach the file! Now that’s depressing. No one works for just the money. If you’re in a company, you get to make sure your stuff is passed on to the right people and can advocate for it seeing the light of day when no one else cares as much.

But the pros of consulting for me outweigh the downsides pretty dramatically:

  • Not being in an office/commute 5 days a week (I get a lot more done when I write code from home; UX work, however, is much harder without face-time all the time).
  • Paid hourly most of the time, so long weekends aren’t “free” work like they were at my full-time salaried jobs
  • I’m in charge of my own conference/training decisions: where I send myself and what to learn are HUGE issues for my career, and never were for employers
  • Vacation time is mine to determine (after living in Europe, I can’t do without serious time off)

If you’re a consultant, you’re in charge of your career. No one else is!

Wrapping up: I hope this whole post didn't sound too negative. I admit I have never worked so hard for so little money as I did in 2013, due to many of the red flags and unwise gigs. But I also had loads of fun, met a lot of great folks in the same field, and had some amazing clients and data projects. I'd probably do most of it again.

Resources: Places to Find Data Vis Jobs

People Resources

With email or bar chat input from (in no order):

Also, some recommended posts/threads:

Thursday, August 15, 2013

PyData Boston 2013: More On Fiction Analysis

For PyData Boston, I did a recap of parts of my OpenVisConf talk, with some more technical details added, including an IPython notebook of some useful code.

The slides are here:

The IPython notebook with some useful code samples is here. If you want some sample data files, email me and ask. I'm concerned about rights with respect to the fiction files.

Sunday, June 16, 2013

Analysis of Fiction (My OpenVisConf Talk)

Here are the slides from my talk at OpenVisConf in Boston in May!

And here is the link to the video (30 mins), which might be funnier than the slides:

Finally, I did put most of my visualization tool demos online, which are linked from the talk itself. (These are visualizations I threw together in D3 to make it easier to interpret the output of my machine learning and stats analysis, since I was dealing with long text -- I needed to be able to browse the results and see the text on demand, too.)

I'll update this post with those links too, later, and maybe say a few words about my process, too. I'll be giving another talk specifically on visualizing LDA Topic Analysis in July at PyData Boston, building from some of this work.

Wednesday, March 27, 2013

Data Visualization with Nodebox

For PyData 2013, I put together a talk on using Nodebox OpenGL for data visualization. My goal was to expose the data science audience to a flexible tool similar to Processing, but that allows one to write in Python and use Python data libraries. (The java-esqueness of Processing has always put me off reaching for it when I'm working, despite a general fondness for it. I still own every single (English) book on Processing, AFAIK.)

ETA: Web video of my talk is here on the PyData Vimeo site.

My talk was generally well-received, although I think I flummoxed the stats graphics people a little bit who probably weren't expecting something so "sketchy" from me. Hey, I love those other tools too, and use Matplotlib (and d3 too!) regularly.

A few quick comments on the Nodebox eco-system: The current focus of the team in Leuven is on Nodebox 3, a block-diagram visual programming tool, not the 2 variants I talked about (Nodebox 1 and Nodebox OpenGL). I think NB3 veers away from usefulness for the data science crowd that might benefit from a Python alternative to Processing. If the enormous success of the java-based Processing is anything to go by, I'm not crazy in thinking a Python tool like it should be huge! After all, it's cuddly Python! So at the end of my talk, someone actually asked me why he should have sat there for 45 minutes if I was not talking about thriving open source code with a huge community behind it. My response was, more or less, "It's already super useful which I hope I showed, and more people could be working on it than just the original authors." That's how open source works, right? (By the way: That guy apologized to me later, but I didn't take it badly when he said it.)

A couple more comments on my slides: My own data experiments in the deck weren't incredibly successful, largely due to issues with the database I used. I wanted to explore Shane Bergsma's gender-of-nouns database collected off Google News, and what I found was that it thinks everything is really "male." Cuz most news articles are about men, probably. (Also, it proved less useful on older Gutenberg books, because old-fashioned vernacular nouns like "momma" don't appear in the db. So out went Pride and Prejudice and out came my credit card for Kindle books.) Hence, all my fiction gender plots look kind of like these, with heavy weights towards male and neutral nouns:

The pdf of my slides is here and the code zip file is here. Do check my appendices: I figured out a bunch of issues related to paths in Nodebox 1, running NB 1 from the command line, and the like.

A couple nice post-conference mentions: Jake Vanderplas's take on Matplotlib history and visualization in Python, which has some interesting comments. I spent a while talking to Ben Lorica (@bigdata) at PyData, and he nicely mentioned Nodebox in his well-RT'ed article on how Python Data Tools Just Keep Getting Better.

Also, before the conference, I was interviewed for a podcast about data vis skills. I didn't advertise this very broadly because of a few mistakes in the initial post (one in particular that claimed I hated d3, which is certainly not true at all -- I said it had a learning curve, you can listen yourself!).

Friday, February 15, 2013

My Upcoming Talks, Spring 2013

I've got a busy few months ahead! Here's where I'll be speaking...

PyData SV 2013 in March

Peter Wang asked if I'd submit something to PyData SV, perhaps after I noted the lack of women speakers at the last 2 events. :-) This small conference is the best place for Python data science talks -- I've enjoyed and learned a lot at both previous ones. I'm happy to be talking about using the Pythonic versions of Nodebox as tools for data visualization.

Lean UX NYC in April

In April, thanks to Will Evans, I'll be giving a workshop on quantitative skills and analytics for product designers at Lean UX NYC. Here's an interview with me on their website, talking about becoming quantitative and lean data organizations. I'm still toying with the final content, but I expect to cover some advanced Excel maneuvers, a little bit of Google Analytics analysis, and some stats of use in UX work.

OpenVis Conf in May

It's a new visualization and data conference, the OpenVis Conf! and @ireneros are running a great new event in Boston, and I'll be speaking too! Here's my talk plan (titled "The Bones of a Bestseller"):

How do Dan Brown and Stephenie Meyer do it? Most text visualization focuses on word counts: in this talk, Lynn will illuminate how fiction "looks" at a meta level, using a combination of meta-linguistic analysis and simple machine learning. Beyond just words, long texts are composed of sentences, paragraphs, and chapters, and the pacing and theme are reflected in these as well as word choice. With a little finesse, we can detect and graph the famous story arcs that screenwriters and fiction teachers are always talking about. With a little more finesse, we can write an action scene detector or a sex scene spotter and visualize how exciting a novel is, in all senses.
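
To give a flavor of the "beyond word counts" idea, here is a minimal sketch (not the code from the talk) that uses average sentence length per chapter as a crude stand-in for pacing. The function names, the regex sentence splitter, and the sample text are all made up for illustration; a real analysis would want a proper tokenizer and richer features.

```python
import re


def sentence_lengths(text):
    """Split text into rough sentences and return the word count of each."""
    # Naive assumption: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def pacing_profile(chapters):
    """Return the mean sentence length (in words) for each chapter."""
    profile = []
    for chapter in chapters:
        lengths = sentence_lengths(chapter)
        profile.append(sum(lengths) / len(lengths) if lengths else 0.0)
    return profile


# Hypothetical two-"chapter" sample: one slow scene, one action scene.
chapters = [
    "She walked slowly through the quiet garden, thinking about the long summer ahead.",
    "He ran. Shots rang out. She ducked. Glass broke.",
]
profile = pacing_profile(chapters)
print(profile)
```

Short, choppy sentences pull the number down, so a dip in the profile is one rough hint that a chapter might be an action scene. Plotting the profile chapter by chapter gives a first, very rough picture of pacing across a book.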

I know a bunch of Twitter friends are coming to all 3 of these conferences... I can't wait to see you all!