Sunday, November 19, 2006

Data and Infovis and "Art"

I've been thinking about data a lot since the Infovis 2006 symposium. The conference drew a strange mix of scientists, mathematicians, and a few artists, or those with an artistic bent.

My friends Martin Wattenberg and Fernanda Viegas from IBM Research Cambridge secured funding, invited submissions, reviewed them, set up the equipment for, and then sat guard over (missing talks in which they were cited) an art show of infovis applications. They were specifically featuring artistic displays of real data. (I'm paraphrasing what I think they said were their selection criteria.) One was Golan Levin's The Dumpster, which I blogged about a while ago.

To introduce this art show, they gave an excellent talk that I'd summarize as "What's Going On Out There in the Real World That You Might Not Know About." A bunch of us saw a lot of people in the audience noting down the existence of Ben Fry's Processing Toolkit, which makes programming datavis apps accessible to artists and ordinary people who aren't postdocs in mathematics. Sadly, it reminded me of 5 or even 10 years ago, when the CHI and CSCW research communities realized that web startups had already made community apps that worked, and that those apps weren't made by researchers in labs. Where's the actual innovation happening? More often than not, it's students or other clever people with time on their hands and a willingness to play around.

But back to data: When I was doing my dissertation, data was a sticky subject. Collecting data on "human subjects" was overseen by strict board reviews and ethical examination, and I had to go through this as an early internet researcher with a Human Subjects Board that didn't know what to do with this kind of data.

The community I "studied" reacted strongly to some of the data that I collected, post-processed, analysed, and reported, regardless of the reviews I went through. My data said some things that they didn't want made visible, or suggested things they didn't like seeing reduced to graphs and charts. (The book is available here; the last chapter discusses this problem in some detail.) Anyone who looks at or exposes recorded human behavior is going to hit this: for example, people who don't think they talk much and discover they talk all the time often don't like knowing this, however measurable it is and however potentially useful this exposure might be for them. Which brings up the question of why and when you should turn something into data, and analyse it.

So, thinking now about how the research and infovis worlds have evolved since then, and the new inevitability of data mining on behavior from the traces we leave behind us, I see these data source dimensions:

  1. Data sets that exist and are known to exist: census data, weather data, stock market data.
  2. Data that "happens" but isn't necessarily assumed captured or turned into a set that's easily analysable: email, chat, mobile phone records, my retrievals from ATMs, where I walk and what I eat.
  3. Data that we set out to measure, because we're looking for something: experimental data, NSA tapping us, etc.
  4. Data we have (from any of the above means) and convert to another form of data: e.g., turning activity logs into summaries of time on tasks, turning gene sequences into musical notes, turning video of your cats into a single overlaid image, turning text into images, etc.
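
As a small illustration of the fourth kind, here is a minimal Python sketch of turning an activity log into a summary of time on tasks. Everything in it is made up for illustration: the log format (a timestamp plus the task switched to at that moment), the sample entries, and the function name `time_on_task` are my assumptions, not anything from a real logging tool.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical activity log: each entry marks the moment the user switched
# to a task. The final "done" entry only closes the last interval.
LOG = [
    ("2006-11-19 09:00", "email"),
    ("2006-11-19 09:20", "writing"),
    ("2006-11-19 10:05", "email"),
    ("2006-11-19 10:15", "done"),
]

def time_on_task(log):
    """Sum the minutes spent on each task between consecutive log entries."""
    fmt = "%Y-%m-%d %H:%M"
    totals = defaultdict(float)
    # Pair each entry with the next one; the gap between them is time on task.
    for (start, task), (end, _next_task) in zip(log, log[1:]):
        delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
        totals[task] += delta.total_seconds() / 60
    return dict(totals)

print(time_on_task(LOG))  # {'email': 30.0, 'writing': 45.0}
```

The raw log is a list of events; the summary is a different kind of data, and one that is much easier to chart or to argue with.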

The really creative apps for infovis often seem to lie in item 4, because transformation of data into other modalities is a trick of visualisation that might give us insights we didn't have before. Some of them are just elegant visualisations of data we wouldn't have thought of visualising (like Ben Fry's zipcode applet that Martin called an infovis "haiku"). The "insight" part is still tricky to handle; human perception differs, and reasoning skills differ, and that makes drawing conclusions from visualisations tricky too. (Untutored people generally make more of statistical tests than they should, too.)

Martin and Fernanda stayed safely away from defining "art," but I still thought about the artistic component of data mining. Mining data well, and forming and then testing hypotheses from different views of it, is a skill, perhaps even an art in itself. An event occurs: I capture it, I capture multiple instances of it, and I look for patterns in different views of it, and then I learn from it or measure it some more or in another way to progress towards some truth.

Or, for the more artistic data visualiser: she captures it and events like it, she presents it in a novel and beautiful way, hopefully with some elegant interactivity, and other people learn something. They might learn something ineffable or impossible to reduce to words. But that doesn't make it less important. Scientific creativity still springs from the indescribable ideas you have about the world before proof and publishing.

Friday, November 17, 2006

Moroccan carpet seller

In Morocco. (It's tough to pick which ones to post -- I've got so many.)

Sunday, November 05, 2006

Social Drinkers Earn More Money

Somewhat disturbing, but ringing true in a bunch of dimensions... This study shows that social drinkers earn more money than non-drinkers, and claims it's because of the increase in social capital gained by knocking one back with colleagues.
Although there is a united campaign to restrict alcohol, labor market data may surprise noneconomists: recent studies indicate that drinking and individual earnings are positively correlated. Instead of earning less money than nondrinkers, drinkers earn more. One explanation is that drinking improves physical health, which in turn affects earnings (Hamilton and Hamilton, 1997). We contend that there is an economic explanation. We hypothesize that drinking enhances social capital, which leads to superior market outcomes. Glaeser et al. (2000: 4) describe social capital as “a person's social characteristics, including social skills, charisma, and the size of his Rolodex, which enable him to reap market and nonmarket returns from interactions with others.” Some aspects of social capital might be innate, but people can enhance others, such as Rolodex size. If social drinking increases social capital, social drinking could also increase earnings. We attempt to test whether drinking enhances social capital by differentiating between social and nonsocial drinking; we predict that those who drink in public will have higher earnings than those who drink at home. New data confirm that drinkers earn more, and we find that social drinkers earn even more.

The article is here and comes with a somewhat scary libertarian slant intro, be warned: No Booze? You May Lose: Why Drinkers Earn More Money Than Nondrinkers (pdf). Note, this obviously supports the value of conference trip networking as important for career, if money is an indicator of career success (it is to some).

Saturday, November 04, 2006

E3: Effective, Efficient, Elegant

Hi, my name is Lynn and I work all the time. Too much to post much these days.

But I had a nice break when I went to Infovis 2006, the symposium on information visualization. There was a bit too much math this year, starting from the keynote, which was Eades talking about graph layout algorithms. I still managed to get something thought-provoking from it. His criteria for algorithm evaluation were "effective, efficient, and elegant."

These are good principles for software design as well as algorithm design: a good piece of software should be effective at supporting the tasks it's designed for, be efficient in use and for use, and ideally be elegantly designed. Elegance, of course, implies more than "usability." Usability is a word that's got kind of an old school ugly lab study connotation these days; it's a word that doesn't say enough to capture current thinking about the value of delightful design, rather than just adequate design, in creating a differentiating user experience.

What's "elegant" in a proof or theorem? I asked a friend who was a mathematician in his previous life. "Simple," was the first thing he said. But not just that: it can be taught to a second-year student, which was one of Eades's criteria (suggesting "learnability"). Yet also somehow "surprising." An elegant proof is a result with a twist you didn't see coming, but should have; it adds an insight that makes it aesthetically pleasing.

At risk of triteness, I did look up elegance on dictionary.com after striking out in a Google search: "gracefully concise and simple; admirably succinct. Combining simplicity, power, and a certain ineffable grace of design." It adds:

The French aviator, adventurer, and author Antoine de Saint-Exupéry, probably best known for his classic children's book "The Little Prince", was also an aircraft designer. He gave us perhaps the best definition of engineering elegance when he said "A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away."

E3 makes a good compound principle for evaluating design of all things, including software. I'd really like more elegance in my design.

10 most real life ghost photos (sic)

Ah, Pravda. It never fails to appear on the Anomalist. Here is a sample of photos of ghosts from the Russian online paper: 10 most real life ghost photos - Pravda.Ru.