Ghostweather R&D Blog: 2011

Sunday, November 20, 2011

A Kindle Fire Review (from a Media Fan)

I'm a Kindle fan, and an Amazon fan. I really like their media content: I buy Amazon music, Amazon Kindle books, TV shows, Android apps. So when my Kindle Fire came, it was pretty much pre-loaded, and that was really nice. All my stuff is sitting there with a little "download to device" arrow, which rocks.

I got this thing because of upcoming travel over the holidays (I don't own an iPad, I think they're too big). I was never intending to take the Fire instead of my reading Kindle, and after 5 days, I still wouldn't. Partly that's battery-life-related; I adore my reading Kindle for the everlasting, never-needing-to-charge-it, one-handed reading wonder that it is. The Kindle Fire battery supposedly lasts about 8 hours, and that may not be true with video watching and wifi on (I haven't tested that part yet).

So, this is not a Kindle-killer, any more than it's an iPad killer, 'nuff said there.

More specifically, I got the Fire for video watching, web browsing/email/twitter, PDF reading, and light app use (Solitaire, Angry Birds, etc), in about that order of priority. So let's hit those, with some UI observations along the way, because that's where the chance for the Fire's improvements really lies. Then I'll finish up with a few comments on major navigational issues, e.g., scrolling, selecting, typing, which permeate the product.

Video Watching and Disk Space

The Fire seems to want you to mostly stream, which doesn't surprise me. The 8GB drive, and the free Amazon Prime (streaming only) support this. Netflix and Hulu Plus work on it (install their free apps from the store). If you have ever paid for a TV show ep (I sure have!) from Amazon, THOSE can be downloaded to your device. (Browse to a show you have bought episodes for, and they tell you they're still yours, and you can download to your device now!)

Why does this matter? If you're wanting to use it on an airplane, or in iffy hotels off the grid, which I do, you need to download to your device. And if you want to load video you already have, I did the research: It only recognizes MP4, so you need to convert stuff. (I'm using AVS Video Converter; my version does only one file at a time, which is proving to be a giant slow babysitting process.)

You can load videos (or PDFs, or mobi files, or anything else) when you attach your device by USB cable. Drag them into the Videos folder.

But don't expect them to show up in the Videos section of the UI, reachable by the top tabs! They will be found in the rather hidden pre-installed "Gallery" app, which is where your photos and videos live. And then you may be surprised by how poor the UI is for the videos (I am praying they fix this, it's un-manageable!) They appear as a tiny thumbnail with no text; you must select, and then choose "Properties," in order to figure out which one is which. This will get old fast, not just because selection is so funky on this device (more on that later). Here's the videos display with 2 videos:

A short season of one show could run just over 3GB. The actual disk space available to you is not 8GB, because of the OS etc; it's really 6GBish. To find out what you're using, you need to hunt a bit. There is no disk meter in the top accessory bar where Wifi, battery, and other settings live. Tap that bar, and you'll see options like volume. (Yes, it says "Lynn's 5th Kindle," I don't want to talk about it.)

You need to hit "More" and then click into "Device" to see the disk usage. That's really annoying for a device with such a small drive. I wouldn't be hoarding content on it, but for non-wifi situations, having downloaded content seems pretty important to me. I'm really befuddled by this one.

This said, my MP4 videos do look and sound nice. I'll be spending the evening getting ready for that trip.

Web Browsing, Email, Twitter, Etc.

I am guardedly pleased with this so far. I had some issues getting the built-in email app to recognize a Verizon Yahoo address, but the Yahoo mail app worked fine. Tweetcaster works nicely, and I even get a tiny cute beep in the notifications bar when someone @ mentions me, which is nice (same cute beep for email I receive).

The web browser does support tabs, which is great; but the favorites/bookmarks have one major issue: There seems to be no way to delete one. Huh? So it came built in with ESPN.com and a few others I never use, and I can't remove them. If this was a UI design mistake, it's shocking; if it was policy because of some payment by partners, that's unlike Amazon in so many ways, since they are usually all about the customer.

Please fix this, Amazon.

Web pages also allow you to remember passwords, which - thank goodness. Typing is such a damn pain (see below).

Since web pages look good, and play video (including flash), this is a real plus on the device. Selection of links is funky, and I sometimes don't know if the selection problems I am having are due to the OS, touchscreen, or some web loading/processing issue.

PDF Files

PDFs on the e-ink "reading" Kindles are terrible - when they took away the text reflowing option for PDF docs, it became impossible to really read them, requiring too much zooming, scrolling, etc, and any images take forever to load and are, of course, B&W.

Most of this is awesome on the Kindle Fire! Definitely a reasonable PDF reader. The documents look great, and my only issue is the weird scroll-down, then to the right, for navigating a large document. It would be nice to have an option for "just scroll down" to get through a PDF document, instead of trying to use the book/paper metaphor of flipping pages. Here is how pretty PDFs look (yes, this is fanfiction, deal with it; of course I tested academic articles too).

Here's a page in portrait, with arrows suggesting how I need to scroll (down to get to bottom of the page, then flick to the right to "flip" to next page).

Thumbs up on PDF reading. They appear in your Documents folder as you would expect, and you don't need to send them to your device for conversion, they just "work." There isn't a Kindle Fire Instapaper app, but since you can save the site in your bookmarks and read text only, or download as Mobi files, you are all set there.

App Use: Angry Birds

Angry Birds is great. So is Solitaire. I haven't tried to install any apps that aren't in the Amazon app store, although you can (instructions abound on this). Note: I installed these on my Android phone and can't use them on it, screen is too small to really do it right. This form-factor is just fine for games that need a wider field of vision, or for people who are getting older and blinder.

I also installed a drawing app, but I don't much want to draw with my finger, so.

I like, and have always liked, the Amazon Android Apps store experience. In some ways, it's better than Google's app store. I'd expect that from Amazon UI, but it's nice to see on Amazon's first dedicated Android device.

Typing, Scrolling, Selecting, Turning the Page

The use of the touch screen is my biggest peeve. It's just buggy! If it's software, I expect a fix update -- Amazon is always good about updating Kindle software. If it's hardware, it's just a damn shame, and I'm kind of shocked it shipped this way.

Typing: The on-screen keyboard behaves very badly in portrait mode. My space bar and the letter "c" seem to be hyperactive for any key I pick on the right side of the keyboard. It's so bad, I will just switch to landscape for anything I ever need to type. The typing issues make the device less fun for email/twitter than I hoped. I am very sad about this.

Scrolling and Selecting: I have had so much trouble trying to scroll vs. selecting what's under my finger that I even looked it up in the help, and watched some demo videos online to see if they were doing it differently. It's not obvious. I have similar problems on my Android phone, which either means the OS itself is to blame, or making good touch screens is really really hard and Amazon's hardware providers failed. I spend a lot of time hitting the "back" button to undo a selection I didn't mean to make while I tried to scroll, especially in Tweetcaster or email.

If I were creating an Android app, I might consider making a dedicated scroll bar, just because it would offer some (admittedly old-school) way around this crappiness in the UI.

Incidentally, scrolling is very important in the apps that Amazon built for showing your bookshelf, your music, your videos... so this problem is quite profound.

Turn the Page: For books, without the e-ink hardware buttons, you need to flick or tap to turn the page. It's a slightly delicate maneuver, since it's easy to hit too hard and bring up the menu bars etc. Also, I am not so convinced this is a read-with-one-hand device. I'm not convinced by the reviews of the Kindle Touch either; if you're holding it in your left hand, tapping on the left side goes to the previous page. This is another surprising gaffe on the UI side, for me. I'm not left handed, but I read left-handed about half the time.

Summary

I am very pleased by the PDF experience, and mostly like the apps and Web experience. I didn't buy this to replace my reading Kindle, so no real comment on that side.

I am shocked by these things, and expect software updates to fix some, if not all:

Lack of a disk usage meter on the top info bar. Related to having very little storage on the device -- I admit, I wondered how hard it would be to crack it open and install a larger hard drive. We all did that with our first TiVos for years...
Touch screen badness - for typing, selecting, scrolling... If this is hardware, we're rather screwed, I bet.
Inability to delete web bookmarks (sheesh, seriously, Amazon?)
Better UI for seeing your installed videos on the device. Option to see what the darn video is, without having to select it and go into "properties" first. Which is hard because of the touch screen issues.
Possible option to just use a down-scroll on PDF docs, rather than flick-right to turn the page.

My quibbles aside, I do like it, especially for PDFs and apps. I'm looking forward to the Fire evolution and expect to see software updates (or at least good apps) addressing some of these problems very soon.

Sunday, October 30, 2011

A Personal Take on Infovis 2011

I haven't had time to go through the papers I liked and didn't like yet, but I have been musing on some other aspects of Infovis that I thought I'd recap. To situate this, I usually go to Infovis every other year, and have been doing so since the mid-2000s, I guess.

Who Went, Who Didn't; Design vs. "Science"

Partly due to irritable blog exchanges in the past couple years, and partly due to perceived relevance of papers and audience, many of the artistic practitioners of infovis did not come. Or, if they did, I didn't know they were there. By this I mean academic artistic sorts like Golan Levin and Casey Reas and Dan Shiffman, and the practitioners like Stamen, Moritz Stefaner, Jan Willem Tulp, Jer Thorp, Wes Grubbs, Ben Fry and Fathom, David McCandless, etc. (Kim Rees from Periscopic did attend. I wish I'd gotten a chance to talk with her.)

Martin Wattenberg and Fernanda Viegas, who are successful straddlers of artistic, industrial, and academic infovis, didn't make it either. They weren't boycotting, it was due to work and personal reasons. (Google+ Ripples, a project of theirs, launched while we were sitting in paper sessions.) I mention them because a handful of years ago they tried to bridge the communities (with Golan) in starting an art track. I don't think the momentum has been entirely conserved. Certainly the papers didn't reflect great focus on emotional, artistic, or design processes. The one most focused on design as process was a very dry and obvious overview of how to do "user-centered design for beginners" that caused an industrial colleague of mine to observe "the bar for acceptance seems very low here." (It's not, but that one did make me raise my eyebrows.)

Again, this said, Amanda Cox's brilliant capstone talk, which was largely about design process and decisions at the NYT, was a huge success. As was Jessica Hullman's talk on visual engagement methods (or "chart junk, the sequel," as someone noted--Jerome Cukier, possibly).

I know some members of the program committee are trying to figure out how to get more industrial attendance. CHI has been through this for years, and added various case study tracks, panels dedicated to industrial talks, alt.chi for less mainstream academic works, among other strategies. Infovis could use some of this, but attracting people who have successful careers already, and convincing them there is value in attending given the pricetag, needs some more thinking through. I see value for them in the algorithm side of many of the papers -- but that might not be worth the cost of attendance for them.

Maybe the drinking would? I know some of us talked about the artistic non-attendees over drinks, since they weren't there to participate. More on this below...

One more contingent: there were a lot of folks from the intelligence communities, DoD, the government in general. My perception is that this has increased. And I think they asked smarter questions this year; they certainly weren't shy about going to the mic.

Paper Experience Sure Differs, Depending on Your Perspective

During a bunch of papers, the demo or video had some astoundingly beautiful angle or process moment that just wasn't the published "point" -- it was almost incidental. I'm thinking especially of the beautiful organic edge bundling videos from "Skeleton-Based Edge Bundling for Graph Visualization" by Ozan Ersoy et al. (see this page for some recap). My comment to Jen Lowe was that Jer Thorp and the Processing crowd would have loved this, and with the algorithm detail in the paper, would be able to implement and tweak quite easily. I can't find their videos anywhere, though! (Note: Even the first questioner afterwards said "I could watch your videos forever," but it was kind of in an undertone, not her point either. Let's have more talks where creating beautiful effects is a part of the point, perhaps?)

Mike Bostock's D3.js talk was fascinating to those of us who had read his slides from SVG beforehand, but hadn't heard his commentary on them; and if you knew the DataMarket protovis-vs-d3 history online. It was also nerve-wracking worrying about who would ask what afterwards given some of that historical controversy. Apparently not so for other attendees, I heard later! I find Mike's arguments convincing, although I have not tried to build anything sizable in D3 yet.

Jo Wood et al.'s BallotMaps talk about name-order biases in voting districts was a wonderful "process" talk on using their HIVE system to visually test hypotheses. (For general info, see their org page.) I feel that the talk with demo of stages of visual exploration was important in making the story work, and the paper isn't as easy or fun to grok. Aidan Slingsby et al.'s talk on showing uncertainty in cluster results was similar (and surprisingly, the paper seems to differ quite a bit in the system design shown).

Program Committee: I'd like to see more videos in the proceedings!

Student Distractions: To Finish or Not?

As an ex-research type myself, I'm always interested in what grad students are going through now, what topics they and their advisors find valuable to study, and what my friends are facing as advisors. Stanford and Berkeley students seem to have a lot more distractions from start-ups given the "big data" and "data science" world we're in now. At the Stanford-sponsored party, I actually found myself recapping all the reasons to finish a Ph.D. to some poor guy who had no intention of quitting his. (Sorry, S, too many drink tickets.)

I don't necessarily use my own Ph.D. (except maybe socially at conference parties), but I have certainly concluded that spending years in a university surrounded by other smart people is not a bad thing. After all, the business world is usually not as smart, face it. And you will have many years to work a 9-7 job after school, so why rush out? The chance to sit in on other departments' classes, even when it's not a requirement, is a chance you don't usually get after graduation. Infovis, like HCI, is (or should be) interdisciplinary; being able to be in stats courses, graphic design courses, programming courses, psychology courses... well, if I were a student now, I'd want to take advantage of those wonderful distractions. (I did when I was finishing up, but did NOT take enough stats. Luckily this is fixable with online courses, to some degree.)

Overall, More Drinking Than Usual

I definitely had more fun drinking with people who knew a lot about drinks than I have in previous years. They knew about whisky, cocktails, wine, vodka infusions. Beer too. I was humbled by their depth of alcohol knowledge. Doesn't this convince you to come next year? Stanford threw a good party too, to try to improve the conference party scene.

Maybe you'll come next year.

Wednesday, September 07, 2011

Combing Through the Infovis Twitter Network Hairball

A month or two ago, Moritz Stefaner posted this image of "infovis" folks on twitter, with nodes sized by number of followers ("in-degree"):

I dropped him a note wondering if he'd tried any social network analysis methods to simplify it, or otherwise break it down -- so he sent me the data and said "have a go!" If I had crawled twitter links myself, I might not have used his criteria or seed set, but I was curious if I could make any more sense of his data set as is. (So I've neither re-crawled, nor added any info such as frequency or content of tweets to this data set).

I compared some of the measures calculated by the Python library NetworkX with measures calculated by Gephi. The two tools produce slightly different scores for some metrics, an interesting fact which I have not investigated deeply. I've made my spreadsheet of the calculated stats available for you on Google Docs. (Variables prefixed with "NX" were calculated with NetworkX, and those prefixed with "Gi" were calculated by Gephi.)

First, some overall stats on the network in Moritz's dataset:

1644 twitter id's are represented, and there are 145,382 edges, or links, between id's.
Gephi reports the average path length is 2.5.
Gephi and NetworkX say this is a connected graph; Gephi reports 1 weakly and 5 strongly connected components.
The average degree is 89.9, but the median is 51. There is a long tail here, meaning that some nodes have very high degree (see below) but most do not.
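For anyone who wants to see how summary stats like these are computed, here is a minimal standard-library sketch. It assumes a made-up toy edge list (the names `edges`, `total_degree`, and `weak_components` are mine, not from NetworkX or Gephi, and this is not the code I ran on Moritz's data). It shows why one heavily linked node pulls the mean degree above the median, and how weak components are counted by ignoring edge direction.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical toy "follows" edges (follower, followed); not the real data set.
edges = [
    ("a", "hub"), ("b", "hub"), ("c", "hub"), ("d", "hub"),
    ("hub", "a"), ("a", "b"), ("e", "f"),
]

def total_degree(edge_list):
    """Return {node: in-degree + out-degree} for a directed edge list."""
    degree = defaultdict(int)
    for source, target in edge_list:
        degree[source] += 1
        degree[target] += 1
    return dict(degree)

def weak_components(edge_list):
    """Group nodes into weakly connected components (edge direction ignored)."""
    neighbors = defaultdict(set)
    for source, target in edge_list:
        neighbors[source].add(target)
        neighbors[target].add(source)
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components

degree = total_degree(edges)
print("mean degree:", mean(degree.values()))
print("median degree:", median(degree.values()))
print("weak components:", len(weak_components(edges)))
```

In the toy data the "hub" node has degree 5 while most nodes have degree 1, so the mean sits above the median, which is the same long-tail shape described above on a much smaller scale.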

Derek Green's excellent tutorial for NetworkX suggests doing community detection using another Python library for the Louvain method. At superficial review, it's similar to the Gephi modularity class detection algorithm, but I got slightly different results from the two methods. [Update: The method is non-deterministic and results will vary depending on starting values used.] NetworkX generally finds 5 communities, and Gephi alternates between finding 4 or 5. Here is one confusion matrix, showing the differences between node allocations assigned by NetworkX and Gephi; in this chart, squares are sized according to number of nodes assigned to each group:

So, interpreting this: In one run, Gephi split up the folks who are in NetworkX's community 0 into Gephi's communities A, B, C, D. Gephi's community C mostly overlaps NetworkX's community 1.
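The counts behind a confusion matrix like this one are easy to tabulate. Here is a small sketch, assuming two dictionaries that map each id to a community label; the ids and labels below are invented for illustration, and `confusion_counts` is a hypothetical helper rather than anything from NetworkX or Gephi.

```python
from collections import Counter

# Hypothetical assignments for six made-up ids; the labels are illustrative only.
networkx_community = {"u1": 0, "u2": 0, "u3": 0, "u4": 1, "u5": 1, "u6": 2}
gephi_class = {"u1": "A", "u2": "B", "u3": "A", "u4": "C", "u5": "C", "u6": "B"}

def confusion_counts(first, second):
    """Count nodes for every (label in first, label in second) pair."""
    shared = first.keys() & second.keys()
    return Counter((first[node], second[node]) for node in shared)

counts = confusion_counts(networkx_community, gephi_class)
rows = sorted({row for row, _ in counts})
cols = sorted({col for _, col in counts})

# Print a small text table: rows are the first labeling, columns the second.
print("    " + " ".join(f"{col:>3}" for col in cols))
for row in rows:
    print(f"{row:>3} " + " ".join(f"{counts[(row, col)]:>3}" for col in cols))
```

A row whose counts are spread over several columns corresponds to a community that the other tool split up; a row concentrated in one column means the two tools mostly agree on that group.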

For the rest of this post, I'll illustrate from NetworkX's community divisions, which I spent more time investigating. When I looked at the force-directed layouts and stats for the community members, I decided on these approximate group names, based on what I knew of the id's in each group:

Group 0: The Authorities
Group 1: The Researchers
Group 2: The Processing Crowd
Group 3: The Small NYT Group
Group 4: MSLima's Crowd

These are a bit arbitrary as names - based on who I myself recognized among the high degree members. (I myself live in group 1, way down the list-- feel free to check out where you live, in the spreadsheet!)

To make sensible (i.e., less hairy) plots, I filtered for the top 5% by degree calculation. "Degree" corresponds to the sum of in-degree and out-degree edges; in other words, how many links a node (or twitter id, in this case) has to and from other nodes. High "in-degree" count usually implies someone is a perceived authority in the network. High "out-degree" might suggest a social media corporate type. Well, not necessarily - but it means they follow a lot of folks, and could themselves be a useful information source if they also have high centrality and share their information. (Like I said, I didn't look at who said what or how often they tweeted, which would be important measures of health in this network.)
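The degree bookkeeping and the top-5% filter can be sketched in a few lines of plain Python. This assumes a toy list of (follower, followed) pairs that I made up; `degree_tables` and `top_fraction` are illustrative names, not functions from NetworkX or Gephi, and the rounding rule (round up, keep at least one node) is my assumption.

```python
import math
from collections import Counter

def degree_tables(edge_list):
    """Return (in_degree, out_degree, total) Counters for directed edges."""
    in_degree, out_degree = Counter(), Counter()
    for follower, followed in edge_list:
        out_degree[follower] += 1   # the follower links out
        in_degree[followed] += 1    # the followed id is linked to
    return in_degree, out_degree, in_degree + out_degree

def top_fraction(total, fraction=0.05):
    """Return the highest-degree nodes, keeping at least one (assumed rule)."""
    keep = max(1, math.ceil(len(total) * fraction))
    ranked = sorted(total, key=lambda node: (-total[node], node))
    return ranked[:keep]

# Hypothetical toy network: 19 ids follow "star", and "star" follows one back.
toy_edges = [(f"n{i:02d}", "star") for i in range(1, 20)] + [("star", "n01")]

in_degree, out_degree, total = degree_tables(toy_edges)
print("top 5% by degree:", top_fraction(total))
print("star in/out:", in_degree["star"], out_degree["star"])
```

Here "star" looks like an authority: lots of in-degree, almost no out-degree. An id with the opposite pattern would land in the same top-5% list if its total were high enough, which is why it is worth looking at the two counts separately.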

Here's a plot of Gephi's authority calculation vs. degree, strongly correlated (you may see why I named Community 0 "The Authorities"):

Sorting by degree, the top players are these (pulled from the spreadsheet):

Label	NetworkX Community	Gephi Class	Degree	Closeness Centrality	Betweenness Centrality
flowingdata	0	D	1394	0.446930423	0.043313537
datavis	0	A	1376	0.482190168	0.072294856
infosthetics	0	A	1362	0.435290991	0.034115345
infobeautiful	0	D	1074	0.391210891	0.007498017
blprnt	2	B	932	0.410115173	0.02337346
ben_fry	2	B	882	0.365625	0.006936445
moritz_stefaner	0	B	870	0.452361226	0.028942168
eagereyes	1	C	861	0.455126424	0.031837862
mslima	4	A	828	0.433448002	0.014404322
VizWorld	4	A	828	0.524495677	0.08984938

Showing edges in hairball graphs makes things really complicated. For the following network graphs, I've limited the displayed nodes and edges to the top 5% by degree measure. Here's an animation of the difference between all edges visible vs. just community-internal edges (I know it's subtle, sorry; the id names are sized by relative degree):

Non-animated, larger versions: With All Edges, Only Intra-Community Edges

The largest names are purple, community 0, or "The Authorities" (a proxy for degree in this case).

Since I chose "degree" for relative sizing, it's worth seeing that in- and out-degree are not always correlated. Here you can see that some "true" authorities have much higher in-degree than out-degree. In particular, VizWorld has very high out-degree but rather smaller in-degree. And by NetworkX's community assignment, he does not end up grouped with the purple community 0. (Click for larger view.)

However, when we look at betweenness-centrality, VizWorld scores quite high. Betweenness-centrality roughly measures how often a node sits on the shortest paths between other pairs of nodes, so a high score suggests an id that bridges otherwise separate parts of the larger graph.
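To make the idea concrete, here is a hedged sketch of unnormalized betweenness for a directed, unweighted graph, written from Brandes' well-known algorithm using only the standard library. It is not the NetworkX or Gephi implementation (both normalize scores, which this sketch does not), and the toy edges are invented to show a "bridge" node outscoring a higher-degree "hub".

```python
from collections import defaultdict, deque

def betweenness(edge_list):
    """Unnormalized betweenness for a directed, unweighted edge list."""
    successors = defaultdict(list)
    nodes = set()
    for source, target in edge_list:
        successors[source].append(target)
        nodes.update((source, target))

    score = dict.fromkeys(nodes, 0.0)
    for start in nodes:
        # Breadth-first search that counts shortest paths from `start`.
        order = []
        predecessors = defaultdict(list)
        path_count = dict.fromkeys(nodes, 0)
        path_count[start] = 1
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in successors.get(node, ()):
                if nxt not in distance:
                    distance[nxt] = distance[node] + 1
                    queue.append(nxt)
                if distance[nxt] == distance[node] + 1:
                    path_count[nxt] += path_count[node]
                    predecessors[nxt].append(node)
        # Walk back from the farthest nodes, accumulating path shares.
        dependency = dict.fromkeys(nodes, 0.0)
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != start:
                score[node] += dependency[node]
    return score

# Hypothetical toy graph: "hub" has more links, "bridge" connects two groups.
toy = [("a", "bridge"), ("b", "bridge"), ("bridge", "c"), ("bridge", "d")]
toy += [(f"x{i}", "hub") for i in range(5)]

score = betweenness(toy)
print("bridge:", score["bridge"], "hub:", score["hub"])
```

In the toy graph "hub" has the higher degree (5 vs. 4) but nothing passes through it, while every path from a or b to c or d runs through "bridge". That is the kind of gap between degree and betweenness worth checking for in the real data.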

If you'd like to inspect the internal linkage structure corresponding to each community subgroup, click on the small images below to view. I've filtered out all but the top 5% by degree, to highlight the authorities in each sub-group. (Note that this was insufficient for community 3 -- so I expanded it a bit more.) The curved edges indicate "mutual" follow relations, while the straight edges indicate uni-directional, or one-way follow relations.

community 0, The Authorities

community 1, the Researchers

community 2, the Processing Crowd

community 3, the small NYT group

community 4, MSLima's crowd

Notice that community 0, the purple one, has a surprising number of unidirectional links, as does community 3. The others seem to be dominated by curved lines, a high degree of mutuality. (Hopefully I can explore this later!)
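One way to put a number on "mostly curved" vs. "mostly straight" is to split the edges into mutual and one-way follows. Here is a minimal sketch on an invented edge list; `split_by_mutuality` is a hypothetical helper, self-follows are simply ignored, and the reciprocity ratio at the end is one common convention rather than the measure Gephi reports.

```python
def split_by_mutuality(edge_list):
    """Return (mutual pairs, one-way edges) for directed follow edges."""
    edge_set = set(edge_list)
    mutual = {
        frozenset((source, target))
        for source, target in edge_set
        if source != target and (target, source) in edge_set
    }
    one_way = {
        (source, target)
        for source, target in edge_set
        if (target, source) not in edge_set
    }
    return mutual, one_way

# Hypothetical follows: a and b follow each other; the other two are one-way.
follows = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")]

mutual, one_way = split_by_mutuality(follows)
reciprocity = 2 * len(mutual) / len(set(follows))
print("mutual pairs:", len(mutual))
print("one-way edges:", len(one_way))
print("share of edges that are reciprocated:", reciprocity)
```

Run per community on the intra-community edges, a ratio like this would give a quick check on the visual impression that communities 0 and 3 are less mutual than the rest.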

Depending on what you know about the players in these graphs, you will probably see things I don't see. I myself have very little familiarity with the names in communities 3 and 4, while I admit to being surprised or entertained by the links and organization in the other 3 graphs. For example, in community 0, the placement of Visually, and its straight line uni-directional links, is especially interesting to me. (Remember this graph represents the top 5% by degree-- so Visually at this time scored high on degree, and was classified as a member of the "Authorities" group 0 by the community algorithms, but was not itself closely followed by the others in this elite group.) Green community 2 is also interesting; certainly the artistic folks are there, including the founders, authors, and teachers of Processing courses (ben_fry, REAS, shiffman, blprnt, toxi, mariuswatz, ...); but this group also includes Brainpicker and well-known design firms like Stamen and PitchInteractiv.

Wrapping up, these are the tools I used for the analysis, charts, and graphs: Excel, Tableau (for scatterplots), Python, R (correlation plots which weren't shown here), Gephi, Google Docs, Illustrator and Photoshop. It took more time than I expected, in part because of Gephi's alpha status, and having to adjust a lot of the plots by hand in Illustrator! Hopefully the need for hand-tweaking will disappear as Gephi becomes more mature.

Postscript: While I was working on this, MS Lima's new book, Visual Complexity, shipped from Amazon. It's a beautiful collection of network visualizations.

Sunday, March 20, 2011

PyCon 2011 - Data, Men, and Me

In the past couple years, I've switched from sending myself to research conferences (like CHI) to more down-and-dirty developery conferences. I'm looking for skills development and tools I can use day-to-day. This spring I went to PyCon in Atlanta, since I've been using Python more and more for data analysis problems. (Complete talk videos are here on blip.tv.)

The initial draw was the tutorials. I aimed for cloud data and machine learning. Olivier Grisel's tutorial Applied Machine Learning in Python with scikit-learn was a definite high point of the conference for me. His talk on text analysis was very good as well (slides here, and video here). His French accent was very nice, but I kept mishearing "scikit-learn" as "psychic learn." :-) I also really enjoyed the talk on Genetic Algorithms by Eric Floehr, a fellow who seems to do weather prediction consulting. His slides and a bunch of other interesting supporting material (including code) are up on his site.

There were a lot of talks on data, big data, cloud data, and scaling Python (to handle big data and data problems). Other examples: A talk on Pypes by Eric Gaumer included a good reminder that big data problems existed in the search engine space long before other kinds of big data became "hot" to work on. Pypes is a quasi-visual toolkit for doing data processing inspired by Yahoo Pipes. (The gist being that since a lot of data handling involves discrete steps to clean and transform, you can put these steps into little modules that allow you to view the big picture of what's going on with your data munging.)
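The "little modules" idea is easy to illustrate in plain Python, even without a visual toolkit. To be clear, this is not the Pypes API, which I am not reproducing here; it is just a sketch of the general pattern, with made-up step names, where each step is a small function and the pipeline is an ordered list of them.

```python
from functools import reduce

def strip_whitespace(records):
    """Trim leading and trailing whitespace from each record."""
    return (record.strip() for record in records)

def drop_empty(records):
    """Discard records that are empty after earlier cleaning."""
    return (record for record in records if record)

def lowercase(records):
    """Normalize case so later steps can compare records directly."""
    return (record.lower() for record in records)

def run_pipeline(records, steps):
    """Feed the records through each step in order."""
    return reduce(lambda data, step: step(data), steps, records)

# Hypothetical messy input.
raw = ["  Python ", "", "DATA  ", "   "]
cleaned = list(run_pipeline(raw, [strip_whitespace, drop_empty, lowercase]))
print(cleaned)
```

Because the pipeline is just a list, the big picture of the munging is readable at a glance, and steps can be reordered or swapped without touching the others.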

Hilary Mason's excellent keynote made a lot of us data geeks happy; she called for programming language evolution to get closer to the data problems, and to be less cryptic when it comes to support for multi-threading and map-reduce strategies needed these days. (I loved her "WTF?" comment on her multithreading code example.) Yelp's "mrjob" library for the cloud might answer some of her issues, but I missed that talk for some reason!

Another talk on big data that was well-tweeted was C. Titus Brown's "Handling Ridiculous Amounts of Data with Probabilistic Data Structures." Slides here - probably requires the video to fully interpret this, at least it does for me (yes, I missed this one too).

Not all talks were excellent, of course. My linguistics degrees got grouchy during one on the linguistics of twitter -- or maybe it was my geeky side asking "what can I do with this?" Some talks were nice surprises, too, kind of the point of going to conferences! Based on lunch table happenstance, I ended up going to a Blender API talk by Chris Allan Webber, a subject about which I knew zilch. Blender is apparently beefing up its API for external calls and automation; as a visualization person, I'm interested in tools that I can "drive" with data as input. I have big hopes for the evolution of processing.py and Nodebox2, two pythonic visualization options, but I am not sure they're there yet for me as a data vis person.

My sad female nerd note: I was one of 3 women in the Machine Learning tutorial. Out of perhaps 40? I later heard via Twitter a guess that there were only 8% women at the conference as a whole, based on t-shirt orders. I loved Hilary's talk, but was a bit bummed out by the Dropbox keynote that featured the social network of "friends of Arash" who started that company -- yeah, all men.

A final comment for any UX folks reading this: This would've been a great audience for a talk on UI design in open source, or UI design for Python UIs. There were a lot of companies presenting: Dropbox did their "we use Python" talk; Evite apparently has rewritten their entire Java backend in Python; Threadless, a sponsor, is all Python... One of the reasons for its growth at these companies is the ease of writing things fast in Python; the "prototype and iterate" philosophy showed up over and over in various presentations as a real strength of Python. As a light coder myself, I couldn't agree more. I was there as a data-oriented geek, but I saw UX opportunity everywhere, for the right kinds of UX folks.
