Monday, July 18, 2016

Spring Semester D3 Class (take 2)

I reprised my class on interactive visualization with D3.js at University of Miami with twice as many students in the spring semester. It was successful again, but I found it noticeably more work having more students. We did not focus on UNICEF this time; instead each student explored a topic of their own interest. More on how it scaled (or didn't) after some project highlights...

One note: Very few of them are reading their school email over the summer, so there wasn't much chance for them to fix the bugs or design issues I flagged. Fair warning :)

Iceland's Energy Use

The project by Zhiming "Eric" Sun is gorgeous and draws smart data comparisons about energy use over time by different countries. Iceland is the star here. There are nice tooltips with plots of use over time in this first display in the scrollytelling story.

In the next stage of the story, there is a slightly hidden lovely feature, which is that a mouseover on a bar zooms the map to locate the country in question. Useful!

There is a super interactive line chart with some smart commentary, and then a great connected scatter plot showing comparison country trajectories.

I think there are still some odd aspects of the scrollytelling display—as we constantly discovered in my class, getting the scrolly display/hide tuned right isn't that easy, at least the way we were constructing them.

Eric will be an amazing hire for someone when he's finished with his MFA. He was a superb help with other students' coding issues and has a strong sense of data story. His technical curiosity and ability to figure hard things out on his own were superior.

Chinese Tourism

Yuxuan "Sunny" Xie did a great project on Chinese tourists, specifically on where they go and what they spend. Her project features small multiples, scrollytelling, line charts, bar charts, and a neat exploding map of China. The map is built with d3-exploder, which she dedicated herself to adapting and using.

The map of Chinese provinces shows tourism outbound, with mouseover and clickable provinces. Each province has a little story typifying a traveller from that area.

The scatterplot version of the map shows each province along 2 axes, population and income, using the same color scale for traveler percentages.

Sunny is a dedicated journalist with a strong sense of data design, visual design, and interaction. Another excellent hire!

Transport in Miami Dade

Jennifer Hernandez looked at the distribution of public transportation options in Miami-Dade County. She made some amazing point maps of bus stations (the small blue dots, which have tooltips) and the Metrorail stops (bigger dots).

In contrast, the locations with the most affordable houses are shown in pink here:

Jennifer has other stats on ridership by location and commuting preferences in her original project. She is another excellent MFA student in the Communications Department at U of Miami.

Climate Change

Shi Li, who was auditing my class, nevertheless did all the (hard) work and produced a lovely project on climate change. Her project features a number of striking charts, including a small-multiples, details-on-demand array of bar charts based on modifying Jim Vallandingham's CoffeeScript demo. There is also an animated bump chart of the causes of global warming, with the dramatic lead going to greenhouse gases ever since 1990.

Cost of Education (2 Projects)

Sherman Hewitt, my only undergrad student this semester, did a great project on cost of college across time in the US. This bar chart from his data shows that as of 2012, there had been an increase in the number of folks attending college, in part fueled by an increase in the number of Hispanics attending.

His map shows that the most expensive states for public universities are Vermont and New Hampshire, with the cheapest being Colorado.

Sherman's project is strong in the reporting text as well. He has a great future and is working as a data journalism intern this summer.

Terrorism Over Time

Claudia Aguirre is working with a large dataset on terrorism, and produced a solid project on incidents and deaths by different terrorist groups over time.

One of her more interesting charts shows incidents by group by year. The extreme and sudden increase in ISIL incidents shows up as the steepest, highest, and most recent of the blue lines. (The other highlights are Taliban, Al-Shabaab, and Boko Haram. The 1980s are dominated by Shining Path and other Central/South-American groups.)

Formula One

Zhou Fang indulged her knowledge of Formula One racing stats to focus on the history of the Ferrari team.

In the map above, she shows the win history of Ferrari vs. other teams across 15 years of F1. When you click on a country in the map, small multiple line charts are displayed for each country by team, as well.

Travel Prices

Sevika Singh did a project on hotel costs in different cities. The differences between one-star and five-star prices in the same cities are particularly interesting. Here's a scatter plot showing the relationships, with some UI to help you find a city of interest. (I believe the dots are sized by average price, with New York having the largest average.)

Developer Survey

Jose Fierro looks at the responses to the Stack Overflow Developer Survey. Although he uses raw counts instead of percentages by country, the results are interesting, especially on the topic of future technology plans. The "big data" and hipster tools like Rust get lots of "intend to use in the future" votes, but tools like Javascript don't. Uh, good luck with that?

Other Projects...

Han Huang's autism project looks at the incidence of autism in the US by state. She uses donut charts, maps, a timeline, bar charts, and other techniques.

A look at California's educational attainment stats from Cibonay Dames shows that California students are underperforming. California is the largest and most diverse state in the US in terms of educational enrollment figures. She uses maps, small multiples, and bar charts.

Luying Wu's project looks at causes of US road accidents. As of 2013, Montana had the most deaths due to road accidents and the District of Columbia had the fewest. Her map of top reported causes by state is very interesting (and may reflect some state data categorization artifacts).

Hyan de Frietas's project looked at which states invest in early education support. He documents that early education enrollment impacts future success.

Eliot Rodriguez investigates drug use, including alcohol, by teens in the USA over time. Although prescription drug overdoses are on the rise, teens are still primarily using alcohol, marijuana, and cigarettes.

Former Students!

My fall semester students have done some great things since our D3 class. Barbara Poon got a job in the Emerging Technologist Program (ETP) at Nielsen. Halina Mader has been consulting as a web designer and D3 developer while she settles on her next job. Shiyan Jiang is a data journalism intern this summer with the Florida Sun Sentinel and so far has worked on a map for a story. Louise Whitaker is still an MFA student for another year and is an intern at Sapient this summer.

Jiaxin Liu did her journalism capstone project on the status of financial support for Chinese retired folks, using D3 in strategic places. She is working now as a data journalist in China. Zhizhou Wang is in the Lede Program at Columbia School of Journalism, pursuing further data journalism credentials (a program which looks amazing, to be honest). Luis Melgar is still working as a journalist at Univision, and he also used D3 in his capstone project for his master's on homeless students in Florida.

Three former students (Jiaxin Liu, Zhizhou Wang, Shi Li) worked together on a lovely multi-media article about shark tracking. They did it for Alberto Cairo's Maya class. They used D3, Maya, video, and nice web design. I helped them very little!

Debrief on Teaching This, Take 2.

Did I Help Too Much?
With twice as many students, I had twice as many visitors in my office wanting help making custom, and often very advanced, visualizations. I think I promised that they could do anything with my help; but my help and time were finite resources, and I never remembered that before it was too late! If I were teaching this again, I'd probably be a little more restrictive about the coding I did for them myself. It takes them (and me) longer if I prompt them to solve it themselves with a million hints. This just wasn't practical with the number of people needing help and the weekly deadlines. It's easier for everyone if they just watch me do it while sitting beside me as I talk through it. But I'm not sure that process teaches them enough, and it still doesn't scale well. The alternative is for them all to have much less ambitious and interesting portfolio pieces, though. (Sadface.)

My former fall semester students who wrestled with their own problems achieved some excellent results on their own, as you can see above. One of them said, while doing so, "I'm finally starting to like and understand D3 now." So maybe they did learn even while watching me or reading my code fixes?

And Some Never Asked For Help
There were some students I never heard from and only realized were struggling when I saw their weekly homework or prodded them quite explicitly. The amount of help given was not even across the students, certainly. I put this issue down on my "teaching to-learns" list with some ambivalence about how to solve it. Some of the onus is on students to request the help, certainly...

Data Analysis
I saw a lot of data analysis and data manipulation issues this semester. The real work of data visualization is to get your data in shape, explore it, and then design your visualization and code it. Often the coding requires specific manipulation, either before loading or in Javascript, to get the ideal structure from which to "draw." None of the steps can really be skipped. Our course program was definitely lacking in this "munging" and analysis training. Given the choice, I would not teach this class again without a preceding required data analysis class. (And even then, I've heard from other faculty friends that a data analysis course can easily go off the rails into endless custom work for the teacher if the students get to use their own data sets. This is a hard teaching problem.)
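
To make the "ideal structure" point concrete, here is a minimal sketch of one common reshaping step: turning "wide" rows (one column per year) into one series per country, which is roughly the shape a multi-line chart wants. The field names, the toSeries helper, and the numbers are all made up for illustration and are not taken from any student project.

```javascript
// Hypothetical rows, shaped like what a CSV loader returns for a "wide"
// spreadsheet: one row per country, one column per year, all strings.
// The values are placeholders, not real data.
const wideRows = [
  { country: "A", "2010": "10", "2011": "12" },
  { country: "B", "2010": "7", "2011": "5" }
];

// Reshape to one object per country with a sorted array of numeric points.
// keyField names the column that identifies the row; every other column
// is assumed to be a year.
function toSeries(rows, keyField) {
  return rows.map(function (row) {
    const values = Object.keys(row)
      .filter(function (col) { return col !== keyField; })
      .map(function (col) {
        // Unary + converts the CSV strings to numbers.
        return { year: +col, value: +row[col] };
      })
      .sort(function (a, b) { return a.year - b.year; });
    return { name: row[keyField], values: values };
  });
}

const series = toSeries(wideRows, "country");
console.log(JSON.stringify(series, null, 2));
```

Each element of series can then be bound to one line in the chart, with its values array feeding the line generator. Doing this once, right after loading, tends to be easier than fighting the wide format inside the drawing code.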

Javascript Difficulties
On the Javascript side, I assigned more work with Javascript programming and data "munging" than previously. I also spent a fair amount of time on Javascript refactoring and structuring of code, based on issues that had come up during the first semester, especially in the sizable final projects. I heard from some that these assignments were "too hard." A lot of students struggled with the basic programming I thought they knew in advance, e.g., what's a variable, what's a function, how to call functions, scope. Along with the required data analysis class, I would prefer that this class be preceded by more solid Javascript preparation or some other programming experience. (Many students had had a prior class in jQuery or p5.js, but since those classes weren't about doing data manipulation, some of the higher level concepts didn't cross over.)
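
For readers wondering what "basic programming" means here, this is a small sketch of the kind of thing that tripped people up: a variable, a function that returns another function, and which names are visible where. The makeScale helper is a hypothetical stand-in for illustration, not D3's scale API.

```javascript
// A variable declared at the top level is visible to everything below it.
const margin = 20;

// A function that builds and returns another function. The inner function
// keeps access to domainMax and usable even after makeScale has returned.
function makeScale(domainMax, rangeMax) {
  const usable = rangeMax - margin; // local: only visible inside makeScale
  return function (d) {
    return margin + (d / domainMax) * usable;
  };
}

// Calling makeScale gives back a function; calling that gives a pixel value.
const x = makeScale(100, 420);
console.log(x(0), x(50), x(100));

// usable does not exist out here; it lives only inside makeScale.
console.log(typeof usable);
```

Almost every D3 example leans on this pattern (functions passed around as values, callbacks that see outer variables), so gaps here show up quickly once the projects get large.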

Bottom Line
Teaching this class to people without much programming experience is very hard, and it doesn't scale to a large group of students, at least not without a lot of experienced TA support, which I did not have. Doing it the same way again, I wouldn't be able to handle more than 15 students.

Updated Course Repo
The class materials, in some parts radically updated, are posted here. I especially added a lot to the maps section, including more Leaflet examples, and added some nice small multiples code examples. I'll probably update one more time soon to remove mention of student homeworks.

News: My New Teaching Job

Meanwhile, my next teaching gig is in Lyon, France, at EM-Lyon, where I will be teaching data analysis, data science, and NLP classes. I will be there for at least 2 years. I am looking forward to focusing now on the "front-end" of the analysis stage, teaching Excel, Tableau, SQL, Python, and R. Look for more insights on teaching those in the future!