Monday, December 23, 2019

Another Year in France (Consulting Again)

It's that time of year - time for a recap of what I've been up to!

NLP and Toxic Speech

I spent a year after I left my teaching gig doing remote consulting for a London-based startup. I was lead data scientist doing NLP, primarily working on toxic speech detection in game chat. We used a mix of keyword-based approaches, spaCy models, and neural nets (PyTorch and later TensorFlow, for speed). I wrote a lot of Spark code. In the course of this work, I labeled a lot of chat data myself and became convinced this is an almost unsolvable problem that will always require human-in-the-loop moderation.

Talks I Gave / Personal Projects

Every time someone invites me to speak, I use it as an opportunity to finish a personal project and talk about it.  Sometimes it's a learning project (like "learn about the state of the art for summarization") and sometimes it's an artistic or data vis project.  So, invite me at your own risk :)
  • EuroPython 2019 Invited Keynote and PyData London 2019 Keynote: I gave the same talk at both because they were less than a week apart. I did a data vis personal project and showed some text vis/poetry generation apps. Lots of people told me they enjoyed them. Slides here.
  • PyData Warsaw invited keynote: I talked about summarization. Slides here.
  • EMAEE 19: I was an invited panelist on data vis and spoke about big data and EDA (exploratory data analysis). Slides here.
  • Micro Macro Mesa Conf in Lyon (invited): I spoke about visualizing and generating poetry with VAEs (variational autoencoders), based on a project by Allison Parrish. My slides (which need to be written up) are here.
Example generation of poem lines (red) from a VAE, using a t-SNE layout of training lines as a guide.

Reboot of the TinyLetter "Things I Think Are Awesome"

I didn't feel very awesome during a lot of the toxic speech consulting, but I revived the newsletter this fall! I added a poem, recipes, and TV shows to the latest edition. It's all about recommendations. My goal is to keep it positive, short, and tech-arty. Join here.

Current Consulting

I took a month off between gigs and primarily worked on text generation with VAEs. I've started work again, splitting my time among three clients: Google Arts and Culture in Paris (a possibly short-term contract on data analysis, NLP, and vis of museum assets), writing Python charting tutorials for FlowingData, and generating poetry for the UK Pavilion at Expo 2020 Dubai (working with Kyle McDonald).

Design by Es Devlin (source link)

Next year, I will be a judge and speaker at the data visualization conference Malofiej 28 in March. Come see me in Pamplona, Spain?

Happy holidays, and a great 2020 full of inspiring creative tech and datasets to all!