My First Conference Talk

Last week I gave my first talk at a work-related conference. It went better than I had expected, and I found it to be a thing worth doing.

I'm not a "speaker" and hadn't heard of the conference (Datapalooza) until my good friend Dan Lynn brought it up while we were hanging at a metal show. He speaks a lot and is more plugged into the scene than I am, and he asked if I'd pair up with him for a talk. About data cleaning. Exciting.

I've always been hesitant to speak for a variety of reasons.

  • I don't know what I'm talking about (or doing)
  • I don't have much experience with the trendy data stuff
  • It takes a bunch of time to prepare

After going through the process, I think these are mostly BS.

When it comes to what I do day in and day out, and have done in various capacities for over a decade, I kinda know what I'm talking about. I know how to do data at a fast-growing ecommerce company. I know how data science and marketing can work together. I know how to deal with medium data. I have a year or two under my belt doing data engineering and even some devops stuff.

I don't have a lot of experience with Big Data and the related tech. But I don't have big data most of the time; I have small and medium data. And cleaning your data has to be done regardless of size. Everyone has clickstream data. And everyone wants to talk to their customers better. Doesn't matter if you do that with Hadoop/Spark/Storm or postgres/pandas/scikit-learn.

And the talk didn't take as much time to write as I feared. It helped to have Dan around. He's good at caring about what the slides look like, fonts, all that stuff that I mostly don't notice. And once we started riffing off each other, the talk wrote itself.

The format and topic choice were important for the last point. We picked data cleaning and ETL/pipelining. Two large topics, but we did it as a survey. A litany of problems that have tripped us up over the years and how we solved them. They'll trip everyone up. Data science in real life, the stuff we do every day. Stuff you don't learn in the MOOC or the 12-week class. Not sentiment analysis on the Twitter stream with the framework of the day. Not neural nets to make cat sounds. Not putting a Flask app in front of your model. That stuff is all cool and maybe you do that at work, but all the upstream work happens first. And more often. Data cleaning gets joked about all the time. We spend a ton of time doing it. We're data janitors. How much time do we spend thinking about it and getting better at it? And without a pipeline, your work doesn't get to the customer and the value is limited.

The talk itself went surprisingly well. I had a crazy day up to the minute I walked to the podium. The flagship of my team, a recommendation engine, had been struggling all day after a database upgrade. I'll have to check if I can write about that here sometime. It was frantic and I was late to meeting Dan prior to the talk. Once we got up there, it was good. We play in a band together and I'm comfortable on a stage. Turns out the different environments translate better than I thought.

I'd do it again.

Dan's write-up on the talk is here.

And you can see the slides here: Dirty Data? Clean it up!
