I’ve recently been doing some preparation for my upcoming MSc in analytics. One piece of prep has been running through the (so far excellent) Machine Learning course from a leading light in the field: Andrew Ng (expect a review once I’m done).
“By now, frankly, you probably know a lot more machine learning than many (but not all) Silicon Valley Engineers.”
The first few weeks covered the intuition behind linear and logistic regression, optimisation algorithms to implement them, and regularisation to help them generalise (all in Octave/MATLAB). At the end of the third week, Professor Ng made a (slightly tongue-in-cheek) point that, after completing and understanding the material so far, students of this course know more machine learning than many people who do it for a living in Silicon Valley.
Doing machine learning
This got me thinking about some comments made to me recently by a couple of colleagues that I want to address.
Colleague 1 - “just linear regression”
The first comment came from a colleague who stated they wanted to do more machine learning in their work. When asked what work they did currently, they responded that it was often “just linear regression”.
I responded that, if this was providing valuable results to their client, it was still of great merit. This was met with the comment “yeah, but it’s not ‘real’ machine learning.”
Colleague 2 - “not proper…”
The second came after a team of colleagues had been working for some time on a tool for analysis of HR data. The end result was a slick dashboard for use by business analysts to help predict outcomes on a range of metrics in “what if” scenarios.
The “guts” of the tool were a selection of simple but solid regression and classification models. We were discussing how it was being branded and I suggested that they could call it a machine learning solution (marketing like to talk about machine learning). I was told that it wasn’t “proper” machine learning. Again, it was just some “simple regression”.
But what you’re doing IS machine learning!
To both of these colleagues I said that whilst they might not be using some of the more esoteric algorithms or methods they’ve read about, they were definitely using machine learning. (The first topics in Prof. Ng’s class, and indeed in many well-recommended books, are linear and logistic regression).
The work they do might not sound that sexy, but they’re using sensible machine learning techniques to deliver value to their clients. And they should be proud of that.
Your analysis is valuable
The lucid Introduction to Statistical Learning describes statistical and machine learning as a “set of tools for modelling and understanding complex data sets”. Sounds an awful lot like what my colleagues were doing. By contrast, the winners of the Netflix Prize never saw their algorithm implemented because “the additional accuracy gains did not seem to justify the engineering effort needed to bring them into a production environment.” My colleagues did see their work implemented, and they continue to do so all the time.
So what my colleagues didn’t realise at the time was that they were already doing machine learning, they were doing it brilliantly, and it was of huge value to their clients. They were taking complex data, discovering the relationships in it, maybe predicting some future outcomes, and using their results to solve problems and deliver value. Their solutions worked, they were fast and easy to understand, and they could communicate the results effectively.
It’s important to encourage not only my colleagues but also those who, like them, are early in their careers in data, data science, or analytics and are getting swamped by the hype. Don’t worry about trying to “get started” with machine learning: you’re probably more than halfway to doing it already, and your work already has the capacity to add a huge amount of value to the problems you’re working on.