7 Habits of Highly Effective Artificial Intelligence (AI) and Machine Learning (ML) Prototypers

Modern machine learning (ML) practitioners are accustomed to living the agile way: building success incrementally, failing quickly, and going through that “feature extraction > feature selection > modelling > evaluation” loop many, many times. A successful ML solution isn’t luck. It comes from taking a small step, learning as much as possible from it, and then repeating the process. In this post, we’ll look at seven habits that can make the whole process not only faster, but also more enjoyable.

1. Get Baselines ASAP

Many experiments in academia start with baselines: with well-known datasets, and with accuracies achieved by others on that data. Baselines are great, because (a) you know what’s generally possible with the data, and (b) you can benchmark anything that you develop straight away. However, in industry, we are often the first to work with a particular dataset or even on the problem as a whole—so how can we get baselines?

To begin, just extract some simple features, feed the data into a simple model, and get back some performance metrics. The accuracy may not be stunning, but now you can start to make informed decisions about what works and what doesn’t and quickly start to understand the task at hand.
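As a rough illustration, a first baseline can take only a few lines. The sketch below assumes scikit-learn is available. It uses synthetic data from `make_classification` as a stand-in for "some simple features" and compares a majority-class `DummyClassifier` with a plain logistic regression. Neither the data nor the model choice comes from a real project.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for "some simple features" extracted from real data (assumption).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)


def baseline_scores(X, y, cv=5):
    """Return mean cross-validated accuracy for two quick baselines."""
    models = {
        # Always predicts the most frequent class: the floor any real model must beat.
        "majority_class": DummyClassifier(strategy="most_frequent"),
        # A simple, fast first model on the raw features.
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }
    return {name: cross_val_score(model, X, y, cv=cv).mean() for name, model in models.items()}


scores = baseline_scores(X, y)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The dummy score shows what "doing nothing" achieves on this data. Any later model can then be judged against both numbers.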

2. Don’t Be Afraid to Tweak the Task as You Go

Business will always want something ambitious, but only you can look at the data, play with it, and decide whether what you’re asked to do is even feasible. Maybe that huge, thirty-class problem can be reduced to just ten classes without losing much value? Maybe instead of predicting the probability of an event, you can do “red-yellow-green” style flagging? A limited model is much better than an inaccurate one, especially if the original task is not even solvable.
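Reducing the number of classes often comes down to relabelling the training data before any model sees it. The sketch below uses a small mapping from fine-grained labels to coarser groups. The label names and the mapping are hypothetical. In a real project, domain experts and the confusion matrix would decide which classes can be merged.

```python
# Hypothetical mapping from fine-grained classes to coarser groups.
COARSE_LABEL = {
    "laptop_13in": "laptop",
    "laptop_15in": "laptop",
    "phone_android": "phone",
    "phone_ios": "phone",
    "tablet_small": "tablet",
    "tablet_large": "tablet",
}


def coarsen(labels, mapping=COARSE_LABEL):
    """Replace each fine-grained label with its coarse group; unknown labels are kept as-is."""
    return [mapping.get(label, label) for label in labels]


fine = ["laptop_13in", "phone_ios", "laptop_15in", "tablet_small", "phone_android"]
coarse = coarsen(fine)
print(coarse)  # ['laptop', 'phone', 'laptop', 'tablet', 'phone']
print(len(set(fine)), "->", len(set(coarse)), "classes")  # 5 -> 3 classes
```

Keeping the mapping in one place makes it easy to retrain on the coarse task and compare it against the original one.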

3. Always Keep the End User in Mind

It’s easy to become preoccupied with low-level details when all you see every day are confusion matrices, accuracy graphs, and decision trees. But to deliver what users want, try to see the world through their eyes. Would they prefer a slightly more accurate model, or a model with slightly lower accuracy that is less prone to overfitting? If your model has to be trained on client data, how quickly must it train?

These and other questions determine what classifiers you can and can’t use, how much work you have to do, and how long the whole process will take. Also, getting these answers too late might throw you back to square one. It’s a pity to spend weeks training and tweaking neural networks, only to discover that the customer requires an explainable model like a decision tree.

4. Use Proper Metrics Straight Away

Plain old classification accuracy is a very popular metric. Take a validation set, run the model, count the number of correctly classified instances, and divide it by the size of the validation set. It’s easy to calculate and easy to explain, but this metric often tells a misleading story. Don’t get us wrong: it works fine if, in your task, false positives and false negatives are equally bad, and if your classes are perfectly balanced. But a good 90% figure can be achieved not just by a clever model, but also by a model that happens to do well on a couple of majority classes and misclassifies everything else.
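To make the majority-class trap concrete, here is accuracy computed exactly as described above, on a hypothetical validation set with 90 instances of class "a" and 10 of class "b". A "model" that always predicts "a" still reaches 90%.

```python
def accuracy(y_true, y_pred):
    """Plain accuracy: correctly classified instances divided by the size of the validation set."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical, imbalanced validation set: 90 instances of class "a", 10 of class "b".
y_true = ["a"] * 90 + ["b"] * 10
# A useless "model" that always predicts the majority class.
y_pred = ["a"] * 100

print(accuracy(y_true, y_pred))  # 0.9
```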

Another problem is that many issues can conveniently hide behind aggregated accuracy. It’s not unusual to spend weeks tweaking features and models, getting additional data, and performing additional tasks, and to observe no change in classification accuracy. However, a single glance at the confusion matrix might reveal a more complex story: models can perform very differently on minority classes. It’s therefore worth considering metrics that give every class equal weight, such as precision, recall, F1, or accuracy computed per class and then averaged across all classes (macro-averaging).
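The sketch below shows how per-class views expose what a single accuracy number hides. It assumes scikit-learn and uses a hypothetical validation set with 90 instances of "a" and 10 of "b". The hypothetical model gets every "a" right but only 2 of the 10 "b" instances. Plain accuracy looks excellent, while balanced accuracy (the macro-average of per-class recall), macro F1, and the confusion matrix tell the real story.

```python
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
)

# Hypothetical validation set: 90 instances of class "a", 10 of class "b".
y_true = ["a"] * 90 + ["b"] * 10
# Hypothetical predictions: every "a" is right, but only 2 of the 10 "b" are.
y_pred = ["a"] * 90 + ["b"] * 2 + ["a"] * 8

print("accuracy:         ", round(accuracy_score(y_true, y_pred), 3))            # 0.92
print("balanced accuracy:", round(balanced_accuracy_score(y_true, y_pred), 3))   # 0.6
print("macro F1:         ", round(f1_score(y_true, y_pred, average="macro"), 3)) # 0.645

# Rows are true classes, columns are predicted classes, in the order given by `labels`.
print(confusion_matrix(y_true, y_pred, labels=["a", "b"]))
# Per-class precision, recall, and F1 in one table.
print(classification_report(y_true, y_pred))
```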

Ideally, you’ll define performance metrics properly from the very beginning. Otherwise, something unexpected might not only ruin weeks of effort, but might also be discovered after the model is deployed. There’s nothing worse than optimistic performance in the development phase followed by mediocre results in production.

5. Use Reproducible Techniques First

Even if the task at hand seems highly specific, there are probably dozens of papers, articles, and blog posts on the web that claim to solve it. Five minutes with a search engine can yield tons of ideas to try straight away. But which ones should you try first?

A good idea is to start with those that come with some code. It doesn’t have to be a production-grade library developed by a big company: something well-written that runs is usually good enough. But be cautious about papers that expect you to read a math-heavy description and do all of the coding yourself. It’s not unusual to spend a week or two coding and then to discover that the authors left out some crucial details, which makes accurate (or even inaccurate) reproduction impossible. This can happen even if the paper comes from a reputable researcher or has been cited hundreds of times. It’s usually wiser to leave such algorithms until much later, and first try something more predictable.

6. Use “Classic ML” Techniques First

In the past, feature engineering was an important part of many machine learning tasks—often, the most complicated part. The moment you managed to extract informative features from data was the moment when the task was almost solved: running a simple model on the training data and double-checking it on a validation set was quick and easy. By the end of the process, we couldn’t help but understand the problem and the dataset very well.

Nowadays, a variety of deep-learning libraries will gladly take in any kind of data and give a black-box result. Getting such models is usually quick, and they often seem to exhibit good performance. But such a strategy rarely tells you anything useful about the task at hand.

Of course, there are some areas in which a deep learning approach is right for rapid prototyping: for example, many tasks in computer vision, and some tasks in NLP, assuming you have lots of representative data. But for a typical business task, it’s worth doing the interesting feature engineering yourself, rather than handing it to a machine. Then, use some simple, well-tested techniques, like logistic regression or Support Vector Machines (SVMs), and move to deep learning or neural networks only when necessary.
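A minimal sketch of the "classic ML first" workflow might look like the following. It assumes scikit-learn. The synthetic data and the feature names are hypothetical stand-ins for a table of hand-engineered features. The sketch compares logistic regression with an RBF-kernel SVM under cross-validated macro F1. It then inspects the logistic regression weights, which is one way a simple model helps you understand the problem.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for hand-engineered features; the names are hypothetical.
feature_names = ["n_orders", "days_since_signup", "avg_basket", "n_returns"]
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=0, random_state=0
)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Same cross-validation and the same class-balanced metric for every candidate.
results = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1_macro").mean()
    for name, model in candidates.items()
}
for name, score in results.items():
    print(f"{name}: macro F1 = {score:.3f}")

# A linear model is easy to inspect: which features matter, and in which direction?
lr = candidates["logistic_regression"].fit(X, y)
coefs = lr.named_steps["logisticregression"].coef_[0]
for name, weight in sorted(zip(feature_names, coefs), key=lambda item: -abs(item[1])):
    print(f"{name:>18}: {weight:+.2f}")
```

Only if neither model is good enough, and the weights suggest the features have run out of signal, does reaching for a neural network start to pay off.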

7. Think Like a Software Developer

We ML engineers often hear that we are not “really” software developers. Sometimes it’s accurate and sometimes it isn’t. What matters is that we can and should borrow clever practices from the software engineering world. Some examples:

  • Unit/Integration/Regression Tests:

If all we did was load a CSV file and push a few buttons, there would be nothing to test. However, we often write our own code to extract features, calculate complex metrics, and combine multiple models. Errors can creep in at any stage. No one wants to be the person who first reports an accuracy of 90%, and next day says it’s actually 40% because of a bug in feature selection code that could have been found by a simple unit test.
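As an illustration, here is a hypothetical feature-selection helper together with a few unit tests. The helper ranks columns by absolute correlation with the target. The tests are plain `test_*` functions with bare `assert` statements, the style pytest discovers, so they can also be called directly. The function and its behaviour are assumptions made for this example, not code from any particular project.

```python
import numpy as np


def select_top_k_features(X, y, k):
    """Return the sorted column indices of the k features most correlated (in absolute value) with y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if not 0 < k <= X.shape[1]:
        raise ValueError(f"k must be between 1 and {X.shape[1]}, got {k}")
    scores = []
    for j in range(X.shape[1]):
        column = X[:, j]
        # A constant column would make corrcoef return NaN; score it as 0 instead.
        scores.append(0.0 if column.std() == 0 else abs(np.corrcoef(column, y)[0, 1]))
    # Python's sort is stable, so ties keep their original column order.
    ranked = sorted(range(X.shape[1]), key=lambda j: -scores[j])
    return sorted(ranked[:k])


def test_picks_the_informative_column():
    rng = np.random.default_rng(0)
    y = rng.normal(size=200)
    noise = rng.normal(size=200)
    X = np.column_stack([noise, 2 * y + 0.01 * noise, np.ones(200)])
    assert select_top_k_features(X, y, k=1) == [1]


def test_returns_exactly_k_indices():
    X = np.random.default_rng(1).normal(size=(50, 5))
    assert len(select_top_k_features(X, X[:, 0], k=3)) == 3


def test_rejects_invalid_k():
    try:
        select_top_k_features(np.zeros((10, 2)), np.arange(10), k=3)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for k > number of features")
```

Tests like these take minutes to write and catch exactly the kind of off-by-one or NaN-handling bug that would otherwise surface as a mysterious jump in reported accuracy.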

  • Performance Optimization:

Nowadays, performance in the ML world is often taken as a given. An average computer can train an SVM on a huge dataset within minutes. It’s also pretty quick even with deep learning: most libraries will gladly use your graphics processing unit (GPU) in an effective, parallel manner. A single experiment is quick and easy, but the number of experiments multiplies very quickly once you introduce k-fold cross-validation, proper parameter tuning, and additional datasets.

A “single-threaded” set of experiments can easily take days just to give you a small table with some initial baseline numbers. But simple “one thread-one fold” parallelization can bring it down to hours, and there might even be a way to bring it down to minutes. We don’t suggest optimizing everything that’s optimizable. But it’s foolish to ignore some simple tricks that are evident once you’re wearing a “multi-threaded thinking” hat.
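Here is a minimal "one thread, one fold" sketch using Python's standard `concurrent.futures` together with scikit-learn. The data is synthetic and the model is only an example. Threads give a real speed-up only when the heavy lifting runs in code that releases the GIL, as much NumPy and compiled library code does. For pure-Python workloads, `ProcessPoolExecutor` is the usual alternative. For simple cases, scikit-learn's own `cross_val_score(..., n_jobs=-1)` parallelizes the folds for you.

```python
from concurrent.futures import ThreadPoolExecutor

from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold


def run_fold(model, X, y, train_idx, test_idx):
    """Train a fresh copy of the model on one fold and return its macro F1 on the held-out part."""
    fold_model = clone(model)  # never share one fitted model between concurrent folds
    fold_model.fit(X[train_idx], y[train_idx])
    return f1_score(y[test_idx], fold_model.predict(X[test_idx]), average="macro")


def parallel_cv(model, X, y, n_splits=5, max_workers=None):
    """'One thread, one fold': evaluate all cross-validation folds concurrently."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0).split(X, y)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_fold, model, X, y, tr, te) for tr, te in folds]
        # Collect results in fold order, regardless of which thread finished first.
        return [f.result() for f in futures]


X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
scores = parallel_cv(LogisticRegression(max_iter=1000), X, y)
print(f"{len(scores)} folds, mean macro F1 = {sum(scores) / len(scores):.3f}")
```

The same pattern extends naturally to one task per (fold, parameter setting, dataset) combination, which is where the savings really add up.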

  • Code Versioning:

Providing a link to a source code repository is increasingly standard for research papers. However, many in the industry still put everything into, say, a Python notebook and add comments like “Added by Joe Bloggs on 06/12/2019” next to important changes. The usual excuse is that Joe is the only person working on this code, so what’s the point in proper versioning?

The answer will become evident when he needs to repeat that old experiment from two months back with a bigger or a different dataset. Or when he makes a complex change in many places, realizes that it doesn’t work, and hopes that the Undo function in his text editor will actually restore the previous, unbroken state of the code. Something very crude, like a set of “2019-12-12”, “2019-12-13”, “2019-12-14” folders, is better than nothing. But a proper code versioning system is far better than anything crude.

Next Steps

Now it’s your turn! What other habits and techniques, in your opinion, can make AI/ML prototyping better? Let me know.

To learn how your organization can benefit from AI/ML-based enterprise data management, read this white paper.