Machine Learning Strategies
We all strive to build data-driven solutions that are not just effective but sustainable. In the context of Machine Learning, there are certain strategies that point us toward the promising directions and help us safely discard the not-so-helpful ones. Let us understand what they are in this article.
First, meet orthogonalization. For those who are new to the term, let me give a brief introduction with an example.
We drive cars and understand what each pedal does. Now imagine a car with only one pedal that reacts based on the pressure applied. We cannot really control the outcome we want.
Here’s where orthogonalization comes into the picture. Each pedal has a unique effect when it is tuned (in this case, by applying pressure): press the gas pedal harder and you move faster, and vice versa for the brake pedal. By definition, orthogonal means perpendicular (at right angles), i.e., tuning a particular control (pedal) does not affect the entire system but only a specific outcome. You might wonder how this concept applies to Machine Learning models. Let me explain.
When we work on a Machine Learning project, we usually aim for the following:
- Performs well on the training set (‘well’ depends on the benchmark of each application)
- Performs well on the development (dev) set
- Performs well on the test set
- Performs well in the real world
Let’s say your performance on the training set is not up to the mark. In that case, you focus only on the knobs that affect training performance (such as training a bigger network or using a different optimization algorithm). Similarly, if your performance is good on the training set but not so much on the dev set, you apply only the changes that improve dev-set performance. That’s the basic idea of orthogonalization: each knob should affect one outcome at a time. Let’s dive a little deeper and understand how to use orthogonalization well.
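To make the idea concrete, here is a tiny, purely illustrative sketch in Python. The error thresholds and the suggested actions are my own assumptions for the sake of the example, not a standard recipe:

```python
def suggest_next_step(train_error, dev_error, target_error=0.05):
    """Illustrative orthogonalization: each gap maps to its own set of knobs."""
    if train_error > target_error:
        # Training performance is the bottleneck: tune only training-side knobs.
        return ["train a bigger network", "try a different optimization algorithm"]
    if dev_error > train_error + 0.02:
        # Fits the training data but not the dev set: tune only generalization knobs.
        return ["add regularization", "gather more training data"]
    return ["targets met; evaluate on the test set"]

print(suggest_next_step(train_error=0.12, dev_error=0.15))
# -> training-side suggestions, since training error itself is above target
```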
SINGLE NUMBER EVALUATION METRIC
A strategy that really helps when building an algorithm, or when working toward a specific goal for it, is to have a single number evaluation metric. A single real number tells you quickly whether a tuning change is effective or not. Say you’re dealing with a classification algorithm: Model A gives better precision but lower recall, while Model B shows the opposite. Since there’s a trade-off between precision and recall, it is hard to decide which model to choose based on two different metrics. That’s where a single number evaluation metric, the F1 score, comes into the picture. Now we can finalize a model based on its F1 score without juggling two separate metrics.
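As a refresher, F1 is the harmonic mean of the two: F1 = (2 × Precision × Recall) / (Precision + Recall). A minimal sketch, where the precision and recall numbers are made up purely for illustration:

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Two hypothetical models on opposite sides of the precision/recall trade-off:
model_a = f1_score(precision=0.95, recall=0.80)  # high precision, lower recall
model_b = f1_score(precision=0.82, recall=0.93)  # lower precision, higher recall
print(f"Model A: {model_a:.3f}, Model B: {model_b:.3f}")
# One number per model makes the comparison immediate.
```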
But… things aren’t this easy in the real world, are they? You are bound to face several metrics while building an algorithm, each of which needs enough attention as well. Let me give you an example in which you have built three models:
Model A: 90% F1 score with 80ms running time
Model B: 93% F1 score with 95ms running time
Model C: 97% F1 score with 1500ms running time
Which one are you most likely to choose? Model C is great but slow, Model A is fast but weaker on F1 score, and Model B is in between. Here’s where I’d like to introduce the concept of satisficing and optimizing metrics. Sounds new? Let me explain. Say you have N metrics by which to choose the best model. A practical rule is that 1 of the N metrics should be optimizing, while the other (N-1) metrics only need to be satisficing, i.e., good enough to clear a threshold. In this case, F1 score is the metric I choose to optimize, i.e., I want the highest percentage possible, while keeping the running-time metric satisficing. Choosing and assigning the metrics appropriately works best with effective communication with your stakeholders.
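Here is a minimal sketch of that selection rule on the three models above, treating running time as the satisficing metric (the 100 ms threshold is an assumption I am making for illustration) and F1 as the optimizing one:

```python
# (F1 score, running time in ms) for the three models above
models = {"A": (0.90, 80), "B": (0.93, 95), "C": (0.97, 1500)}

MAX_RUNTIME_MS = 100  # satisficing threshold, assumed for illustration

# Keep only models that satisfy the constraint, then optimize F1 among them.
acceptable = {name: scores for name, scores in models.items()
              if scores[1] <= MAX_RUNTIME_MS}
best = max(acceptable, key=lambda name: acceptable[name][0])
print(best)  # -> B: the highest F1 among models fast enough to qualify
```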
While speaking about metrics, we cannot forget how the data is distributed across the sets involved in all of the topics above. It is rarely in focus, but the data distribution across your training, dev, and test sets does affect how your ML application works and progresses. Let’s understand the word ‘distribution’ here for a second. Say you have population data for 8 different countries: USA, UK, India, China, Japan, Australia, Brazil, Spain. If you evaluate on a dev set built from the first 4 countries but then test on a test set built from the other 4, you can guess the kind of results you’re going to get. Instead, the data should be randomly shuffled into dev and test sets so that each set gets a little bit of everything and the results are not skewed or biased.
Now that we know how to ‘distribute’, we also need to know how much to distribute. The traditional rule of thumb for splitting train and test sets was 70%-30%, and in case you had a dev set, 60%-20%-20%. This was reasonable for decently sized data sets, but times are changing, especially with newer Deep Learning techniques and rapidly improving infrastructure capabilities. The better bet is to put just enough data points in the test set to give high confidence in the overall performance of the system. For example, if you have a million records in your data set and 1% of it is good enough to evaluate on, the split would be 980,000 for training, 10,000 for dev, and 10,000 for test.
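Putting both ideas together (shuffle first so the dev and test sets see a bit of everything, then carve out only as much as you need), a minimal sketch could look like this; the 98/1/1 split mirrors the million-record example above:

```python
import random

random.seed(42)                    # reproducible shuffle
records = list(range(1_000_000))   # stand-in for a million labelled examples
random.shuffle(records)            # mix everything before splitting

n = len(records)
n_dev = n_test = n // 100          # 1% each is enough at this scale
train = records[: n - n_dev - n_test]           # 980,000 examples
dev = records[n - n_dev - n_test : n - n_test]  # 10,000 examples
test = records[n - n_test :]                    # 10,000 examples
print(len(train), len(dev), len(test))          # 980000 10000 10000
```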
IMPROVING MODEL PERFORMANCE
The last topic in Machine Learning strategy is understanding how to improve model performance. These steps are especially useful when trying to reach a benchmark or move past a plateau. Bear in mind, these steps are particular to supervised learning. A small code sketch follows the list.
- Train a bigger model (more layers or hidden units)
- Train longer, or use a better optimization algorithm (e.g., the Adam optimizer)
- Better set of hyperparameters (play around with different values)
- Gather more data
- Improve regularization (e.g., L2 regularization, dropout)
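Here is that promised sketch: a small Keras model showing several of the knobs above in one place. The architecture and the specific values (layer sizes, learning rate, dropout rate, L2 strength) are illustrative assumptions, not recommendations:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Dense(256, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 regularization
    layers.Dropout(0.3),                                     # dropout regularization
    layers.Dense(256, activation="relu",                     # a "bigger model":
                 kernel_regularizer=regularizers.l2(1e-4)),  # more/wider layers
    layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # Adam optimizer
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# "Train longer" and "gather more data" would translate to more epochs and a
# larger training set in the (hypothetical) call below:
# model.fit(x_train, y_train, epochs=50, validation_data=(x_dev, y_dev))
```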
A big thank you to Andrew Ng at Deeplearning.ai for curating wonderful learning content. It really enables me to transfer my learning and understanding into these articles and posts.
Connect with me on LinkedIn: https://www.linkedin.com/in/mbharathwaj/