Top 5 Mistakes Data Scientists Make with Hyperparameter Optimization and How to Prevent Them
Hyperparameter optimization is a powerful tool for unlocking the maximum potential of your model, but only when correctly implemented. Here, I’ll share five common problems I’ve seen data scientists encounter while executing hyperparameter optimization.
Problem #1: Trusting the Defaults
If you don’t explicitly set any hyperparameters on your model, you are implicitly relying on the model developer’s default hyperparameters, and those values may be completely inappropriate for your problem.
The biggest mistake in hyperparameter optimization is not performing hyperparameter optimization at all
Solution: Execute hyperparameter optimization
A co-worker at SigOpt built a model to simulate NBA bets in Vegas. Using the default hyperparameters, the model lost money betting on games, but with tuned hyperparameters, the model won money in its simulated bets! Hyperparameter optimization was the difference between winning money and losing money.
If you’re not performing hyperparameter optimization, you need to start now
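If you want a concrete starting point, here is a minimal sketch of tuning versus trusting the defaults, using scikit-learn’s RandomizedSearchCV on a synthetic dataset; the model and search ranges are illustrative choices, not the model from the betting example above.

```python
# A minimal sketch: comparing default hyperparameters against a basic
# randomized search. The model and search ranges are illustrative, not
# the NBA model from the story above.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: the library's default hyperparameters.
default_model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
print("default accuracy:", default_model.score(X_test, y_test))

# Tuned: search over a few hyperparameters instead of trusting the defaults.
param_distributions = {
    "n_estimators": [100, 200, 400],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [2, 3, 5, 7],
}
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions,
    n_iter=20,
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)
print("tuned accuracy:  ", search.score(X_test, y_test))
```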
Problem #2: Using the Wrong Metric
Microsoft needed a measure of quality for their search engine’s results. After optimizing Bing’s algorithms to maximize searches per session, they found that while their metrics improved, the quality of their search results degraded. Why? The underlying assumption that more searches per session meant a better algorithm was false — the number of searches went up because users could not find what they were looking for!
Stories like this illustrate a fundamental issue with hyperparameter optimization: it is designed to amplify the evaluation criterion that you, the practitioner, have chosen.
Garbage in isn’t just garbage out. It’s amplified garbage out (which is way more smelly…)
If your metric rests on incorrect underlying assumptions, hyperparameter optimization will amplify those incorrect assumptions.
Solution: Balance multiple metrics
Your model evaluation can balance multiple competing metrics, such as revenue and quality. One option is to build a scalar-valued composite of the competing metrics; a second option is to explore the space of possible trade-offs through a multi-metric approach.
Use all of your options to craft a better metric
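To make the first option concrete, here is a minimal sketch of a scalar composite metric wired into a scikit-learn search. Precision and recall stand in for quality and revenue, and the 0.3/0.7 weights are an assumption you would replace with your own priorities.

```python
# A minimal sketch of option one: collapse two competing metrics into a
# single scalar so the optimizer has one number to improve. Precision and
# recall stand in here for "quality" and "revenue"; the 0.3/0.7 weights
# are an assumption you would set from your own business priorities.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import GridSearchCV

def weighted_composite(y_true, y_pred, w_precision=0.3, w_recall=0.7):
    return (w_precision * precision_score(y_true, y_pred)
            + w_recall * recall_score(y_true, y_pred))

X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.01, 0.1, 1.0, 10.0]},
    scoring=make_scorer(weighted_composite),  # optimize the composite, not a single metric
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```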
Problem #3: Overfitting
Your model is overfitting when it performs extremely well during training and optimization, and very poorly out of sample.
Too many models seem perfect during dress rehearsal but flop on opening night
Solution: Cross validation
Avoid overfitting by using techniques such as cross-validation, backtesting, or regularization. With k-fold cross-validation, for example, the metric you optimize becomes the average of the k metrics computed on the k held-out folds of your data, which helps ensure that the metric you optimize for generalizes well to unseen data.
You don’t want the best model for your training data, you want the best model for the real world
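As a minimal sketch, here is k-fold cross-validation (k = 5) used to produce the averaged metric you would hand to the optimizer; the random forest is an illustrative stand-in for your own model.

```python
# A minimal sketch of k-fold cross-validation: the score reported to the
# optimizer is the average over k held-out folds rather than a single
# train-set score, which makes overfit configurations look less attractive.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)
model = RandomForestClassifier(max_depth=None, random_state=0)  # candidate hyperparameters

scores = cross_val_score(model, X, y, cv=5)   # one metric per fold
print("per-fold scores:", scores)
print("metric to optimize:", scores.mean())   # the averaged, less optimistic estimate
```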
Problem #4: Too Few Hyperparameters
Real-world machine learning pipelines cover everything from raw data to feature extraction to model building. Often, feature extraction has tunable parameters of its own, such as the settings of a transformation or of a learned feature representation.
Don’t forget: optimize your feature parameters as well to get maximum performance
Solution: Tune model and feature parameters
At SigOpt, we built an XGBoost classifier for SVHN digits and showed that tuning feature parameters at the same time as model hyperparameters produced better results than tuning the two separately. We recommend optimizing all of your tunable parameters, including feature, architecture, and model parameters, at the same time.
Tune both your model hyperparameters and your feature parameters
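Here is a rough sketch of what joint tuning can look like with a scikit-learn Pipeline. PCA and gradient boosting are illustrative stand-ins rather than the SVHN/XGBoost setup from our experiment, but the idea of one search space spanning both feature and model parameters is the same.

```python
# A minimal sketch of tuning feature parameters and model hyperparameters
# jointly, using a scikit-learn Pipeline. PCA and gradient boosting are
# illustrative stand-ins for your own feature extraction and model.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("features", PCA()),                         # feature-extraction step with its own parameters
    ("model", GradientBoostingClassifier(random_state=0)),
])

# One search space covering both stages, so feature and model
# parameters are optimized together rather than separately.
param_grid = {
    "features__n_components": [16, 32, 48],
    "model__learning_rate": [0.05, 0.1],
    "model__max_depth": [2, 3],
}

X, y = load_digits(return_X_y=True)
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```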
Problem #5: Hand-tuning
An optimization method is the strategy by which the next set of hyperparameters is suggested during hyperparameter optimization. There are many different optimization methods to choose from, and they differ in their setup steps, time requirements, and performance outcomes.
When you manually tweak the values of your hyperparameters, you are the optimization method. And you are an inefficient optimization strategy (it’s math, not personal…)
Solution: Algorithmic optimization
At the end of the day, humans are usually poor at performing high-dimensional, non-convex optimization in their heads. Algorithmic optimization can beat hand-tuning for a deep neural net in a matter of hours, requiring only a bounding box for each hyperparameter, whether it represents algorithmic weights or the structure of the network.
Choosing an algorithmic optimization method will save you time and help you achieve better performance
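For a sense of what algorithmic optimization looks like in code, here is a minimal sketch using the open-source scikit-optimize library as an illustrative stand-in (not SigOpt’s service): you supply only bounds for each hyperparameter and a function that trains and scores the model, and the Bayesian optimizer decides where to evaluate next.

```python
# A minimal sketch of algorithmic (Bayesian) optimization with the
# open-source scikit-optimize library; an illustrative stand-in, not
# SigOpt's service. You supply bounds for each hyperparameter and a
# function that trains the model and returns a score.
from skopt import gp_minimize
from skopt.space import Integer, Real
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

# The bounding box: lower and upper bounds for each hyperparameter.
space = [
    Real(0.01, 0.3, name="learning_rate"),
    Integer(2, 8, name="max_depth"),
    Integer(50, 400, name="n_estimators"),
]

def objective(params):
    learning_rate, max_depth, n_estimators = params
    model = GradientBoostingClassifier(
        learning_rate=learning_rate,
        max_depth=max_depth,
        n_estimators=n_estimators,
        random_state=0,
    )
    # gp_minimize minimizes, so return the negative cross-validated accuracy.
    return -cross_val_score(model, X, y, cv=3).mean()

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best score:", -result.fun)
print("best hyperparameters:", result.x)
```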
Here at SigOpt, we are partial to Bayesian Optimization. SigOpt’s Bayesian Optimization service tunes hyperparameters in machine learning pipelines at Fortune 500 companies and top research labs around the world. Contact us today to learn more.
Author’s Note: Thanks to Gustaf Cavanaugh for editing my earlier Common Problems in Hyperparameter Optimization into two more easily digestible blog posts (and for the great color commentary in many of the pull quotes!). I appreciate his effort to help me explain technical concepts in the simplest, most engaging way possible.