Are you replacing missing values with zeroes in your MMM dataset? Please don't.
Missing value ≠ zero
In over a decade of experience, I have seen wrong insights from MMM arise mainly for two reasons:
- Bad Data.
- Bad Model.
While there are many calibration metrics to tell you whether your model is accurate, there are no such metrics to tell you whether the data you have received or imputed is accurate.
Hence, when it comes to ascertaining data quality and accuracy, 'domain understanding' is the only reliable metric you have.
The imputation of missing values with zeroes in MMM is a problem that can easily go unnoticed.
The problem with imputing zeroes in place of missing values
A missing value doesn't necessarily mean the actual value was zero. It simply means the data are absent. Treating a missing value as zero can fundamentally change the meaning of the data.
For example, if you're analyzing advertising spend and a particular month has a missing value for a specific channel, it doesn't automatically mean the client spent 'NOTHING' on that channel that month. It could be that the data wasn't recorded, or there was a data collection error.
Impact on MMM and its accuracy
Replacing missing values with zeroes can artificially inflate the number of zero values in your dataset. This can skew distributions, bias statistical calculations (mean and variance), and lead to inaccurate model coefficients.
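To make the skew concrete, here is a minimal sketch using pandas. The spend figures are made up for illustration, and `spend` and `zero_filled` are hypothetical names, not part of any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly TV spend (in thousands); NaN marks months with no record.
spend = pd.Series([120.0, 135.0, np.nan, 110.0, np.nan, 125.0])

# The practice this article warns against: treat unrecorded months as zero spend.
zero_filled = spend.fillna(0)

# pandas skips NaN by default, so these statistics use only the observed months.
observed_mean = spend.mean()
observed_std = spend.std()

# After zero-filling, the two unrecorded months count as genuine zero-spend months.
zero_mean = zero_filled.mean()
zero_std = zero_filled.std()

print(f"mean: observed={observed_mean:.1f}, zero-filled={zero_mean:.1f}")
print(f"std:  observed={observed_std:.1f}, zero-filled={zero_std:.1f}")
print(f"zero count: before={(spend == 0).sum()}, after={(zero_filled == 0).sum()}")
```

In this toy series the zero-filled mean drops well below the observed mean and the standard deviation grows several times over, even though nothing about the client's actual spending changed.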
If you're trying to understand the factors impacting sales (causality), imputing with zeroes could lead you to incorrectly conclude that a variable has little or no impact when, in reality, the missing data is masking its true effect.
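A small simulation can illustrate this masking effect. It is a sketch under strong assumptions: the data is synthetic, the records are dropped completely at random, and a single-variable least-squares fit stands in for a full MMM (no adstock, saturation, or other channels). All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_weeks = 500
true_beta = 2.0  # assumed true sales lift per unit of spend

# Synthetic spend and sales; sales always respond to the real spend.
spend = rng.normal(100.0, 20.0, n_weeks)
sales = 500.0 + true_beta * spend + rng.normal(0.0, 10.0, n_weeks)

# Lose 30% of the spend records at random. The money was still spent,
# so sales are unchanged; only the recorded spend goes missing.
missing = rng.random(n_weeks) < 0.3
spend_zero_filled = np.where(missing, 0.0, spend)


def ols_slope(x, y):
    """Slope from a least-squares fit of y on an intercept and x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


beta_full = ols_slope(spend, sales)                      # complete data
beta_zero = ols_slope(spend_zero_filled, sales)          # missing -> 0
beta_observed = ols_slope(spend[~missing], sales[~missing])  # drop missing weeks

print(f"complete data:      {beta_full:.2f}")
print(f"zero-filled:        {beta_zero:.2f}")
print(f"observed rows only: {beta_observed:.2f}")
```

Under these assumptions the zero-filled fit returns a coefficient far below the true value, because weeks with real spend and normal sales are recorded as zero-spend weeks. Fitting only the observed weeks stays close to the truth here, but only because the missingness in this simulation is completely random.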
Bottom line:
Don't impute missing values with zeroes.
Imputing missing values with zeroes can distort the true nature of the data, introduce bias, and lead to inaccurate model results and interpretations. It is essential to carefully consider the context, the nature of the missingness (missing completely at random, missing at random, or missing not at random: MCAR, MAR, MNAR), and the potential impact on the analysis before deciding how to handle missing data.
Methods like carrying forward values, statistical imputation (used cautiously), or even excluding the variable (if many of its values are missing) might be more appropriate, depending on the specific situation.
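As a rough sketch of how these alternatives might look in pandas: the column names, the figures, and the 50% cut-off `MAX_MISSING_SHARE` are all hypothetical, and linear interpolation is used here only as a simple stand-in for statistical imputation. The right choice still depends on why the data is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly spend; NaN marks months with no record.
df = pd.DataFrame(
    {
        "tv_spend": [120.0, np.nan, 130.0, 125.0, np.nan, 140.0],
        "radio_spend": [np.nan, np.nan, 40.0, np.nan, np.nan, np.nan],
    },
    index=["2023-01", "2023-02", "2023-03", "2023-04", "2023-05", "2023-06"],
)

MAX_MISSING_SHARE = 0.5  # assumed cut-off; choose one that fits your data

# Share of missing values per variable.
missing_share = df.isna().mean()

# Exclude variables that are mostly missing instead of inventing their values.
keep = missing_share[missing_share <= MAX_MISSING_SHARE].index
kept = df[keep]

# Option 1: carry the last recorded value forward.
# A leading NaN would stay NaN, since there is nothing to carry forward.
cleaned = kept.ffill()

# Option 2: linear interpolation between neighbouring observed months.
interpolated = kept.interpolate()

print(missing_share)
print(cleaned)
print(interpolated)
```

Here `radio_spend` is dropped because five of its six months are missing, while the two gaps in `tv_spend` are filled from neighbouring months rather than set to zero.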
Thanks for reading.
Need help deciding how to handle missing data in your MMM dataset? Reach out to us.





