Hey Venkat, thanks for sharing. I understand your point on priors, but I am not fully sure that multicollinearity is a bigger problem in Bayesian models. In the graph you shared from the 'Statistical Rethinking' book, the 'standard deviation' of the posterior distribution increases significantly only after the correlation exceeds 0.6 or so. However, in my experience I have never seen correlations between features exceed 0.5 (in the worst case). So, I am genuinely curious to know if this is a problem (or if you have seen correlations exceeding 0.6 in your experience?). Also, what is the right 'standard deviation' that we should aim for? My intent is just to learn, and I would love your perspective on this.
Thank you
Ravi (I work as an Analytics Director at a CPG company)
Hello Ravi,
Thanks for your question. Let me clarify.
"In the graph you shared from the 'Statistical Rethinking' book, the 'standard deviation' of the posterior distribution increases significantly only after the correlation exceeds 0.6 or so."
Actually, this is not an accurate interpretation. On the y-axis we are talking about probability density, not just raw numbers. So from a probability density perspective, the delta that you are observing for every small increase in correlation is itself monotonically increasing.
Also, the lift in the curve starts at around 0.4 itself, which many would consider the moderate correlation range. Plus, in our own internal exercises we have seen the HDI inflate a lot even when the correlation is around 0.4, or when the VIF is between 1 and 5.
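For a rough reference point, here is a minimal sketch of how a pairwise correlation maps to VIF and to the inflation in a slope's standard deviation. It assumes a linear regression with only two predictors and flat priors (where the posterior standard deviation matches the OLS standard error), and the function names are purely illustrative. With many media channels, the VIF of a variable can be considerably higher than any single pairwise correlation suggests, so these numbers are a lower bound rather than a rule.

```python
import math

def vif_from_corr(r: float) -> float:
    """VIF of either predictor when two predictors have correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)

def sd_inflation(r: float) -> float:
    """Factor by which the slope's standard deviation grows relative to r = 0."""
    return math.sqrt(vif_from_corr(r))

# Tabulate the two-predictor case over a grid of correlations.
for r in (0.0, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95):
    print(f"r={r:.2f}  VIF={vif_from_corr(r):6.2f}  sd x{sd_inflation(r):.2f}")
```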
Also, Bayesian MMM practitioners generally don't check for multicollinearity while specifying their models, or they assume that moderate multicollinearity is 'ok', only to realize later that their HDIs are wide. This realization comes only after the model is fit, and Bayesian MMMs often take hours to run, so it frequently comes too late.
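Since the check is cheap compared with a multi-hour fit, it can be run on the design matrix before any sampling. A minimal sketch, assuming NumPy and synthetic data; `vif` and the channel names (`tv`, `digital`, `radio`) are made up for illustration and do not come from any MMM library:

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (rows = time periods).

    Uses the identity VIF_j = [inverse of the correlation matrix]_jj.
    Constant columns have undefined correlation and must be dropped first.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic weekly spend: digital is built to track tv, radio is independent.
rng = np.random.default_rng(0)
n_weeks = 156
tv = rng.normal(size=n_weeks)
digital = 0.8 * tv + 0.6 * rng.normal(size=n_weeks)
radio = rng.normal(size=n_weeks)
X = np.column_stack([tv, digital, radio])

for name, value in zip(["tv", "digital", "radio"], vif(X)):
    print(f"{name:8s} VIF = {value:.2f}")
```

Running this on the real media variables before fitting gives an early warning that some coefficients may come back with wide HDIs.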
"So, i am genuinely curious to know if this is a problem. (or if you have seen correlations exceeding 0.6 in your experience?) "
Yes, we have regularly seen multicollinearity with VIF north of 10, and correlations between pairs of variables in the 0.6-0.8 range.
"Also, what is the right 'standard deviation' that we should aim for ?"
Again, from a probability density perspective, and with the aim of getting accurate and precise estimates, one should aim for narrower HDIs.
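To compare HDI widths across model specifications, the interval can be computed directly from posterior draws. A minimal sketch, assuming NumPy, a unimodal posterior, and a 94% interval as an arbitrary default; `hdi` here is a hand-rolled helper, and the draws are simulated rather than taken from a real model:

```python
import numpy as np

def hdi(samples, prob=0.94):
    """Narrowest interval containing `prob` of the draws (unimodal case)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(prob * n))            # number of draws inside the interval
    widths = x[k - 1:] - x[: n - k + 1]   # width of every window of k sorted draws
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

# Simulated draws for the same coefficient under two hypothetical models.
rng = np.random.default_rng(0)
tight = rng.normal(loc=0.5, scale=0.1, size=4000)
wide = rng.normal(loc=0.5, scale=0.3, size=4000)

for name, draws in (("tight", tight), ("wide", wide)):
    lo, hi = hdi(draws)
    print(f"{name:5s} HDI = [{lo:.2f}, {hi:.2f}]  width = {hi - lo:.2f}")
```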
Hope this clarifies.
Regards
Venkat