Independent Variables with Correlation
Sometimes a dataset contains two independent variables that are strongly correlated with each other. In regression, correlation among independent variables is known as multicollinearity.
Example
You might have two independent variables, "is_male_gender" and "is_female_gender", where each is simply the complement of the other: whenever one is 1, the other is 0.
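The relationship between the two columns can be checked numerically. The following is a minimal sketch assuming NumPy and a small hypothetical sample of eight rows with only two gender categories. Because one column is exactly one minus the other, their Pearson correlation is -1, which is perfect (negative) correlation.

```python
import numpy as np

# Hypothetical sample: 1 = male, 0 = not male (assumes only two categories)
is_male_gender = np.array([1, 0, 1, 1, 0, 0, 1, 0])
# The female column is the complement of the male column
is_female_gender = 1 - is_male_gender

# Pearson correlation coefficient between the two columns
r = np.corrcoef(is_male_gender, is_female_gender)[0, 1]
print(f"correlation: {r:.2f}")  # correlation: -1.00
```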
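In this case, you should remove one of the two variables. The second column is redundant: it carries no information that the first does not. If the model also includes an intercept, the intercept column equals the sum of the two columns, so the linear regression cannot assign unique coefficients to them. This situation is often called the dummy variable trap.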
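One way to see the redundancy is through the rank of the design matrix. The sketch below assumes NumPy and the same hypothetical eight-row sample. With an intercept plus both dummy columns, the matrix has three columns but rank 2, so different coefficient vectors (the hypothetical `beta_a` and `beta_b`) produce identical predictions. Dropping one dummy column restores full rank.

```python
import numpy as np

# Hypothetical sample; assumes exactly two gender categories
is_male_gender = np.array([1, 0, 1, 1, 0, 0, 1, 0])
is_female_gender = 1 - is_male_gender
ones = np.ones_like(is_male_gender)  # intercept column

# Intercept plus BOTH dummy columns: ones == is_male + is_female
X_both = np.column_stack([ones, is_male_gender, is_female_gender])
# Intercept plus ONE dummy column
X_one = np.column_stack([ones, is_male_gender])

print(np.linalg.matrix_rank(X_both))  # 2 (3 columns, rank-deficient)
print(np.linalg.matrix_rank(X_one))   # 2 (2 columns, full rank)

# Two different coefficient vectors give identical predictions for X_both,
# so least squares has no unique solution for its coefficients
beta_a = np.array([10.0, 2.0, 0.0])
beta_b = np.array([8.0, 4.0, 2.0])
same_predictions = np.allclose(X_both @ beta_a, X_both @ beta_b)
print(same_predictions)  # True
```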
Instead of two separate independent variables, use a single column containing 1 or 0 (for example, "is_male_gender", where 0 means female).
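A regression with the single 0/1 column might look like the following. This is a minimal sketch assuming NumPy's least-squares solver and a hypothetical outcome `y`; the data values are made up for illustration.

```python
import numpy as np

# Hypothetical data: one 0/1 column replaces the two complementary columns
is_male_gender = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y = np.array([72.0, 60.0, 70.0, 74.0, 58.0, 62.0, 76.0, 64.0])  # hypothetical outcome

# Design matrix: intercept column plus the single dummy column
X = np.column_stack([np.ones_like(y), is_male_gender])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept_coef, male_coef = coef

print(f"intercept: {intercept_coef:.1f}, male effect: {male_coef:.1f}")
# intercept: 61.0, male effect: 12.0
```

With an intercept and one binary column, the intercept is the mean outcome of the group coded 0 (here 61.0), and the coefficient is the difference between the two group means (73.0 - 61.0 = 12.0), so the encoding loses no information.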