Independent Variables with Correlation

Sometimes a dataset contains two independent variables that are correlated with each other.

Example

You might have two independent variables, "is_male_gender" and "is_female_gender", that are simply complements of each other: whenever one is 1, the other is 0.

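In this case, remove one of the two correlated variables. The second column is redundant: it carries no information the first does not, and keeping both (alongside an intercept) makes the columns linearly dependent, so the linear regression coefficients are no longer uniquely determined.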

Instead of two separate independent variables, use a single column that holds 1 or 0.

References

  • Wikipedia definition of statistical correlation

  • Using dummy encoding for variables with more than two possible values
