Dummy Variable Trap in Machine Learning

Jagroop Singh - Oct 31 '23 - - Dev Community

Before we discuss the Dummy Variable Trap, let's first go over what dummy variables are.

As we all know, computers only understand 0's and 1's. So, what happens if we have data with categorical values, such as India and the United States?

The machine can store such values, but most machine learning algorithms work on numbers, so leaving them as text will hurt the accuracy of our model or prevent it from training at all. So why not convert that data into numerical data by giving each distinct value a numeric representation?

Thus,

The process of encoding categorical data as numerical 0/1 columns is known as dummy encoding, and the columns it produces are called dummy variables.

For example:

Figure: Categorical to Numerical Format Conversion

As the representation above shows, a dummy variable is a binary variable that takes the value 0 or 1.
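The conversion described above can be sketched with pandas. This is a minimal illustration, not the article's original data: the `Country` column and its values are hypothetical sample data, and `pd.get_dummies` is just one common way to build the columns.

```python
import pandas as pd

# Hypothetical sample data; the country names are assumptions for illustration.
df = pd.DataFrame({"Country": ["India", "United States", "India", "Germany"]})

# Build one 0/1 column per distinct category.
# dtype=int asks for 0/1 integers instead of True/False.
dummies = pd.get_dummies(df["Country"], dtype=int)

print(dummies)
```

Each row ends up with a single 1 in the column that matches its category and 0 everywhere else.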

Dummy Variable Trap:

The dummy variable trap is a scenario in which one or more of the dummy variables we created are highly correlated (multicollinear) with the others.

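As the diagram above shows, we have created four dummy variables, and together they are multicollinear: every row contains exactly one 1, so the value of any one column can be worked out from the other three. This redundancy can affect the machine learning model's performance, for example in linear regression with an intercept, where the coefficients can no longer be determined uniquely.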

Multicollinearity is a statistical term for the situation in which several independent variables are correlated with one another.
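One way to see this redundancy is to check the rank of the feature matrix. The sketch below assumes a linear model with an intercept column and uses hypothetical sample data with three categories; the names `countries` and `X` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with three categories.
countries = pd.Series(["India", "United States", "Germany", "India", "Germany"])
dummies = pd.get_dummies(countries, dtype=int)

# Every row has exactly one 1, so the dummy columns always add up to 1.
row_sums = dummies.sum(axis=1)

# Feature matrix for a linear model: an intercept column of 1s plus all dummies.
X = np.column_stack([np.ones(len(dummies)), dummies.to_numpy()])

# The intercept column equals the sum of the dummy columns,
# so the matrix has fewer independent columns than it has columns.
rank = np.linalg.matrix_rank(X)
print(X.shape[1], rank)  # prints: 4 3
```

The matrix has four columns but only three of them are independent, which is the multicollinearity described above.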

How to handle the Dummy Variable Trap?

Many frameworks and built-in algorithms already cope with the dummy variable trap, so in many cases we don't need to do anything else.

If we still want to handle it ourselves, we can remove one of the dummy columns.


We can remove the multicollinearity, and hence avoid the dummy variable trap, by eliminating a single column from the dummy variables; the dropped category is then represented by a row of all 0s. In practice we often do not have to manage this ourselves, because many machine learning libraries handle it under the hood.
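Dropping one column by hand can be sketched with the `drop_first` option of `pd.get_dummies`. The data is hypothetical, and the rank check again assumes a linear model with an intercept column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the country names are assumptions for illustration.
df = pd.DataFrame({"Country": ["India", "United States", "Germany", "India"]})

# drop_first=True removes the first category's column (here "Germany",
# the first in sorted order), leaving one dummy fewer than there are categories.
reduced = pd.get_dummies(df, columns=["Country"], drop_first=True, dtype=int)
print(reduced)

# With an intercept column added, every column is now independent.
X = np.column_stack([np.ones(len(reduced)), reduced.to_numpy()])
rank = np.linalg.matrix_rank(X)
print(X.shape[1], rank)  # prints: 3 3
```

The dropped category is not lost: a row with 0 in every remaining dummy column stands for "Germany".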
