Illustration of One Hot Encoding
- Occasionally you may have to one hot encode or mask certain columns in a data set so that the model can use them and so that accuracy improves.
- One hot encoding takes a column that holds two or more distinct categories and turns each category into its own 0/1 column.
- Masking simply replaces each keyword with a number.
How to One Hot Encode?
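One possible way to do this with NumPy alone is sketched below. The `colors` column and its values are made up for illustration; libraries such as pandas or scikit-learn offer their own helpers for the same task.

```python
import numpy as np

# Hypothetical categorical column; the values are made up for illustration.
colors = np.array(["red", "green", "blue", "green"])

# np.unique returns the sorted categories and, for each row,
# the index of the category that row belongs to.
categories, codes = np.unique(colors, return_inverse=True)

# Row i of the identity matrix is the one hot vector for category i,
# so indexing the identity matrix by the codes gives one row per sample.
one_hot = np.eye(len(categories), dtype=int)[codes]

print(categories)  # ['blue' 'green' 'red']
print(one_hot)
```

Each row of `one_hot` contains exactly one 1, in the column of that row's category. The column order follows the sorted order of `categories`.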

How to Mask Data?
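A minimal sketch of masking, assuming a yes/no column. The `answers` data and the `mask_map` dictionary are hypothetical names chosen for this example.

```python
import numpy as np

# Hypothetical keyword column; the values are made up for illustration.
answers = np.array(["yes", "no", "no", "yes"])

# Hypothetical keyword-to-number mapping.
mask_map = {"no": 0, "yes": 1}

# Look up every keyword in the mapping to get a numeric column.
masked = np.array([mask_map[a] for a in answers])

print(masked)  # [1 0 0 1]
```

A keyword that is missing from `mask_map` raises a `KeyError` here, which makes unexpected values easy to spot.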

Standardization and Normalization
When there are multiple inputs, they may be on different scales, or some inputs may be very large. In this scenario, standardization or normalization is used.
Scaling makes the training of the model less sensitive to the scale of the features, so coefficients such as the weights can be solved for sooner.
It also helps make sure that the cost does not diverge or swing widely. In other words, smaller numbers on comparable scales help reach the weights with the least cost faster and more efficiently.
It also helps compare inputs that have different units or scales.
When do I use Mean Normalization and when do I use Standardization?
Normalization helps with varying scales and when the algorithm does not make assumptions about the distribution of the data.
Standardization helps when your data has a roughly bell-shaped (Gaussian) distribution.
When do I scale at all?
Scale your inputs whenever an algorithm computes a cost or weights, or when the scales of the variables are very different.
Standardization replaces each value with its z-score:
x' = (x - mean(x)) / std(x)
In Python, with NumPy imported as np, this can be written as:
xStd = (x - np.mean(x, axis=0)) / np.std(x, axis=0)
Each column now has a mean of 0 and a standard deviation of 1.
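The claim about the mean and standard deviation can be checked with a small sketch. The array `x` below is made-up data, with rows as samples and columns as features on very different scales.

```python
import numpy as np

# Hypothetical data: 4 samples, 2 features with very different scales.
x = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0],
              [4.0, 4000.0]])

# axis=0 computes the mean and standard deviation of each column (feature).
xStd = (x - np.mean(x, axis=0)) / np.std(x, axis=0)

print(np.mean(xStd, axis=0))  # close to 0 for every column
print(np.std(xStd, axis=0))   # close to 1 for every column
```

Note that a column with a constant value has a standard deviation of 0, so this formula would divide by zero for it.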
Mean normalization gives a similar result and is given by the equation:
x' = (x - mean(x)) / (max(x) - min(x))
The Python way to do this would involve:
Xnorm = (x - np.mean(x, axis=0)) / (np.max(x, axis=0) - np.min(x, axis=0))
The inputs now satisfy -1 <= x' <= 1, and each column has a mean of 0.
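As a quick check of that range, here is a sketch of mean normalization on made-up data. The array `x` is hypothetical, with rows as samples and columns as features.

```python
import numpy as np

# Hypothetical data: 4 samples, 2 features with very different scales.
x = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0],
              [4.0, 4000.0]])

# Center each column on its mean, then divide by the column's range.
col_range = np.max(x, axis=0) - np.min(x, axis=0)
Xnorm = (x - np.mean(x, axis=0)) / col_range

print(Xnorm.min(axis=0))  # no value falls below -1
print(Xnorm.max(axis=0))  # no value rises above 1
```

After this scaling, the gap between the largest and smallest value in each column is exactly 1, and the column mean is 0.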
An alternative called min-max scaling can also be used:
x' = (x - min(x)) / (max(x) - min(x))
The Python way to do this would involve:
Xnorm = (x - np.min(x, axis=0)) / (np.max(x, axis=0) - np.min(x, axis=0))
The inputs now satisfy 0 <= x' <= 1.
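The same kind of check works for min-max scaling. Again, the array `x` is made-up data used only for illustration.

```python
import numpy as np

# Hypothetical data: 4 samples, 2 features with very different scales.
x = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0],
              [4.0, 4000.0]])

# Shift each column so its minimum is 0, then divide by the column's range.
col_min = np.min(x, axis=0)
col_range = np.max(x, axis=0) - col_min
Xnorm = (x - col_min) / col_range

print(Xnorm.min(axis=0))  # smallest value in each column is 0
print(Xnorm.max(axis=0))  # largest value in each column is 1
```

Both columns end up with the same scaled values even though the raw features differ by a factor of 1000, which is the point of scaling.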