DATA CLEANING
Illustration of One Hot Encoding
 Occasionally you may have to one hot encode or mask certain columns in a data set in order to improve accuracy and get the model to work.
 One hot encoding takes a column containing two or more categorical values and splits it into separate binary columns, one per value.
 Masking simply converts a keyword to a number.
How to One Hot Encode?
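One common way to do this, assuming pandas is available (the notes do not name a library, and the sample "color" values are made up for illustration), is `pd.get_dummies`:

```python
import pandas as pd

# Toy data set with one categorical column (illustrative values)
df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})

# One hot encoding: each distinct value becomes its own 0/1 column
encoded = pd.get_dummies(df, columns=["color"])
print(encoded.columns.tolist())
```

Each row now has a 1 in the column matching its original value and 0 elsewhere.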
How to Mask Data?
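Masking, in the sense used above (keyword to number), can be sketched with a simple dictionary lookup; the column name and the mapping below are hypothetical examples, not from the notes:

```python
import pandas as pd

# Toy data set with an ordered categorical column (illustrative values)
df = pd.DataFrame({"size": ["small", "medium", "large", "small"]})

# Hypothetical keyword-to-number mapping
mapping = {"small": 0, "medium": 1, "large": 2}

# Replace each keyword with its numeric code
df["size"] = df["size"].map(mapping)
```

Unlike one hot encoding, this keeps a single column, so it is best suited to values with a natural order.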
Standardization and Normalization

Conceptual

When there are multiple inputs, there may be different scales or some inputs may be too large. In this scenario, standardization/normalization is used.

Scaling makes training the model less sensitive to the scale of the features, so coefficients such as the weights can be solved for sooner.

This also helps keep the cost from oscillating with massive variance as it converges. In other words, simpler numbers help find the weights with the least cost faster and more efficiently.

Scaling also helps compare inputs that have different units or scales


When do I use Mean Normalization and when do I use Standardization?

Normalization helps with varying scales and when the algorithm makes no assumptions about the distribution of the data

Standardization helps when your data has a bell curve (Gaussian) distribution


When do I scale at all?

Whenever an algorithm computes a cost or weights, and/or when the scales of the variables are very different, scale your inputs


Equations

Standardization's equation replaces each value with its z-score

X' = (X - mean(x)) / standard deviation(x)

The Python way to do this would involve:

xStd = (x - np.mean(x, axis=0)) / np.std(x, axis=0)


The mean is now 0 and the standard deviation is now 1
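As a sanity check, here is the standardization line above run end to end on a toy feature matrix (the sample values are made up for illustration):

```python
import numpy as np

# Toy feature matrix: two columns on very different scales
x = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Standardize each column: subtract the column mean, divide by the column std
xStd = (x - np.mean(x, axis=0)) / np.std(x, axis=0)

# Each column now has mean 0 and standard deviation 1
print(xStd.mean(axis=0), xStd.std(axis=0))
```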


Normalization will result in a similar output and can be given by the equation

X' = (X - mean(x)) / (max(x) - min(x))

The Python way to do this would involve:

Xnorm = (X - np.mean(x, axis=0)) / (np.max(x, axis=0) - np.min(x, axis=0))


Distribution of inputs is now -1 <= x' <= 1
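A runnable sketch of mean normalization on the same kind of toy matrix (sample values are made up for illustration):

```python
import numpy as np

# Toy feature matrix: two columns on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Mean normalization: center on the mean, divide by the range
Xnorm = (X - np.mean(X, axis=0)) / (np.max(X, axis=0) - np.min(X, axis=0))

# Every entry now lies between -1 and 1
print(Xnorm)
```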


An alternate scale called min-max scaling can also be used:

X' = (X - min(x)) / (max(x) - min(x))

The Python way to do this would involve:

Xnorm = (X - np.min(X, axis=0)) / (np.max(X, axis=0) - np.min(X, axis=0))


Distribution of inputs is now 0 <= x' <= 1
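And a matching sketch of min-max scaling, again on made-up sample values:

```python
import numpy as np

# Toy feature matrix: two columns on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Min-max scaling: shift by the column minimum, divide by the range
Xnorm = (X - np.min(X, axis=0)) / (np.max(X, axis=0) - np.min(X, axis=0))

# Every entry now lies between 0 and 1, with the min at 0 and the max at 1
print(Xnorm)
```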

