Part 5: Algorithms
Feature pre-processing is crucial in machine learning and data mining, because a model can only be as good as the data it learns from. Feature pre-processing turns raw data into a form that a machine learning model can use.
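As a small illustration of "turning raw data into a usable form", here is a minimal sketch that parses a column of raw string values into floats and fills missing entries. The helper name `clean_numeric_column` is hypothetical, and mean imputation is an assumed choice; other strategies (median, a sentinel value, a missing-indicator column) are equally valid.

```python
def clean_numeric_column(raw_values):
    """Hypothetical helper: parse raw strings to floats and impute missing entries with the mean."""
    parsed = []
    for value in raw_values:
        text = value.strip() if value is not None else ""
        # Treat None and empty/whitespace-only strings as missing values.
        parsed.append(float(text) if text else None)
    observed = [v for v in parsed if v is not None]
    # Assumption: fall back to 0.0 when every value is missing.
    mean = sum(observed) / len(observed) if observed else 0.0
    # Replace each missing value with the mean of the observed values.
    return [mean if v is None else v for v in parsed]


raw_ages = ["23", " 35 ", "", None, "41"]
print(clean_numeric_column(raw_ages))  # [23.0, 35.0, 33.0, 33.0, 41.0]
```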
Different types of machine learning models
First, I consider two broad categories of machine learning models. Other categorizations exist, but here I use a simple division between tree-based models and non-tree-based models.
Tree-based models (decision trees, random forests, and gradient-boosted decision trees) are supervised learning models that often achieve high accuracy and stability and can capture complex non-linear relationships.
Non-tree-based models include linear models, nearest neighbors, and neural networks. In this simple division, every supervised learning model that is not tree-based falls into this category.
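One practical reason this division matters for pre-processing is that tree splits depend only on the ordering of a feature's values, while distance-based models depend on the actual magnitudes. The sketch below illustrates this under simplifying assumptions: a one-level decision stump stands in for a tree-based model and 1-nearest-neighbor stands in for a non-tree-based model. The function names `best_stump_split` and `nearest_neighbor` and the toy data are hypothetical.

```python
import math


def best_stump_split(xs, ys):
    """Pick the threshold t (among observed values) that minimizes errors of the rule
    'predict 1 if x > t'; return t and each sample's side of the split."""
    best_t, best_errors = None, len(xs) + 1
    for t in sorted(set(xs)):
        errors = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, [x > best_t for x in xs]


def nearest_neighbor(query, points):
    """Return the index of the point closest to query under Euclidean distance."""
    return min(range(len(points)), key=lambda i: math.dist(query, points[i]))


def shrink_second_feature(point):
    """Rescale the second feature by 1/100, e.g. a change of units."""
    return (point[0], point[1] / 100)


# Stump: rescaling the feature moves the threshold but leaves the partition unchanged.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
t_raw, part_raw = best_stump_split(xs, ys)
t_scaled, part_scaled = best_stump_split([x * 1000 for x in xs], ys)
print(t_raw, t_scaled, part_raw == part_scaled)  # 3.0 3000.0 True

# 1-NN: rescaling one feature changes which training point is the nearest neighbor.
points = [(1.0, 260.0), (3.0, 200.0)]
query = (1.0, 200.0)
print(nearest_neighbor(query, points))  # 1
print(nearest_neighbor(shrink_second_feature(query),
                       [shrink_second_feature(p) for p in points]))  # 0
```

In other words, feature scaling is usually irrelevant for tree-based models but can change the predictions of non-tree-based models such as nearest neighbors.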
Different data types or features
A dataset can contain features of many data types. Some of the most common are:
- Numerical features (we look at these first)
- Categorical and ordinal features
- Date and time
- Text
- Image
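For numerical features, two widely used pre-processing steps are min-max scaling and standardization. The following is a minimal pure-Python sketch of both; the function names are hypothetical, and mapping a constant column to zeros is an assumed convention to avoid division by zero. In practice the minimum/maximum or mean/standard deviation would be computed on the training data and reused unchanged on new data.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


incomes = [20_000.0, 35_000.0, 50_000.0, 95_000.0]
print(min_max_scale(incomes))  # [0.0, 0.2, 0.4, 1.0]
print(standardize(incomes))
```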
Different data types and different machine learning models require different kinds of feature pre-processing.
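Categorical features show how the choice can depend on the model. A common rule of thumb is that a compact ordinal (label) encoding is often adequate for tree-based models, while non-tree-based models such as linear models or nearest neighbors often do better with one-hot encoding, which does not impose an artificial ordering or spacing on categories. The sketch below assumes hypothetical helpers `ordinal_encode` and `one_hot_encode` and an explicitly supplied category order.

```python
def ordinal_encode(values, order):
    """Map each category to its position in an explicit order (one integer column)."""
    index = {category: i for i, category in enumerate(order)}
    return [index[v] for v in values]


def one_hot_encode(values, categories):
    """Map each category to a 0/1 indicator vector with one slot per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]


sizes = ["small", "large", "medium", "small"]
order = ["small", "medium", "large"]
print(ordinal_encode(sizes, order))  # [0, 2, 1, 0]
print(one_hot_encode(sizes, order))  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```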
Moreover, do not forget to apply another sanity test to the feature values after pre-processing.
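A post-processing sanity check can be as simple as scanning each feature column for values that usually signal a bug. The sketch below assumes features are stored as a dict mapping a column name to a list of floats, and the hypothetical `sanity_check` function flags three assumed problems: NaN values, infinities (for example from taking the log of zero), and constant columns that carry no information.

```python
import math


def sanity_check(features):
    """Return human-readable problems found in a dict of column name -> list of floats."""
    problems = []
    for name, values in features.items():
        if any(math.isnan(v) for v in values):
            problems.append(f"{name}: contains NaN")
        if any(math.isinf(v) for v in values):
            problems.append(f"{name}: contains infinity")
        # A column with at most one distinct finite value is treated as constant.
        finite = [v for v in values if math.isfinite(v)]
        if len(set(finite)) <= 1:
            problems.append(f"{name}: constant column")
    return problems


features = {
    "age_scaled": [0.0, 0.5, 1.0],
    "log_income": [10.2, float("-inf"), 11.1],
    "flag": [1.0, 1.0, 1.0],
}
print(sanity_check(features))  # ['log_income: contains infinity', 'flag: constant column']
```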