Machine learning

Part 5: Algorithms

Feature pre-processing is crucial in machine learning and data mining, because the quality of a model depends on the quality of its data. Feature pre-processing turns raw data into a form that a machine learning model can use.

Different types of machine learning models

First, I divided machine learning models into two categories: tree-based models and non-tree-based models. Other categorizations exist; this is a deliberately simple division.

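A tree-based model (a decision tree, random forest, or gradient-boosted decision trees) is a supervised learning model that makes predictions by repeatedly splitting the data on feature thresholds. Tree-based models can capture complex non-linear relationships, and the ensemble versions (random forests and gradient-boosted trees) are typically more accurate and stable than a single decision tree.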

Non-tree-based models include linear models, nearest neighbors and neural networks. In this simple division, every supervised learning model that is not tree-based falls into this category.
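Why does this division matter for pre-processing? One common example is feature scaling. Nearest neighbors compares raw distances, so a feature with a large range (here, income in dollars) can dominate the result, while a decision tree only compares each feature against a threshold, so a monotonic rescaling such as min-max scaling leaves its possible splits unchanged. The Python sketch below illustrates this on hypothetical toy data; the helper names (`nearest`, `min_max_scale`, `order_by`) are illustrative and not from any library.

```python
import math

# Hypothetical toy data: (income in dollars, age in years).
points = [(50_000, 60), (52_000, 25), (90_000, 30)]
query = (51_900, 58)

def nearest(points, query):
    """Index of the training point closest to `query` (Euclidean distance)."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))

def min_max_scale(points, query):
    """Rescale each feature to [0, 1] using the min and max seen in `points`."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(p):
        return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(p, lows, highs))

    return [scale(p) for p in points], scale(query)

def order_by(points, feature):
    """Indices of the points sorted by one feature (what a tree split sees)."""
    return sorted(range(len(points)), key=lambda i: points[i][feature])

scaled_points, scaled_query = min_max_scale(points, query)

# Nearest neighbors: the answer changes, because raw income dominates the distance.
print(nearest(points, query))                # 1
print(nearest(scaled_points, scaled_query))  # 0

# Trees: the sorted order along each feature is unchanged, so the same
# threshold splits are available before and after scaling.
for f in range(2):
    print(order_by(points, f) == order_by(scaled_points, f))  # True
```

Before scaling, the query's nearest neighbor is the point with the closest income; after scaling, age counts as much as income and the nearest neighbor becomes the point with the closest age.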

Different data types or features

Each data set can contain several types of features. Some of the most common are:

  • Numerical features – let’s look at these first!
  • Categorical and ordinal features
  • Date and time
  • Text
  • Image

Different data types and different machine learning models require different kinds of feature pre-processing.
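As an illustration of how pre-processing differs by data type, here is a minimal Python sketch for three of the types listed above: standardization for a numerical feature, one-hot encoding for a categorical feature, and splitting a date into numeric parts. The records and helper names (`standardize`, `one_hot`, `date_parts`) are hypothetical; in practice, libraries such as scikit-learn provide ready-made equivalents (for example `StandardScaler` and `OneHotEncoder`).

```python
from datetime import date
from statistics import mean, pstdev

# Hypothetical raw records: one numerical, one categorical and one date feature.
rows = [
    {"price": 10.0, "color": "red", "sold_on": date(2023, 1, 15)},
    {"price": 20.0, "color": "green", "sold_on": date(2023, 6, 3)},
    {"price": 30.0, "color": "red", "sold_on": date(2024, 12, 31)},
]

def standardize(values):
    """Numerical: rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def one_hot(values):
    """Categorical: one 0/1 column per category, in sorted category order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories

def date_parts(d):
    """Date: split into numbers a model can use (year, month, day of week)."""
    return [d.year, d.month, d.weekday()]  # weekday(): Monday == 0

prices = standardize([r["price"] for r in rows])
colors, color_names = one_hot([r["color"] for r in rows])
dates = [date_parts(r["sold_on"]) for r in rows]

# Concatenate the pre-processed parts into one numeric row per record.
features = [[p, *c, *d] for p, c, d in zip(prices, colors, dates)]
for row in features:
    print(row)
```

Which of these steps a model needs depends on the model family: standardization matters for linear models, nearest neighbors and neural networks, but not for tree-based models, which are insensitive to monotonic rescaling of a feature.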

Finally, do not forget to run a sanity check on the feature values after pre-processing.
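The text does not say which sanity checks to run, so the Python sketch below assumes three common ones: no missing or infinite values, no constant columns (which carry no information), and, optionally, every value inside an expected range such as [0, 1] after min-max scaling. The function name `sanity_check` and the sample matrix are hypothetical.

```python
import math

def sanity_check(features, expected_range=None):
    """Return a list of problems found in a feature matrix (a list of rows)."""
    if not features:
        return ["no rows"]
    problems = []
    for j in range(len(features[0])):
        column = [row[j] for row in features]
        # Missing (NaN) or infinite values usually mean a pre-processing bug.
        if any(math.isnan(v) or math.isinf(v) for v in column):
            problems.append(f"column {j}: NaN or infinite value")
            continue
        # A constant column cannot help any model distinguish samples.
        if len(set(column)) == 1:
            problems.append(f"column {j}: constant value {column[0]}")
        # Optional range check, e.g. (0.0, 1.0) after min-max scaling.
        if expected_range is not None:
            lo, hi = expected_range
            if min(column) < lo or max(column) > hi:
                problems.append(f"column {j}: values outside [{lo}, {hi}]")
    return problems

# Hypothetical output of a min-max scaling step with three deliberate problems.
scaled = [
    [0.0, 1.0, 0.2, 0.2],
    [0.5, 1.0, float("nan"), 1.4],
    [1.0, 1.0, 0.7, 0.7],
]
for problem in sanity_check(scaled, expected_range=(0.0, 1.0)):
    print(problem)
```

Here column 0 passes, while columns 1, 2 and 3 are reported as constant, containing a NaN, and out of range, respectively.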
