My goals for data mining are to:
- Create a prediction algorithm so that I can predict the unknown values of variables of interest.
- Find hidden patterns in my data.

Below are some of the data mining techniques I can use, grouped by task. The list is not exhaustive: I am limited to what I know, which is JavaScript. My goal for 2020 is to learn Python after the 2019 tax season.

Classification is learning a function that maps (classifies) a data item into one of several predefined classes.
- Bayesian – uses classifier.js, by Heather Arthur.
- Perceptron – loosely based on perceptron.js, by John Chesley.
- SVM – uses svm.js, by Andrej Karpathy.
- Linear SVM – svmperf.js and svmlinear.js.
- Decision Tree – node-decision-tree-id3 by Ankit Kuwadekar or ID3-Decision-Tree by Will Kurt.
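
To make classification concrete, here is a minimal perceptron sketch in plain JavaScript (Node.js). It is written from scratch to show the technique and is not the API of perceptron.js or any library above; `trainPerceptron` and `predictPerceptron` are my own hypothetical helper names. It assumes numeric feature arrays and two classes labelled 0 and 1, and it learns the logical AND function, which is linearly separable (a single perceptron cannot learn non-separable data such as XOR).

```javascript
// Minimal perceptron for two classes labelled 0 and 1 (illustrative sketch).
// With zero starting weights the learning rate only rescales the weights,
// so a rate of 1 gives the same predictions while keeping the arithmetic exact.
function predictPerceptron(model, x) {
  let activation = model.bias;
  for (let d = 0; d < x.length; d++) {
    activation += model.weights[d] * x[d];
  }
  return activation > 0 ? 1 : 0;
}

function trainPerceptron(samples, labels, epochs = 20, learningRate = 1) {
  const model = { weights: new Array(samples[0].length).fill(0), bias: 0 };
  for (let epoch = 0; epoch < epochs; epoch++) {
    let mistakes = 0;
    samples.forEach((x, i) => {
      const error = labels[i] - predictPerceptron(model, x); // -1, 0 or +1
      if (error !== 0) {
        // Move the decision boundary so this sample is more likely to be classified correctly.
        x.forEach((value, d) => {
          model.weights[d] += learningRate * error * value;
        });
        model.bias += learningRate * error;
        mistakes++;
      }
    });
    if (mistakes === 0) break; // every training sample is classified correctly
  }
  return model;
}

// Usage: learn logical AND.
const inputs = [[0, 0], [0, 1], [1, 0], [1, 1]];
const andModel = trainPerceptron(inputs, [0, 0, 0, 1]);
console.log(inputs.map((x) => predictPerceptron(andModel, x))); // [ 0, 0, 0, 1 ]
```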

Regression is learning a function which maps a data item to a real-valued prediction variable.
- Simple linear regression and multiple linear regression – Shaman ML library
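
Simple linear regression has a closed-form least-squares solution, so it can be sketched without a library: the slope is cov(x, y) / var(x) and the intercept is mean(y) − slope × mean(x). The helper below is my own hypothetical function, not the Shaman API, and it handles one input variable only; multiple regression needs a matrix solve, which is where a library helps.

```javascript
// Ordinary least squares for one input variable: y ≈ intercept + slope * x.
function fitSimpleLinearRegression(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("Need at least two (x, y) pairs of equal length");
  }
  const n = xs.length;
  const meanX = xs.reduce((sum, x) => sum + x, 0) / n;
  const meanY = ys.reduce((sum, y) => sum + y, 0) / n;
  let covXY = 0; // sum of (x - meanX)(y - meanY)
  let varX = 0; // sum of (x - meanX)^2
  for (let i = 0; i < n; i++) {
    covXY += (xs[i] - meanX) * (ys[i] - meanY);
    varX += (xs[i] - meanX) ** 2;
  }
  if (varX === 0) {
    throw new Error("All x values are identical, so the slope is undefined");
  }
  const slope = covXY / varX;
  const intercept = meanY - slope * meanX;
  return { slope, intercept, predict: (x) => intercept + slope * x };
}

// Usage: made-up points lying exactly on y = 2x + 1.
const fit = fitSimpleLinearRegression([1, 2, 3, 4], [3, 5, 7, 9]);
console.log(fit.slope, fit.intercept, fit.predict(10)); // 2 1 21
```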

Clustering is a common descriptive task where one seeks to identify a finite set of categories or clusters to describe the data.
- K-means – Philippe Modard
- Mean-shift – ml.js
- Density-based spatial clustering (DBSCAN) – Lukasz Krawczyk
- Hierarchical (Agglomerative) – ml.js
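
K-means alternates two steps until the assignments stop changing: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sketch below is a hand-written illustration, not the API of any library listed. As a simplifying assumption it seeds the centroids with the first k points, whereas real implementations typically use random or k-means++ seeding; the sample points are made up.

```javascript
// Squared Euclidean distance between two points of equal dimension.
function squaredDistance(a, b) {
  return a.reduce((sum, value, d) => sum + (value - b[d]) ** 2, 0);
}

// Lloyd's k-means. Assumption: the first k points seed the centroids.
function kMeans(points, k, maxIterations = 100) {
  let centroids = points.slice(0, k).map((p) => [...p]);
  const assignments = new Array(points.length).fill(-1);
  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: attach each point to its nearest centroid.
    let changed = false;
    points.forEach((p, i) => {
      let best = 0;
      centroids.forEach((c, j) => {
        if (squaredDistance(p, c) < squaredDistance(p, centroids[best])) best = j;
      });
      if (assignments[i] !== best) {
        assignments[i] = best;
        changed = true;
      }
    });
    if (!changed) break; // assignments are stable, so the algorithm has converged
    // Update step: move each centroid to the mean of the points assigned to it.
    centroids = centroids.map((c, j) => {
      const members = points.filter((_, i) => assignments[i] === j);
      if (members.length === 0) return c; // leave an empty cluster's centroid in place
      return c.map((_, d) => members.reduce((sum, m) => sum + m[d], 0) / members.length);
    });
  }
  return { centroids, assignments };
}

// Usage: two made-up, well-separated groups of 2-D points.
const points = [[1, 1], [9, 9], [1, 2], [2, 1], [8, 9], [9, 8]];
const result = kMeans(points, 2);
console.log(result.assignments); // [ 0, 1, 0, 0, 1, 1 ]
```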

Summarization involves methods for finding a compact description for a subset of data.
- Text summarizer (node-summarizer) – Swapnik Katkoori, James Brook
- Nlpsum – Julien Loutre
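
One simple approach to text summarization is extractive: score each sentence and keep the highest-scoring ones. The sketch below scores sentences by the average frequency of their non-stop words. It is an assumed, deliberately naive method, not a description of how node-summarizer or Nlpsum work internally, and the sentence splitter and tiny stop-word list are simplifying assumptions.

```javascript
// Naive extractive summarizer: keep the sentences whose words are most frequent overall.
const STOP_WORDS = new Set([
  "a", "an", "and", "are", "for", "in", "is", "it", "of", "on", "or", "the", "to", "with",
]);

function tokenize(text) {
  // Lower-case words made of letters and apostrophes; punctuation and digits are ignored.
  return (text.toLowerCase().match(/[a-z']+/g) || []).filter((w) => !STOP_WORDS.has(w));
}

function summarize(text, sentenceCount = 1) {
  // Assumption: sentences end with ".", "!" or "?".
  const sentences = (text.match(/[^.!?]+[.!?]*/g) || [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  // Count each non-stop word across the whole text.
  const frequency = new Map();
  for (const word of tokenize(text)) {
    frequency.set(word, (frequency.get(word) || 0) + 1);
  }
  // Score each sentence by the average frequency of its words.
  const scored = sentences.map((sentence, index) => {
    const words = tokenize(sentence);
    const total = words.reduce((sum, w) => sum + frequency.get(w), 0);
    return { sentence, index, score: words.length ? total / words.length : 0 };
  });
  // Take the top sentences (ties go to the earlier one), then restore document order.
  return scored
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .slice(0, sentenceCount)
    .sort((a, b) => a.index - b.index)
    .map((s) => s.sentence)
    .join(" ");
}

// Usage: the off-topic sentence is dropped.
const article = "Data mining finds patterns in data. Clustering groups similar data. My cat sleeps all day.";
console.log(summarize(article, 2));
// Data mining finds patterns in data. Clustering groups similar data.
```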

Dependency modeling consists of finding a model which describes significant dependencies between variables.
- Dependency graph – Jim Riecken
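
The sketch below does not use the dependency-graph library. Instead it swaps in a much simpler technique for spotting dependencies between variables in a dataset: pairwise Pearson correlation, flagging pairs whose absolute correlation reaches a threshold. This only captures linear, two-variable relationships; a full dependency model (for example, a Bayesian network) goes further. The helper names, the 0.8 threshold, and the sample columns are hypothetical.

```javascript
// Pearson correlation of two equally long numeric arrays, from -1 to 1.
// Returns NaN when either array is constant (its variance is zero).
function pearson(a, b) {
  const n = a.length;
  const meanA = a.reduce((sum, v) => sum + v, 0) / n;
  const meanB = b.reduce((sum, v) => sum + v, 0) / n;
  let cov = 0;
  let varA = 0;
  let varB = 0;
  for (let i = 0; i < n; i++) {
    const da = a[i] - meanA;
    const db = b[i] - meanB;
    cov += da * db;
    varA += da * da;
    varB += db * db;
  }
  return cov / Math.sqrt(varA * varB);
}

// Report every pair of columns whose absolute correlation reaches the threshold.
// NaN correlations (constant columns) never pass the comparison.
function strongDependencies(columns, threshold = 0.8) {
  const names = Object.keys(columns);
  const pairs = [];
  for (let i = 0; i < names.length; i++) {
    for (let j = i + 1; j < names.length; j++) {
      const r = pearson(columns[names[i]], columns[names[j]]);
      if (Math.abs(r) >= threshold) pairs.push({ a: names[i], b: names[j], r });
    }
  }
  return pairs;
}

// Usage with made-up columns: only income and tax move together (r = 1).
const columns = { income: [30, 40, 50, 60], tax: [3, 4, 5, 6], age: [40, 25, 38, 30] };
console.log(strongDependencies(columns));
```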

Change and deviation detection focuses on discovering the most significant changes in the data from previously measured or normal values. Put simply, it is outlier detection.
- Standard deviation (z-score), DBSCAN, hidden Markov models, k-nearest neighbor, isolation forest
- ELKI (a data mining framework written in Java, not JavaScript)
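
The standard-deviation approach from the list above can be sketched in a few lines: compute each value's z-score (its distance from the mean in standard deviations) and flag values beyond a threshold. `zScoreOutliers` is a hypothetical helper with an assumed default threshold of 2, using the population standard deviation, and the sample data is made up. One design caveat: with the population standard deviation, no z-score in a sample of n values can exceed √(n − 1) (Samuelson's inequality), so with the strict comparison used here a threshold of 3 can never flag anything in a sample of 10 or fewer values.

```javascript
// Flag values whose z-score (distance from the mean, in standard deviations)
// is larger in magnitude than the threshold.
function zScoreOutliers(values, threshold = 2) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  // Population standard deviation (divides by n, not n - 1).
  const std = Math.sqrt(values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n);
  if (!(std > 0)) return []; // empty input or all values identical: nothing deviates
  return values
    .map((value, index) => ({ index, value, z: (value - mean) / std }))
    .filter((point) => Math.abs(point.z) > threshold);
}

// Usage with made-up daily totals: only the 250 stands out (z is about 2.45).
const dailyTotals = [100, 102, 98, 101, 99, 100, 250];
console.log(zScoreOutliers(dailyTotals).map((p) => p.value)); // [ 250 ]
```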

Reference: “Knowledge Discovery and Data Mining: Towards a Unifying Framework”, Usama Fayyad et al., 1996. https://www.aaai.org/Papers/KDD/1996/KDD96-014.pdf