Data Mining: Classification & Clustering
Classification using Decision Tree, KNN, SVM, and Random Forest, with PCA, on the Maternal Health Risk and Parkinson’s Disease datasets.
Clustering using Lloyd’s k-means, fuzzy C-means, and DBSCAN on a dataset of points in a 2D plane. The elbow method is used to find the optimal number of clusters. (GitHub)
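The elbow method plots the within-cluster sum of squares (inertia) against the number of clusters k and picks the k where the curve bends. Below is a minimal NumPy sketch of Lloyd's algorithm (alternating assignment and centroid-update steps) feeding an elbow curve. Several choices here are assumptions, not taken from the project: Euclidean distance, k-means++ seeding, 10 restarts, and the hypothetical helpers `lloyd_kmeans`, `elbow_curve` and `pick_elbow`. The last of these uses the largest second difference of inertia, which is only one simple way to locate the bend.

```python
import numpy as np


def _assign(points, centroids):
    """Return the nearest-centroid label and squared distance for every point."""
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return labels, d2[np.arange(len(points)), labels]


def _kmeans_pp_init(points, k, rng):
    """k-means++ seeding (an assumption): sample new centroids with D^2 weighting."""
    centroids = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        _, d2 = _assign(points, np.array(centroids))
        p = d2 / d2.sum() if d2.sum() > 0 else None  # uniform if all distances are 0
        centroids.append(points[rng.choice(len(points), p=p)])
    return np.array(centroids, dtype=float)


def lloyd_kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with restarts; returns (labels, centroids, inertia) of the best run."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = _kmeans_pp_init(points, k, rng)
        for _ in range(max_iter):
            labels, _ = _assign(points, centroids)  # assignment step
            # Update step: mean of each cluster; an empty cluster keeps its centroid.
            new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            converged = np.allclose(new, centroids)
            centroids = new
            if converged:
                break
        labels, d2 = _assign(points, centroids)
        inertia = float(d2.sum())  # within-cluster sum of squared distances
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best


def elbow_curve(points, k_values, seed=0):
    """Inertia for each candidate k; plotting this curve shows the 'elbow'."""
    return [lloyd_kmeans(points, k, seed=seed)[2] for k in k_values]


def pick_elbow(k_values, inertias):
    """Hypothetical heuristic: the k with the largest second difference of inertia."""
    curvature = np.diff(inertias, n=2)
    return list(k_values)[int(np.argmax(curvature)) + 1]
```

In practice the inertia values are usually plotted and the elbow is read off by eye. The second-difference rule is just a reproducible stand-in for that visual judgment.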
Classification: Maternal Health Risk
Dataset Information
The dataset used in this project contains information related to maternal health risks, including features such as age, blood pressure, blood sugar, body temperature, and heart rate. The target variable is the risk level, which has been encoded numerically.
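The numeric encoding of the risk level could look like the following sketch. The target column name `RiskLevel` and the label strings `"low risk"`, `"mid risk"` and `"high risk"` are assumptions about the CSV, and `encode_risk` is a hypothetical helper. An ordered mapping keeps the natural low < mid < high ordering, and unknown labels raise an error instead of silently becoming NaN.

```python
import pandas as pd

# Assumed label strings and ordering; adjust to match the actual dataset.
RISK_ORDER = {"low risk": 0, "mid risk": 1, "high risk": 2}


def encode_risk(df, target="RiskLevel"):
    """Return (X, y): the feature columns and the risk level mapped to integers."""
    y = df[target].map(RISK_ORDER)
    if y.isna().any():
        unknown = sorted(df.loc[y.isna(), target].unique())
        raise ValueError(f"unmapped risk labels: {unknown}")
    X = df.drop(columns=[target])
    return X, y.astype(int)
```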
Summary of Results
RandomForest: 84%
KNeighbors: 85%
SVM: 68%
DecisionTree: 87%
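A comparison like the one above could be produced with scikit-learn roughly as follows. This sketch assumes the percentages are test-set accuracies from a single stratified 80/20 split. It also scales features for KNN and SVM, which may differ from what the notebook actually did. `compare_classifiers` is a hypothetical helper, and default hyperparameters are used throughout.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, test_size=0.2, seed=42):
    """Fit each model on one stratified train/test split; return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)
    models = {
        "RandomForest": RandomForestClassifier(random_state=seed),
        # KNN and SVM depend on distances/margins, so features are standardized
        # first (an assumption; the original setup is not stated).
        "KNeighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "DecisionTree": DecisionTreeClassifier(random_state=seed),
    }
    # .fit() returns the fitted estimator, and .score() reports mean accuracy.
    return {name: model.fit(X_train, y_train).score(X_test, y_test)
            for name, model in models.items()}
```

Scores from a single split can move by several points with a different `random_state`. Cross-validation gives a steadier comparison.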
Classification: Parkinson’s Disease Detection
Dataset Information
In this project, we use a dataset of voice features to build and evaluate machine learning models for Parkinson’s disease (PD) classification. The dataset contains 754 features.
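With 754 features, PCA can compress the data before classification. One common way to choose its size is the number of components needed to reach a target share of explained variance. The sketch below illustrates that approach. The hypothetical helper `components_for_variance` and the 95% default threshold are assumptions, not the project's settings. Features are standardized first because PCA is sensitive to scale. Ideally it is run on the training split only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def components_for_variance(X, threshold=0.95):
    """Smallest number of principal components whose cumulative explained variance reaches `threshold`."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA().fit(X_scaled)  # keep all components to inspect the spectrum
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    # First index where the cumulative share reaches the threshold, as a count.
    return min(int(np.searchsorted(cumulative, threshold)) + 1, len(cumulative))
```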
Summary of Results
Here are the results for different feature-extraction parameters of the PCA algorithm:
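A sweep over the PCA size could be run as in the sketch below. It assumes the parameter being varied is `n_components`. It also assumes an SVC as a placeholder classifier and 5-fold cross-validated accuracy as the metric; none of these is stated in the original. `pca_sweep` is a hypothetical helper. Putting the scaler and PCA inside the pipeline means they are refit on each training fold, so the held-out fold does not leak into the projection.

```python
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def pca_sweep(X, y, n_components_list, estimator_factory=SVC, cv=5):
    """Mean cross-validated accuracy for each PCA size (placeholder classifier: SVC)."""
    results = {}
    for n in n_components_list:
        # Scaler and PCA are refit inside every CV fold, which avoids leakage.
        pipe = make_pipeline(StandardScaler(), PCA(n_components=n), estimator_factory())
        results[n] = float(cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean())
    return results
```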
Clustering: 2D data points
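Fuzzy C-means, one of the clustering methods listed in the overview, gives each point a degree of membership in every cluster instead of a single hard label. scikit-learn does not include it. The following is a minimal NumPy sketch of the standard alternating updates. The fuzzifier m = 2, the random membership initialization and the stopping tolerance are assumptions, and `fuzzy_c_means` is a hypothetical name.

```python
import numpy as np


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Return (centers, U), where U[i, j] is the membership of point i in cluster j."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(points), c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(max_iter):
        W = U ** m  # fuzzified weights
        # Centers are membership-weighted means of the points.
        centers = (W.T @ points) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Hard labels, if needed, can be taken as `U.argmax(axis=1)`. Larger values of m make the memberships softer.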
Here is the DBSCAN result. See the notebook for more details and the other results.
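DBSCAN groups points that have at least `min_samples` neighbours within radius `eps` and marks isolated points as noise (label -1). Below is a sketch using scikit-learn with a k-distance helper, a common heuristic for choosing `eps`. The default parameter values and the helper names `run_dbscan` and `k_distance` are assumptions for illustration. They are not the values used in the notebook.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors


def run_dbscan(points, eps=0.5, min_samples=5):
    """Cluster 2D points with DBSCAN; returns (labels, n_clusters, n_noise)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    n_clusters = len(set(labels.tolist()) - {-1})  # -1 is the noise label
    n_noise = int(np.sum(labels == -1))
    return labels, n_clusters, n_noise


def k_distance(points, k=5):
    """Sorted distance to each point's k-th nearest neighbour.

    Querying the training points includes each point itself as its first
    neighbour, matching how DBSCAN counts min_samples. A sharp bend in this
    sorted curve suggests a value for eps.
    """
    dist, _ = NearestNeighbors(n_neighbors=k).fit(points).kneighbors(points)
    return np.sort(dist[:, -1])
```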