This assignment focuses on conducting predictive data mining on data sets and using the results to develop a very simple prototype mini classification tool.
On the dataset, execute any predictive data mining technique such as decision trees, Bayesian classifiers, neural networks, k-nearest neighbors, ensemble learning, linear regression etc. You can implement multiple techniques, one followed by another on the same dataset if you prefer to use this for your analysis. If needed, you should convert the format of your dataset as needed by the respective technique(s). If you have already performed conversion in the assignment on data preprocessing, you can use the converted data here. If you would like to use the results of your descriptive data mining techniques and conduct further analysis with predictive data mining, that is fine as well. You should aim to achieve robustness and generalization in the mining, e.g. by altering seeds in the algorithm. Also, you must modify the concerned parameters, e.g. learning rate and error threshold for neural networks, the value of k for k-nearest neighbors etc. to get good results.
Execute at least 3 different combinations of parameters and present the experimental results accordingly for at least one technique. Observe the experimental results and draw useful conclusions from the data.
Based on the hypothesis obtained by the learning in the predictive data mining technique(s), write a simple program in Python, that uses the learned hypothesis to classify new data, and thus serves as a prototype mini classification tool. If you have used multiple techniques, you can select the one that gives greatest accuracy or the one that is most suitable to your data and domain etc.
The program should communicate with the user to give outputs based on new unseen data. For example, consider that the learned hypothesis is based on a dataset that uses weather data to predict if it is okay to play tennis. This can include various attributes such as “temperature”, “chance-of-rain” and so forth to estimate the target “playing tennis” as being “yes”, “no” or “maybe”. It can be used to manually derive rules such as “if temperature = medium and chance-of-rain = low then play-tennis = yes” which can be coded into the program. Thereafter, when a user inputs new values for parameters such as “temperature”, the program can use these rules to output the classification target and thereby suggest to the user “Yes, you can surely play tennis today” or “No, you should not play tennis today” or “Maybe, it seems okay to play tennis today”. This example can be modified based on your dataset and domain. Show at least 3 different runs of such inputs and outputs based on user interaction. The final goal is to have simple communication with the user based on the learning done via predictive data mining for classifying the target attribute. Hence, please work accordingly based on your respective technique(s) and application.
The support vector machine (SVM) is usually a good estimator in classification of chronic diseases
the decision tree.
Artificial neural networks (ANNs) are also effective