Input significance analysis: Feature selection through synaptic weights manipulation for EFuNNs classifier
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | Journal of Fundamental and Applied Sciences, 2017 |
Subjects: | |
Online Access: | http://irep.iium.edu.my/61240/ http://irep.iium.edu.my/61240/1/2947-7229-1-PB.pdf |
Summary: | Today’s digital lifestyles are changing rapidly and are already moving towards the Big Data phenomenon. The data stored or collected from these digital activities can be so large or complex that traditional data processing algorithms and software become inadequate for processing them. For classification specifically, Big Data can cause classifiers to run longer than necessary, and redundant or irrelevant data may mislead the learning algorithms into fitting random error or noise. In addition, an online environment that displays, collects, manipulates and stores the data causes the dataset to grow in step with the online processes. This poses an even greater problem for classifiers, because an already massive dataset keeps growing within a very short time. A way to overcome this problem is therefore needed. This work proposes applying Input Significance Analysis (ISA) to identify redundant and irrelevant data and thereby reduce processing time. The ISA process also resembles variable or feature selection (FS), which identifies irrelevant features whose removal improves measures such as accuracy and consistency. This work continues previous work that focused on a step prior to FS, namely Feature Ranking (FR). FR refers to re-ordering the input data/features according to the importance values obtained from the ISA methods. As in the previous work, this work is particularly interested in ISA methods that manipulate synaptic weights, namely Connection Weights (CW) and Garson’s Algorithm (GA), and the selected classifier is Evolving Fuzzy Neural Networks (EFuNNs).
This work has two goals. The first is to test the FS method on a dataset selected from the UCI Machine Learning Repository and executed in an online environment, record the results, and compare them with the results obtained from the original and ranked data in the previous work. This identifies whether FS contributes to improved results, and which of the ISA methods mentioned above works best with FS. The second is to validate the FS results using a different dataset taken from the same source, in the same environment. Two groups of experiments were conducted to accomplish these goals. The results are promising: when FS is applied, some gains in efficiency and accuracy are noticeable compared to the original and ranked data. |
---|
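
The summary names two weight-based ISA methods, Connection Weights (CW) and Garson's Algorithm (GA), and describes ranking features by importance (FR) before removing the least important ones (FS). The sketch below illustrates that pipeline under several assumptions that are not taken from the article: it uses a plain feedforward network with one hidden layer and one output rather than an EFuNN, it uses one common formulation of each method, and the weights, data row and function names are invented for illustration.

```python
# Illustrative sketch only: a single-hidden-layer network with one output,
# not the EFuNN classifier used in the article. All weights are made up.

def connection_weights(w_ih, w_ho):
    """CW importance: sum over hidden units of (input->hidden) * (hidden->output).

    w_ih[i][j] is the weight from input i to hidden unit j;
    w_ho[j] is the weight from hidden unit j to the single output.
    The sign is kept, so scores can be negative.
    """
    return [sum(w * v for w, v in zip(row, w_ho)) for row in w_ih]


def garson(w_ih, w_ho):
    """Garson-style relative importance using absolute weights.

    Each input's share of a hidden unit is scaled by that unit's absolute
    output weight, then the scores are normalised to sum to 1.
    Assumes no hidden unit has all-zero incoming weights.
    """
    n_hidden = len(w_ho)
    col_sums = [sum(abs(row[j]) for row in w_ih) for j in range(n_hidden)]
    raw = [
        sum(abs(row[j]) / col_sums[j] * abs(w_ho[j]) for j in range(n_hidden))
        for row in w_ih
    ]
    total = sum(raw)
    return [r / total for r in raw]


def rank_features(scores):
    """Feature Ranking: indices ordered from most to least important."""
    return sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)


def select_features(rows, ranking, k):
    """Feature Selection: keep only the k top-ranked columns of each row."""
    keep = sorted(ranking[:k])
    return [[row[i] for i in keep] for row in rows]


# Hypothetical weights: 3 inputs, 2 hidden units, 1 output.
w_ih = [[0.8, -0.6], [0.1, 0.2], [-0.5, 0.9]]
w_ho = [1.0, -2.0]

cw_ranking = rank_features(connection_weights(w_ih, w_ho))
ga_ranking = rank_features(garson(w_ih, w_ho))
reduced = select_features([[1.0, 2.0, 3.0]], cw_ranking, k=2)
```

With these invented weights, both methods rank the second input last, so selecting the top two features drops that column. On real networks the two methods can disagree, because CW keeps the sign of each weight product while the Garson-style score uses absolute values.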