Abstract

Big data currently challenges the extraction of relevant knowledge with classical machine learning techniques, because processing it requires both efficient data reduction and new, fast distributed learning algorithms. The widespread use of the support vector machine (SVM) calls for efficient methods of constructing a classifier that scales to big data while retaining high classification capability. In practice, the effectiveness of an SVM depends on efficiently deriving the optimal feature subset and the algorithmic parameters. Grid search usually reaches the global optimum and achieves higher learning accuracy than particle swarm optimization (PSO) and genetic algorithms (GA), but its heavy computation takes much time. Grid search is nevertheless attractive because the SVMs for different parameter settings do not depend on one another and can therefore be trained simultaneously. This paper proposes a novel parallel implementation of grid optimization using Spark Radoop to reduce the computational load and make the method suitable for big data processing. The major contributions of this study are a significant reduction in computational time compared with the serial version of gridSVM and a higher classification accuracy than the other parallel optimization techniques.
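The independence of the grid points is what makes the search parallelizable. The following is a minimal sketch of that idea only, not the paper's Spark Radoop implementation: it uses a thread pool from Python's standard library as a stand-in for a cluster. The names `parallel_grid_search` and `toy_score`, the parameter ranges, and the score surface are all hypothetical; in a real setting `evaluate` would train an SVM and return its cross-validated accuracy.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
import math


def toy_score(params):
    """Hypothetical stand-in for the cross-validated accuracy of an SVM.

    A real evaluator would train an SVM with the given (C, gamma) pair.
    This made-up surface simply peaks at C=10, gamma=0.1.
    """
    C, gamma = params
    return math.exp(-(math.log10(C) - 1) ** 2 - (math.log10(gamma) + 1) ** 2)


def parallel_grid_search(evaluate, c_values, gamma_values, max_workers=4):
    """Evaluate every (C, gamma) pair concurrently and return the best one."""
    grid = list(product(c_values, gamma_values))
    # Each grid point is scored independently, so the work can be
    # distributed without any coordination between tasks.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(evaluate, grid))
    best_index = max(range(len(grid)), key=scores.__getitem__)
    return grid[best_index], scores[best_index]


best_params, best_score = parallel_grid_search(
    toy_score, c_values=[0.1, 1, 10, 100], gamma_values=[0.01, 0.1, 1]
)
print(best_params, best_score)
```

For CPU-bound SVM training, the thread pool would typically be replaced by worker processes or by tasks on a cluster; the structure of the search stays the same.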

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
