Online ISSN: 2515-8260

D3O: A Framework for Distributed Distance-based Detection of Outliers in Large Data Sets


K. Ashesh (1), Dr. G. AppaRao (2)

Abstract

In a distributed computing environment, data comes from diverse sources. Detecting outliers in such an environment is challenging because it requires a strategy for mining outliers across those sources. Processing the data held at multiple sources in parallel can yield outliers in a short span of time. The speed with which outliers are mined and interpreted to support well-informed decisions is very important in many real-world applications, such as disease outbreak detection in the healthcare domain. Towards this end, we propose a framework known as Distributed Distance-based Detection of Outliers (D3O). The framework guides the process of discovering outliers from large data sets. To achieve this, we propose an algorithm named Distributed Outlier Detection (DOD), which exploits the notion of an outlier detection solving set to detect outliers effectively. Two synthetic datasets, G2d and G3d, and a real dataset from NASA named 2MASS are used to evaluate the proposed algorithm. We built a prototype application as a proof of concept. The empirical results show that the proposed algorithm finds outliers effectively and performs better than an existing state-of-the-art outlier detection algorithm that takes a distributed approach to mining outliers.
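As an illustration of the general idea only, the following minimal sketch scores each point by the sum of its distances to its k nearest neighbors and reports the n highest-scoring points as outliers, with the data split across several partitions that stand in for distributed nodes. It is not the DOD algorithm from the paper: the solving-set pruning is omitted, and the function names (`local_knn_distances`, `top_n_outliers`), the data layout, and the scoring rule are assumptions made for this example.

```python
import heapq
import math


def local_knn_distances(candidates, local_points, k):
    """Work done on one node (hypothetical helper).

    For every candidate (id, vector), return the k smallest distances
    to the points stored on this node, skipping the candidate itself.
    """
    result = {}
    for cid, cvec in candidates:
        dists = [math.dist(cvec, pvec)
                 for pid, pvec in local_points if pid != cid]
        result[cid] = heapq.nsmallest(k, dists)
    return result


def top_n_outliers(partitions, k, n):
    """Work done on the coordinator (hypothetical helper).

    Merges the per-node answers and returns the ids of the n points
    with the largest sum of distances to their k nearest neighbors.
    """
    all_points = [pt for part in partitions for pt in part]
    merged = {pid: [] for pid, _ in all_points}
    for part in partitions:
        # In a real deployment this call would run remotely on each node.
        local = local_knn_distances(all_points, part, k)
        for pid, dists in local.items():
            # The global k nearest distances are the k smallest among
            # the union of each node's local k smallest.
            merged[pid] = heapq.nsmallest(k, merged[pid] + dists)
    weights = {pid: sum(d) for pid, d in merged.items()}
    return sorted(weights, key=weights.get, reverse=True)[:n]


# Two partitions of 2-D points; point 4 lies far from the rest.
partitions = [
    [(0, (0.0, 0.0)), (1, (0.0, 1.0))],
    [(2, (1.0, 0.0)), (3, (1.0, 1.0)), (4, (10.0, 10.0))],
]
print(top_n_outliers(partitions, k=2, n=1))
```

This sketch sends every point to every node, which is exactly the cost that a solving-set approach aims to reduce by exchanging only a small set of candidate points.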
