Data Optimization Techniques using Bloom Filter in Big Data

Ritu Jain; Mukesh Rawat; Swati Jain

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 142 - Number 3

Year of Publication: 2016

DOI: 10.5120/ijca2016909715

Ritu Jain, Mukesh Rawat, Swati Jain. Data Optimization Techniques using Bloom Filter in Big Data. International Journal of Computer Applications. 142, 3 (May 2016), 23-27. DOI=10.5120/ijca2016909715

@article{ 10.5120/ijca2016909715,
author = { Ritu Jain and Mukesh Rawat and Swati Jain },
title = { Data Optimization Techniques using Bloom Filter in Big Data },
journal = { International Journal of Computer Applications },
issue_date = { May 2016 },
volume = { 142 },
number = { 3 },
month = { May },
year = { 2016 },
issn = { 0975-8887 },
pages = { 23-27 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume142/number3/24877-2016909715/ },
doi = { 10.5120/ijca2016909715 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T23:43:58.010954+05:30
%A Ritu Jain
%A Mukesh Rawat
%A Swati Jain
%T Data Optimization Techniques using Bloom Filter in Big Data
%J International Journal of Computer Applications
%@ 0975-8887
%V 142
%N 3
%P 23-27
%D 2016
%I Foundation of Computer Science (FCS), NY, USA

Abstract

With the advent of new technologies, devices, and communication channels such as social networking sites, the amount of data produced by mankind grows rapidly every year. Traditional computing techniques are not sufficient to process such large volumes of data. Hadoop is a collection of technologies with the capacity to store large amounts of data on DataNodes. Hadoop uses the MapReduce algorithm to process and analyze large-scale datasets over large clusters. MapReduce is essential for Big Data processing: it divides a task into small parts, assigns those parts to many computers connected over the network, and collects their results to form the final result dataset. A Bloom filter is a probabilistic data structure used to make data processing more efficient. Applying this filter in the mapper can reduce the amount of data that travels across the network. In this paper we implement a Bloom filter in the Hadoop architecture. This helps reduce network traffic, which saves bandwidth as well as data storage.
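
As a rough illustration of the idea described in the abstract, the following Python sketch shows a small Bloom filter and a mapper-style function that drops records whose keys are definitely not in a set of interest, before they would be sent across the network. It is not the authors' Hadoop implementation: the class `BloomFilter`, the function `filtering_mapper`, the sizing parameters, and the sample records are all hypothetical, and the double-hashing scheme over SHA-256 is an assumption chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array probed by k derived hash positions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: derive k positions by double hashing two 64-bit
        # slices of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def filtering_mapper(records, bloom):
    """Emit only (key, value) pairs whose key may belong to the wanted set."""
    for key, value in records:
        if key in bloom:
            yield key, value


# Hypothetical sample data, for illustration only.
wanted_keys = ["user42", "user7"]
bloom = BloomFilter(num_bits=1024, num_hashes=3)
for k in wanted_keys:
    bloom.add(k)

records = [
    ("user42", "click"),
    ("user1", "view"),
    ("user7", "buy"),
    ("user99", "click"),
]
emitted = list(filtering_mapper(records, bloom))
```

Because a Bloom filter produces no false negatives, every wanted record is still emitted. A few unwanted records may pass as false positives, so a later stage still needs an exact check on the keys.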

Index Terms

Computer Science

Information Sciences

Keywords

Big Data, Hadoop, MapReduce, Bloom filter