Big Data Architectures and Concepts

  • Audrey Tembo Welo, Faculty of Science and Technology, University of Kinshasa, Kinshasa, D.R. Congo
  • Hervé Lubaki Kinzonzi, Inspection, Central Bank of Congo, Kinshasa, D.R. Congo
  • Noel Bila Khonde, Faculty of Science and Technology, University of Kinshasa, Kinshasa, D.R. Congo
  • Eugène Mbuyi Mukendi, Faculty of Science and Technology, University of Kinshasa, Kinshasa, D.R. Congo
Keywords: Big Data, Big Data architecture, Hadoop, Hadoop cluster, distributed architectures

Abstract

The processing of big data has become a major concern for businesses, not only for storage and processing but also for operational requirements such as speed, performance that holds as systems scale, reliability, availability, security, and cost control, all of which ultimately help them maximize profits through the new possibilities offered by Big Data. In this article, we explore the concepts and architectures of Big Data, in particular through the open-source Hadoop framework, and examine how it meets these needs through its cluster structure, its components, and the Lambda and Kappa architectures. We also deploy Hadoop on several nodes in a virtualized Linux environment under Oracle VirtualBox and use the experimental method to compare the processing time of the MapReduce algorithm on two datasets with one, two, three, and four DataNodes in turn, in order to observe the gains in processing time as the number of nodes in the cluster increases.
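To make the MapReduce model mentioned above concrete, the following is a minimal single-process sketch in Python. It is not the authors' code and does not run on Hadoop: it imitates the map, shuffle, and reduce phases on a tiny in-memory word count. The function names, the sample records, and the round-robin split across `num_nodes` partitions are all illustrative assumptions. It measures no processing time; it only shows that the result does not depend on how many partitions share the map work.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as happens between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def split_input(records, num_nodes):
    """Round-robin split of the records across hypothetical worker nodes."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def run_job(records, num_nodes):
    """Run map on each partition, then shuffle and reduce the merged output."""
    partitions = split_input(records, num_nodes)
    mapped = [
        [pair for record in part for pair in map_phase(record)]
        for part in partitions
    ]
    grouped = shuffle(chain.from_iterable(mapped))
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input only; the article's experiments use two real datasets.
records = ["big data hadoop", "hadoop cluster", "big data architecture"]
result = run_job(records, 2)
print(result)
```

In a real cluster each partition would be processed on a separate DataNode in parallel, which is where the reduction in processing time comes from; here the partitions are simply processed one after another.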



Published
2023-12-29