Big Data Architectures and Concepts
Abstract
Nowadays, big data processing has become a major concern for businesses. Beyond storage and processing themselves, they must meet operational requirements such as speed, performance that holds up as systems scale, reliability, availability, security, and cost control, so that they can ultimately increase their profits by using the new possibilities offered by Big Data. In this article, we explore the concepts and architectures of Big Data, in particular through the open-source Hadoop framework, and examine how it meets these needs through its cluster structure, its components, and the Lambda and Kappa architectures. We also deploy Hadoop in a virtualized multi-node Linux environment under Oracle VirtualBox, and use the experimental method to compare the processing time of the MapReduce algorithm on two datasets with one, two, three, and four DataNodes in turn, in order to observe the gains in processing time as the number of nodes in the cluster increases.
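To make the MapReduce experiment mentioned in the abstract more concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce phases in plain Python. It is not Hadoop code and not the article's actual job: the record layout ("userId,movieId,rating"), the sample records, and the function names (map_phase, shuffle, reduce_phase, run_job, speedup) are all assumptions chosen for illustration. The speedup helper only shows how a gain in processing time could be expressed once run times for different cluster sizes have been measured.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical sample records in "userId,movieId,rating" form.
# This layout is an assumption for illustration, not the article's data.
RECORDS = [
    "1,10,4.0",
    "1,20,3.5",
    "2,10,5.0",
    "3,30,2.0",
    "3,10,3.0",
]


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Map: emit one (movieId, 1) pair per rating record."""
    _user, movie, _rating = record.split(",")
    yield movie, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: sum the counts collected for one key."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


def speedup(baseline_seconds: float, seconds: float) -> float:
    """Ratio of a baseline run time (e.g. one DataNode) to another run time."""
    return baseline_seconds / seconds


if __name__ == "__main__":
    # Number of ratings per movieId in the sample records.
    print(run_job(RECORDS))  # {'10': 3, '20': 1, '30': 1}
```

On a real cluster, the map and reduce functions run in parallel on different nodes and the framework performs the shuffle over the network, which is where adding DataNodes can reduce the total processing time.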
Copyright (c) 2023 Audrey Tembo Welo
This work is licensed under a Creative Commons Attribution 4.0 International License.