Architecting Scalable Data Engineering Pipelines for Real-Time Processing in High-Dimensional Data Ecosystems

Authors

  • Anete Fossen, Norway

Keywords

scalable data pipelines, real-time processing, high-dimensional data, data ecosystems, fault tolerance, data engineering

Abstract

The exponential growth of high-dimensional data has necessitated scalable and efficient real-time data engineering pipelines. This paper explores the architecture and design of such pipelines, focusing on scalability, fault tolerance, and real-time processing capabilities. The study synthesizes prior literature, identifies key trends, and presents insights into optimal practices for managing complex data ecosystems.

Published

2024-03-20