A Comparative Study of Scalable Data Warehousing Frameworks for Real-Time Big Data Mining in Cloud-Based Environments
Keywords:
Real-time data mining, scalable data warehouse, cloud computing, big data analytics, Apache Hive, Snowflake, Redshift, BigQuery
Abstract
The exponential growth of big data, particularly in cloud environments, demands scalable, real-time data warehousing frameworks that can efficiently support mining operations. This study compares leading frameworks, including Amazon Redshift, Google BigQuery, Snowflake, and Apache Hive, on performance, scalability, cost, and integration capabilities. Real-time mining efficiency and cloud-native optimization are the focal metrics. The results highlight a performance-cost trade-off between commercial and open-source solutions, and the study proposes future optimizations for hybrid architectures.
License
Copyright (c) 2023 Carlos Jimenez (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.