Scalable Storage and Query Optimization Techniques for Big Data Management in Distributed NoSQL Systems

Authors

  • A.Y. Maigoro, Cloud Data Solutions Architect, Nigeria

Keywords:

Big Data, NoSQL, Query Optimization, Scalable Storage, Distributed Systems, Data Partitioning, Columnar Storage, Adaptive Indexing, Real-time Analytics

Abstract

The explosion of big data has made NoSQL databases indispensable for distributed, scalable, and high-performance data management. This paper investigates modern storage and query optimization strategies in distributed NoSQL systems. It synthesizes existing literature to identify key developments, limitations, and challenges in big data systems. Through a focused analysis of storage efficiency, data distribution, indexing models, and real-time query processing techniques, the study proposes a comprehensive architectural view and evaluates emerging strategies, including columnar formats, vectorized execution, adaptive indexing, and cost-based optimization, in systems such as Cassandra, HBase, and MongoDB. Visual diagrams, performance comparisons, and a structured mind map support the conceptual framework. The paper concludes with future research directions toward autonomous NoSQL systems optimized via machine learning.

References

Chang, Fay, et al. "Bigtable: A Distributed Storage System for Structured Data." OSDI, 2006.

DeCandia, Giuseppe, et al. "Dynamo: Amazon’s Highly Available Key-value Store." SOSP, 2007.

Stonebraker, Michael, et al. "C-Store: A Column-oriented DBMS." VLDB, 2005.

Lakshman, Avinash, and Prashant Malik. "Cassandra: A Decentralized Structured Storage System." ACM SIGOPS, 2010.

Cooper, Brian F., et al. "Benchmarking Cloud Serving Systems with YCSB." SoCC, 2010.

George, Lars. HBase: The Definitive Guide. O’Reilly Media, 2011.

Banker, Kirk. MongoDB in Action. Manning Publications, 2012.

Leavitt, Neal. "Will NoSQL Databases Live Up to Their Promise?" IEEE Computer, 2010.

Zhang, Xiangyao, et al. "Redesigning Memory-centric Storage for Fast Analytics." CIDR, 2017.

Pavlo, Andrew, and Matthew Aslett. "What's Really New with NewSQL?" ACM SIGMOD Record, 2016.

Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: Simplified Data Processing on Large Clusters." OSDI, 2004.

Grolinger, Katarina, et al. "Data Management in Cloud Environments: NoSQL and NewSQL Data Stores." Journal of Cloud Computing, 2013.

Yu, Cong, et al. "Observing Query Execution in NoSQL Systems." PVLDB, 2018.

Trautner, Thomas, and Alexander Zeier. "In-memory Data Management for NoSQL Systems." Information Systems, 2016.

Abadi, Daniel J. "Query Execution in Column-Oriented Database Systems." MIT PhD Dissertation, 2008.

Published

2024-01-13