Distributed data warehouses - An alternative approach to highly performant data warehouses
Keywords:
Distributed data warehouse, high performance
Abstract
As organizations increasingly rely on data-driven decision-making, the limitations of traditional data warehouses have become apparent. Distributed data warehouses are a compelling alternative that addresses the challenges of scalability, performance, and flexibility. Unlike conventional systems, which often struggle with large data volumes and complex queries, distributed data warehouses use a decentralized architecture that spreads data processing across multiple nodes. This approach improves performance by parallelizing query execution and allows capacity to scale as data needs grow. Distributed data warehouses can also handle diverse data types and sources efficiently, which makes them well suited to organizations working with varied datasets in real time. This flexibility supports advanced analytics and real-time reporting, so businesses can respond quickly to market changes. Beyond performance gains, distributed data warehouses improve resilience by removing single points of failure, which helps keep data available even during system outages. This robustness is crucial for maintaining business continuity. The transition to a distributed model also fosters innovation, because organizations can experiment with new technologies and methodologies without overhauling their entire infrastructure. By adopting distributed data warehouses, companies can strengthen their analytical capabilities and position themselves for future growth. This paper explores the architecture, advantages, and practical implications of adopting distributed data warehouses, and offers guidance for organizations looking to optimize their data management strategies.
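The idea of spreading query processing across nodes can be illustrated with a small scatter-gather sketch. It is not taken from the paper: the function names, the sample rows, and the choice of hash partitioning by region are all assumptions made for illustration, and worker threads merely stand in for separate warehouse nodes. Each "node" computes a partial sum and count over its own shard, and a coordinator merges the partial results into final averages.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition(rows, num_nodes):
    """Hash-partition (region, amount) rows so each node owns one shard."""
    shards = [[] for _ in range(num_nodes)]
    for region, amount in rows:
        # crc32 gives a stable hash, so a region always maps to the same node.
        shards[crc32(region.encode()) % num_nodes].append((region, amount))
    return shards


def local_aggregate(shard):
    """Work done on a single node: partial [sum, count] per region."""
    partial = defaultdict(lambda: [0.0, 0])
    for region, amount in shard:
        partial[region][0] += amount
        partial[region][1] += 1
    return dict(partial)


def merge(partials):
    """Coordinator step: combine partial results into final averages."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for region, (subtotal, count) in partial.items():
            totals[region][0] += subtotal
            totals[region][1] += count
    return {region: s / c for region, (s, c) in totals.items()}


def distributed_average(rows, num_nodes=3):
    """Scatter rows to nodes, aggregate each shard concurrently, then gather."""
    shards = partition(rows, num_nodes)
    # Threads are a stand-in for nodes; a real system would ship the
    # aggregation to separate machines.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, shards))
    return merge(partials)


# Hypothetical sample data: (region, sale amount).
rows = [("north", 10.0), ("south", 20.0), ("north", 30.0), ("east", 5.0)]
result = distributed_average(rows)
```

Because only the small per-region partials travel to the coordinator, adding nodes shrinks the work per node without changing the final answer; the same call with `num_nodes=1` returns the same averages.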
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
License Terms
Ownership and Licensing:
Authors of research papers submitted to Distributed Learning and Broad Applications in Scientific Research retain the copyright of their work while granting the journal certain rights. Authors maintain ownership of the copyright and have granted the journal a right of first publication. Simultaneously, authors agree to license their research papers under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
License Permissions:
Under the CC BY-NC-SA 4.0 License, others are permitted to share and adapt the work, as long as proper attribution is given to the authors and acknowledgement is made of the initial publication in the journal. This license allows for the broad dissemination and utilization of research papers.
Additional Distribution Arrangements:
Authors are free to enter into separate contractual arrangements for the non-exclusive distribution of the journal's published version of the work. This may include posting the work to institutional repositories, publishing it in journals or books, or other forms of dissemination. In such cases, authors are requested to acknowledge the initial publication of the work in this journal.
Online Posting:
Authors are encouraged to share their work online, including in institutional repositories, disciplinary repositories, or on their personal websites. This permission applies both prior to and during the submission process to the journal. Online sharing enhances the visibility and accessibility of the research papers.
Responsibility and Liability:
Authors are responsible for ensuring that their research papers do not infringe upon the copyright, privacy, or other rights of any third party. Scientific Research Canada disclaims any liability or responsibility for any copyright infringement or violation of third-party rights in the research papers.
If you have any questions or concerns regarding these license terms, please contact us at editor@dlabi.org.