Data Lakes vs Data Warehouses: What's Right for Your Business?

Authors

  • Naresh Dulam, Vice President Sr Lead Software Engineer, JP Morgan Chase, USA

Keywords:

Data integration, data architecture, big data processing

Abstract

As businesses face the growing challenge of managing vast amounts of data, efficient storage and analysis systems have become increasingly critical. Two of the most prominent solutions in this space are data lakes and data warehouses, each offering distinct features that cater to different business needs. Data lakes are designed to store raw, unstructured, and semi-structured data, making them well suited to businesses with large volumes of diverse data types such as logs, social media feeds, and sensor data. They offer scalability and flexibility, allowing organizations to store data as it arrives without first conforming it to a rigid structure. Data warehouses, on the other hand, are optimized for structured data and are typically used for business intelligence and reporting, where data consistency and query speed are paramount. These systems require a more rigid schema, ensuring that data is cleaned, organized, and ready for analytical processing. While data lakes provide greater flexibility and lower upfront costs, they can also present challenges in data quality and accessibility because much of the stored data is unstructured. In contrast, data warehouses offer high performance for complex queries over structured data but may struggle to scale when dealing with massive amounts of unstructured data. Choosing between a data lake and a data warehouse depends on a company's specific needs, such as the volume, variety, and velocity of the data it works with, as well as its analytical goals. This article explores the key differences, benefits, and drawbacks of both systems, providing insights to help businesses decide which data storage solution best aligns with their operational needs and long-term objectives.
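The contrast drawn above, between storing data without conforming it to a rigid structure and requiring a rigid schema up front, is commonly described as schema-on-read versus schema-on-write. The following is a minimal, illustrative Python sketch of that distinction under stated assumptions, not the author's method: the JSON record formats, the `OrderRow` table, and the helpers `lake_write`, `lake_read_orders`, and `warehouse_load` are all hypothetical, and in-memory lists stand in for the object storage and SQL engines a real deployment would use.

```python
import json
from dataclasses import dataclass

# Hypothetical raw inputs: two order events (one malformed) and one sensor reading.
RAW_RECORDS = [
    '{"type": "order", "order_id": 1, "amount": 25.0}',
    '{"type": "sensor", "device": "t-17", "temp_c": 21.4}',
    '{"type": "order", "order_id": 2, "amount": "not-a-number"}',
]

# Data lake style (schema-on-read): raw records are kept exactly as received.
lake: list[str] = []


def lake_write(raw: str) -> None:
    """Store the record without any validation or transformation."""
    lake.append(raw)


def lake_read_orders() -> list[dict]:
    """Impose an 'orders' structure only at query time; malformed rows surface here."""
    orders = []
    for raw in lake:
        rec = json.loads(raw)
        if rec.get("type") != "order":
            continue  # other data types stay in the lake for other readers
        try:
            orders.append({"order_id": int(rec["order_id"]),
                           "amount": float(rec["amount"])})
        except (KeyError, ValueError):
            continue  # each reader decides how to treat bad data
    return orders


# Data warehouse style (schema-on-write): a fixed, hypothetical table schema.
@dataclass(frozen=True)
class OrderRow:
    order_id: int
    amount: float


warehouse: list[OrderRow] = []
rejected: list[str] = []  # records that failed validation at load time


def warehouse_load(raw: str) -> None:
    """Validate and convert the record before it is stored."""
    rec = json.loads(raw)
    if rec.get("type") != "order":
        rejected.append(raw)  # does not fit this table's schema
        return
    try:
        warehouse.append(OrderRow(int(rec["order_id"]), float(rec["amount"])))
    except (KeyError, ValueError):
        rejected.append(raw)  # malformed values are refused at load time


# Ingest the same records into both stores.
for raw in RAW_RECORDS:
    lake_write(raw)
    warehouse_load(raw)

print(len(lake))           # → 3 (everything is retained, including the sensor reading)
print(lake_read_orders())  # → [{'order_id': 1, 'amount': 25.0}]
print(warehouse)           # → [OrderRow(order_id=1, amount=25.0)]
print(len(rejected))       # → 2
```

The trade-off the abstract describes shows up even at this scale: the lake retains every record, including data no current table models, but pushes validation onto each reader, whereas the warehouse refuses nonconforming data at load time so that every query sees clean, typed rows.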

Published

10-11-2016

How to Cite

[1]
Naresh Dulam, “Data Lakes vs Data Warehouses: What’s Right for Your Business?”, Distrib Learn Broad Appl Sci Res, vol. 2, pp. 71–94, Nov. 2016, Accessed: Dec. 23, 2024. [Online]. Available: https://dlabi.org/index.php/journal/article/view/220
