Data Lakes: Building Flexible Architectures for Big Data Storage

Authors

  • Naresh Dulam, Vice President, Sr. Lead Software Engineer, JP Morgan Chase, USA

Keywords:

Data Lake, Unstructured Data, Scalability

Abstract

Data lakes are emerging as a powerful solution for managing the growing volume, variety, and velocity of big data. Unlike traditional data storage systems, which typically require data to fit a predefined schema before it is loaded, data lakes provide a flexible and scalable architecture capable of storing vast amounts of structured, semi-structured, and unstructured data. Because data can be kept in its raw form and structured only when it is read, organizations gain a more agile environment for data exploration, analytics, and machine learning. Data lakes support modern big data technologies, enabling organizations to process data in real time and gain deeper insights from diverse data sources. The architecture of a data lake is designed to accommodate the complexity of big data workloads and to integrate with a range of data management tools, analytics platforms, and cloud-based services. These benefits come with challenges, however, particularly in data governance, security, and data quality. Effective data management practices are therefore essential to avoid data silos and to ensure that data lakes deliver on their promise of transforming business intelligence. This paper explores the fundamental principles and best practices for building data lakes, highlighting how they can be optimized for big data storage and how organizations can navigate the challenges of implementing them. By providing an efficient framework for data management and analysis, data lakes help organizations unlock the full potential of their big data, enabling better-informed decision-making and fostering innovation across industries.
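The abstract's central architectural idea, storing data in its raw form and deferring structure until it is read (often called schema-on-read), together with the governance and data-quality concerns it raises, can be illustrated with a small sketch. This is a toy under stated assumptions, not the paper's method: `RawZone`, `CatalogEntry`, and `read_with_schema` are hypothetical names, a local temporary directory stands in for distributed or cloud object storage, and the "catalog" is an in-memory list rather than a real metadata service.

```python
import csv
import io
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class CatalogEntry:
    """Metadata kept for each raw object (hypothetical minimal catalog record)."""
    path: Path
    fmt: str     # declared format: "csv" or "json" here; anything else is stored but not readable
    source: str  # originating system, kept for lineage and governance
    tags: dict = field(default_factory=dict)


class RawZone:
    """Toy raw zone of a data lake: objects are stored exactly as received."""

    def __init__(self, root: Path):
        self.root = root
        self.catalog: list[CatalogEntry] = []

    def ingest(self, name: str, payload: bytes, fmt: str, source: str, **tags) -> CatalogEntry:
        # No schema is checked on write; the bytes land unchanged.
        path = self.root / name
        path.write_bytes(payload)
        entry = CatalogEntry(path, fmt, source, dict(tags))
        self.catalog.append(entry)
        return entry

    def find(self, **tags) -> list[CatalogEntry]:
        # Discovery goes through catalog metadata, not by scanning storage.
        return [e for e in self.catalog
                if all(e.tags.get(k) == v for k, v in tags.items())]


def read_with_schema(entry: CatalogEntry, schema: dict) -> tuple[list[dict], list[dict]]:
    """Schema-on-read: parse a raw object and cast the requested fields.

    Returns (valid_rows, rejected_rows) so bad records are quarantined, not silently lost.
    """
    text = entry.path.read_text(encoding="utf-8")
    if entry.fmt == "csv":
        records = list(csv.DictReader(io.StringIO(text)))
    elif entry.fmt == "json":  # JSON Lines: one object per line
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    else:
        raise ValueError(f"no reader registered for format {entry.fmt!r}")

    valid, rejected = [], []
    for rec in records:
        try:
            # Project onto the consumer's schema; extra raw fields are ignored.
            valid.append({col: cast(rec[col]) for col, cast in schema.items()})
        except (KeyError, ValueError, TypeError):
            rejected.append(rec)  # missing field or uncastable value
    return valid, rejected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake = RawZone(Path(tmp))
        lake.ingest("orders.csv", b"order_id,amount\n1,19.99\n2,oops\n",
                    "csv", source="pos", domain="sales")
        lake.ingest("clicks.json", b'{"order_id": 3, "amount": 5, "page": "/cart"}\n',
                    "json", source="web", domain="sales")
        schema = {"order_id": int, "amount": float}
        for entry in lake.find(domain="sales"):
            rows, bad = read_with_schema(entry, schema)
            print(entry.path.name, "valid:", rows, "rejected:", bad)
```

Both files are ingested unchanged, and structure is imposed only when a consumer reads them through its own two-column schema: the CSV row whose `amount` is `oops` is set aside in the rejected list instead of failing the whole read, and the JSON record's extra `page` field is simply ignored. In a production lake these responsibilities would usually be split across object storage, a metadata catalog, and a query engine; the sketch only shows how the roles divide.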

Published

02-10-2015

How to Cite

[1] Naresh Dulam, “Data Lakes: Building Flexible Architectures for Big Data Storage”, Distrib Learn Broad Appl Sci Res, vol. 1, pp. 95–114, Oct. 2015, Accessed: Jan. 22, 2025. [Online]. Available: https://dlabi.org/index.php/journal/article/view/213
