Data Integration Techniques: Exploring tools and methodologies for harmonizing data across diverse systems and sources

Authors

  • Muneer Ahmed Salamkar, Senior Associate, JP Morgan Chase, USA
  • Karthik Allam, Big Data Infrastructure Engineer, JP Morgan Chase, USA

Keywords:

Data integration, data harmonization, ETL

Abstract

Data integration is central to modern data management. It enables organizations to harmonize data from disparate sources and systems so that the data can support comprehensive analysis and informed decision-making. As businesses rely more heavily on data-driven insights, integrating data from different platforms, databases, and applications becomes essential. This process relies on a range of tools and methodologies designed to address challenges such as data silos, inconsistent values, and heterogeneous formats. Traditional extract, transform, load (ETL) techniques have evolved, and newer approaches such as extract, load, transform (ELT) have gained popularity because loading raw data first and transforming it inside the target system lets them handle larger data volumes more efficiently. In addition, data lakes, data warehouses, and cloud-based integration platforms are reshaping how organizations store, process, and access integrated data. Within this landscape, data virtualization, API-based integration, and event-driven architectures play a pivotal role in providing real-time data access and keeping data consistent across systems.
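To make the ETL/ELT distinction concrete, here is a minimal sketch, not the authors' method, that harmonizes two hypothetical sources (a CRM export with DD/MM/YYYY dates and a billing feed with ISO 8601 dates) into one `customers` table using Python's built-in `sqlite3` module. All table names, field names, and sample rows are illustrative assumptions. In the ETL path the renaming and date conversion happen in Python before loading; in the ELT path the raw rows are loaded into staging tables first, and SQL running inside the database performs the same transformation.

```python
import sqlite3
from datetime import datetime

# Hypothetical source data: two systems describe customers with different
# field names and date formats.
CRM_ROWS = [
    ("C001", "Acme Ltd", "10/12/2019"),  # cust_id, full_name, signup (DD/MM/YYYY)
    ("C002", "Globex Corp", "23/06/2020"),
]
BILLING_ROWS = [
    {"customerId": "C003", "name": "Initech", "created": "2020-01-09"},  # ISO 8601 dates
]

# Hypothetical common target schema that both sources are harmonized into.
TARGET_DDL = "CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT, signup_date TEXT)"


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code, then load only harmonized rows."""
    conn.execute(TARGET_DDL)
    rows = []
    for cust_id, full_name, signup in CRM_ROWS:
        # Convert DD/MM/YYYY to YYYY-MM-DD before the data reaches the target.
        iso = datetime.strptime(signup, "%d/%m/%Y").date().isoformat()
        rows.append((cust_id, full_name, iso))
    for rec in BILLING_ROWS:
        # Only field names differ; dates are already ISO 8601.
        rows.append((rec["customerId"], rec["name"], rec["created"]))
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load raw rows unchanged into staging tables, then transform with SQL in the target."""
    conn.execute("CREATE TABLE raw_crm (cust_id TEXT, full_name TEXT, signup TEXT)")
    conn.execute("CREATE TABLE raw_billing (customerId TEXT, name TEXT, created TEXT)")
    conn.executemany("INSERT INTO raw_crm VALUES (?, ?, ?)", CRM_ROWS)
    conn.executemany("INSERT INTO raw_billing VALUES (:customerId, :name, :created)", BILLING_ROWS)
    conn.execute(TARGET_DDL)
    # The transformation runs inside the database: map columns onto the target
    # schema and reorder DD/MM/YYYY into YYYY-MM-DD with SQLite's substr().
    conn.execute(
        """
        INSERT INTO customers (customer_id, name, signup_date)
        SELECT cust_id, full_name,
               substr(signup, 7, 4) || '-' || substr(signup, 4, 2) || '-' || substr(signup, 1, 2)
        FROM raw_crm
        UNION ALL
        SELECT customerId, name, created FROM raw_billing
        """
    )


def harmonized(load) -> list:
    """Run one pipeline against a fresh in-memory database and return the target rows."""
    conn = sqlite3.connect(":memory:")
    try:
        load(conn)
        return conn.execute("SELECT * FROM customers ORDER BY customer_id").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(harmonized(etl))
    print(harmonized(elt))
```

Both pipelines produce the same three harmonized rows; what differs is where the transformation runs. ELT moves that work onto the target engine, which is the property usually cited when it is preferred for large data volumes.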

Moreover, automation and machine learning are increasingly used to streamline integration processes, improve data quality, and reduce manual intervention. This paper examines a range of data integration strategies, focusing on how effectively each overcomes technical challenges while offering flexibility, scalability, and cost efficiency. By reviewing current tools and methodologies, we show how organizations can choose the right integration approach for their specific data needs and business objectives, ultimately improving decision-making and operational performance.
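As a small illustration of automating data-quality work, the sketch below applies rule-based checks (required fields, ISO 8601 dates, duplicate keys) to already-harmonized records and quarantines the failures instead of routing every record to manual review. It uses plain rules rather than machine learning, and the field names and rules are hypothetical assumptions, not anything specified in the paper.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Set, Tuple

# Hypothetical rule set; a real pipeline would typically load rules from configuration.
REQUIRED = ("customer_id", "name", "signup_date")


@dataclass
class QualityReport:
    passed: List[dict] = field(default_factory=list)
    quarantined: List[Tuple[dict, List[str]]] = field(default_factory=list)


def check_record(rec: dict, accepted_ids: Set[str]) -> List[str]:
    """Return the rule violations for one record; an empty list means it passes."""
    problems = [f"missing {f}" for f in REQUIRED if not rec.get(f)]
    signup = rec.get("signup_date")
    if signup:
        try:
            date.fromisoformat(signup)
        except ValueError:
            problems.append("signup_date is not ISO 8601")
    if rec.get("customer_id") in accepted_ids:
        problems.append("duplicate customer_id")  # key already used by an accepted record
    return problems


def quality_gate(records: List[dict]) -> QualityReport:
    """Split records into accepted rows and quarantined rows with their violations."""
    report = QualityReport()
    accepted_ids: Set[str] = set()
    for rec in records:
        problems = check_record(rec, accepted_ids)
        if problems:
            report.quarantined.append((rec, problems))
        else:
            report.passed.append(rec)
            accepted_ids.add(rec["customer_id"])
    return report


if __name__ == "__main__":
    rows = [
        {"customer_id": "C001", "name": "Acme Ltd", "signup_date": "2019-12-10"},
        {"customer_id": "C001", "name": "Acme Limited", "signup_date": "2019-12-10"},
        {"customer_id": "C002", "name": "", "signup_date": "23/06/2020"},
    ]
    report = quality_gate(rows)
    print(len(report.passed), "passed")
    for rec, problems in report.quarantined:
        print(rec["customer_id"], problems)
```

Only quarantined records need human attention, and each carries the list of rules it broke, which is one simple way automation can reduce manual intervention in an integration pipeline.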




Published

24-06-2020

How to Cite

Muneer Ahmed Salamkar and Karthik Allam, “Data Integration Techniques: Exploring tools and methodologies for harmonizing data across diverse systems and sources”, Distrib Learn Broad Appl Sci Res, vol. 6, Jun. 2020, Accessed: Dec. 23, 2024. [Online]. Available: https://dlabi.org/index.php/journal/article/view/246
