A Distributed Training Approach to Scale Deep Learning to Massive Datasets

Authors

  • Sarbaree Mishra, Program Manager, Molina Healthcare Inc., USA

Keywords:

Distributed Training, Deep Learning

Abstract

As deep learning transforms a growing range of fields, efficiently training models on massive datasets has become an increasingly prominent challenge. Traditional single-machine training methods often struggle with the computational and memory demands of processing such large volumes of data. This project explores a distributed training approach that leverages multiple computing resources to improve scalability and efficiency. By partitioning datasets and parallelizing model training across a network of machines, we can significantly reduce training times while maintaining or improving model performance. We examine the core techniques of data parallelism and model parallelism, their respective advantages, and the scenarios in which each excels. We also address the challenges of synchronization, communication overhead, and fault tolerance, and provide strategies to mitigate them. Our findings demonstrate that distributed training not only accelerates learning but also makes it feasible to train on datasets that a single machine could not handle. These insights contribute to the development of more sophisticated deep learning models capable of tackling complex problems across diverse domains. Ultimately, this project aims to help practitioners harness the full potential of their data, driving innovation in areas such as natural language processing, computer vision, and beyond.
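To make the data-parallel setup described above concrete, the following is a minimal sketch of partitioning a dataset across workers and synchronizing gradients after each backward pass. It assumes PyTorch's DistributedDataParallel as one common realization of data parallelism (not necessarily the framework used in this work), and the model, dataset, and hyperparameters are illustrative placeholders.

```python
# Minimal data-parallel training sketch using PyTorch DistributedDataParallel.
# Assumes the script is launched with `torchrun --nproc_per_node=<N> train.py`,
# which starts one process per GPU and sets RANK, LOCAL_RANK, and WORLD_SIZE.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder dataset; DistributedSampler gives each rank a disjoint shard.
    dataset = TensorDataset(torch.randn(10_000, 128),
                            torch.randint(0, 10, (10_000,)))
    sampler = DistributedSampler(dataset, shuffle=True)
    loader = DataLoader(dataset, batch_size=256, sampler=sampler)

    # Placeholder model; DDP averages gradients across ranks during backward().
    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(),
                          nn.Linear(256, 10)).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(5):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()   # gradients are all-reduced across workers here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, for example, `torchrun --nproc_per_node=4 train.py`, each process trains on its own partition of the data, and gradients are averaged across processes after every backward pass; this all-reduce step is the synchronization whose communication overhead the approach must manage.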

References

Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., ... & Ng, A. (2012). Large scale distributed deep networks. Advances in Neural Information Processing Systems, 25.

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., ... & Zheng, X. (2016). Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.

Xing, E. P., Ho, Q., Dai, W., Kim, J. K., Wei, J., Lee, S., ... & Yu, Y. (2015, August). Petuum: A new platform for distributed machine learning on big data. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1335-1344).

Chilimbi, T., Suzue, Y., Apacible, J., & Kalyanaraman, K. (2014). Project adam: Building an efficient and scalable deep learning training system. In 11th USENIX symposium on operating systems design and implementation (OSDI 14) (pp. 571-582).

Tsang, I. W., Kwok, J. T., Cheung, P. M., & Cristianini, N. (2005). Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6(4).

Al-Jarrah, O. Y., Yoo, P. D., Muhaidat, S., Karagiannidis, G. K., & Taha, K. (2015). Efficient machine learning for big data: A review. Big Data Research, 2(3), 87-93.

Klein, A., Falkner, S., Bartels, S., Hennig, P., & Hutter, F. (2017, April). Fast Bayesian optimization of machine learning hyperparameters on large datasets. In Artificial Intelligence and Statistics (pp. 528-536). PMLR.

Najafabadi, M. M., Villanustre, F., Khoshgoftaar, T. M., Seliya, N., Wald, R., & Muharemagic, E. (2015). Deep learning applications and challenges in big data analytics. Journal of big data, 2, 1-21.

Le, Q. V. (2013, May). Building high-level features using large scale unsupervised learning. In 2013 IEEE international conference on acoustics, speech and signal processing (pp. 8595-8598). IEEE.

Hestness, J., Narang, S., Ardalani, N., Diamos, G., Jun, H., Kianinejad, H., ... & Zhou, Y. (2017). Deep learning scaling is predictable, empirically. arXiv preprint arXiv:1712.00409.

Teerapittayanon, S., McDanel, B., & Kung, H. T. (2017, June). Distributed deep neural networks over the cloud, the edge and end devices. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS) (pp. 328-339). IEEE.

Chen, X. W., & Lin, X. (2014). Big data deep learning: challenges and perspectives. IEEE Access, 2, 514-525.

Glorot, X., Bordes, A., & Bengio, Y. (2011). Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the 28th international conference on machine learning (ICML-11) (pp. 513-520).

Mnih, A., & Gregor, K. (2014, June). Neural variational inference and learning in belief networks. In International Conference on Machine Learning (pp. 1791-1799). PMLR.

Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning, 3(1), 1-122.

Gade, K. R. (2018). Real-Time Analytics: Challenges and Opportunities. Innovative Computer Sciences Journal, 4(1).

Gade, K. R. (2017). Integrations: ETL/ELT, Data Integration Challenges, Integration Patterns. Innovative Computer Sciences Journal, 3(1).

Komandla, V. Transforming Financial Interactions: Best Practices for Mobile Banking App Design and Functionality to Boost User Engagement and Satisfaction.

Gade, K. R. (2017). Migrations: Challenges and Best Practices for Migrating Legacy Systems to Cloud-Based Platforms. Innovative Computer Sciences Journal.

Published

03-01-2019

How to Cite

[1]
Sarbaree Mishra, “A Distributed Training Approach to Scale Deep Learning to Massive Datasets”, Distrib Learn Broad Appl Sci Res, vol. 5, Jan. 2019, Accessed: Dec. 24, 2024. [Online]. Available: https://dlabi.org/index.php/journal/article/view/239
