A Distributed Training Approach to Scale Deep Learning to Massive Datasets

Authors

  • Sarbaree Mishra, Program Manager at Molina Healthcare Inc., USA

Keywords:

Distributed Training, Deep Learning

Abstract

As deep learning reshapes a growing range of fields, efficiently training models on massive datasets has become a central challenge. Traditional training methods often struggle with the computational and memory demands of processing such large volumes of data. This project explores a distributed training approach that uses multiple computing resources to improve scalability and efficiency. By partitioning datasets and parallelizing model training across a network of machines, we can significantly reduce training times while maintaining or improving model performance. We examine two core techniques, data parallelism and model parallelism, along with their respective advantages and the scenarios in which each excels. We also address the challenges of synchronization, communication overhead, and fault tolerance, and provide strategies to mitigate them. Our findings demonstrate that distributed training not only accelerates learning but also makes it possible to handle datasets that were previously infeasible for single-machine training. These insights support the development of more sophisticated models that can tackle complex problems across diverse domains. Ultimately, this project aims to help practitioners harness the full potential of their data, driving advances in areas such as natural language processing, computer vision, and beyond.
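To make the idea of partitioning a dataset and parallelizing training concrete, the following is a minimal sketch of synchronous data parallelism, simulated in a single process. It is not the implementation used in this work: the toy linear model, the helper names (`shard`, `local_gradient`, `data_parallel_step`, `train`), and the hyperparameters are all illustrative assumptions. In a real cluster, each shard would live on a separate machine and the gradient sum would be performed by a collective operation such as all-reduce.

```python
# Minimal sketch (assumption, not the paper's code): synchronous data
# parallelism for a toy linear model y ~ w*x + b, simulated in one process.

def shard(data, num_workers):
    """Split the dataset into num_workers contiguous, near-equal shards."""
    size, extra = divmod(len(data), num_workers)
    shards, start = [], 0
    for worker in range(num_workers):
        end = start + size + (1 if worker < extra else 0)
        shards.append(data[start:end])
        start = end
    return shards

def local_gradient(w, b, samples):
    """Sum of squared-error gradients over one worker's shard."""
    gw = gb = 0.0
    for x, y in samples:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb

def data_parallel_step(w, b, shards, lr):
    """One synchronous step: each worker computes a gradient on its shard,
    the gradients are summed (an all-reduce on a real cluster), and the
    identical update is applied to every model replica."""
    grads = [local_gradient(w, b, s) for s in shards]  # parallel in practice
    n = sum(len(s) for s in shards)
    gw = sum(g[0] for g in grads) / n
    gb = sum(g[1] for g in grads) / n
    return w - lr * gw, b - lr * gb

def train(data, num_workers, steps=2000, lr=0.05):
    """Run gradient descent with the data split across num_workers shards."""
    shards = shard(data, num_workers)
    w, b = 0.0, 0.0
    for _ in range(steps):
        w, b = data_parallel_step(w, b, shards, lr)
    return w, b

# Hypothetical toy dataset generated from y = 3x + 1.
data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(20)]
w, b = train(data, num_workers=4)
```

Because the per-shard gradients are summed and divided by the total sample count, each step matches the full-batch gradient step a single machine would take, up to floating-point rounding. The potential speedup comes from computing the shard gradients concurrently, and the cost is the communication needed to combine them at every step.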




Published

03-01-2019

How to Cite

[1]
Sarbaree Mishra, “A Distributed Training Approach to Scale Deep Learning to Massive Datasets”, Distrib Learn Broad Appl Sci Res, vol. 5, Jan. 2019, Accessed: Jan. 13, 2025. [Online]. Available: https://dlabi.org/index.php/journal/article/view/239
