Leveraging AI for Proactive Fault Detection in Amazon EKS Clusters
Keywords:
Amazon EKS, KubernetesAbstract
In cloud-native environments like Amazon EKS, ensuring high availability and minimizing downtime are critical to maintaining application performance and user satisfaction. This paper proposes a machine learning-based approach to proactively detect and prevent faults within Amazon Elastic Kubernetes Service (EKS) clusters. The model aims to identify early signs of issues that could lead to service disruption by monitoring key metrics such as pod performance, node health, and network conditions. The system leverages historical performance data to train predictive models, which can anticipate faults before they escalate into critical problems. The model provides real-time alerts and automated remediation strategies by analyzing patterns in resource utilization, system errors, and network latency. This proactive fault detection approach enhances the reliability and stability of EKS clusters and helps reduce operational overhead by allowing teams to address issues before they affect end-users. Through this research, the goal is to demonstrate the potential of integrating AI and machine learning into the operational workflows of Kubernetes-based environments, thus improving both performance and resilience.
Downloads
References
Ambati, P., & Irwin, D. (2019). Optimizing the cost of executing mixed interactive and batch workloads on transient vms. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3(2), 1-24.
Chelliah, P. R., Naithani, S., & Singh, S. (2018). Practical Site Reliability Engineering: Automate the process of designing, developing, and delivering highly reliable apps and services with SRE. Packt Publishing Ltd.
Mena, J. (1999). Data mining your website. Digital Press.
Jugovac, M. (2019). Designing and evaluating recommender systems with the user in the loop.
Lerche, L. (2016). Using implicit feedback for recommender systems: characteristics, applications, and challenges.
Erdilek, M. (2002). A Research On Electronic Business: Comparison of Electronic Business Models (Master's thesis, Marmara Universitesi (Turkey)).
Kietzmann, J., Paschen, J., & Treen, E. (2018). Artificial intelligence in advertising: How marketers can leverage artificial intelligence along the consumer journey. Journal of Advertising Research, 58(3), 263-267.
Gudala, L., Shaik, M., Venkataramanan, S., & Sadhu, A. K. R. (2019). Leveraging Artificial Intelligence for Enhanced Threat Detection, Response, and Anomaly Identification in Resource-Constrained IoT Networks. Distributed Learning and Broad Applications in Scientific Research, 5, 23-54.
Gayam, S. R. (2019). AI for Supply Chain Visibility in E-Commerce: Techniques for Real-Time Tracking, Inventory Management, and Demand Forecasting. Distributed Learning and Broad Applications in Scientific Research, 5, 218-251.
Boutilier, C., Dean, T., & Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11, 1-94.
Davenport, T. H. (2018). From analytics to artificial intelligence. Journal of Business Analytics, 1(2), 73-80.
He, A., Bae, K. K., Newman, T. R., Gaeddert, J., Kim, K., Menon, R., ... & Tranter, W. H. (2010). A survey of artificial intelligence for cognitive radios. IEEE transactions on vehicular technology, 59(4), 1578-1592.
Russomanno, D. J., Kothari, C. R., & Thomas, O. A. (2005, June). Building a Sensor Ontology: A Practical Approach Leveraging ISO and OGC Models. In IC-AI (pp. 637-643).
Gade, K. R. (2017). Migrations: Challenges and Best Practices for Migrating Legacy Systems to Cloud-Based Platforms. Innovative Computer Sciences Journal, 3(1).
Jensen, R. M., Veloso, M. M., & Bryant, R. E. (2008). State-set branching: Leveraging BDDs for heuristic search. Artificial Intelligence, 172(2-3), 103-139.
Nemati, H. R., Steiger, D. M., Iyer, L. S., & Herschel, R. T. (2002). Knowledge warehouse: an architectural integration of knowledge management, decision support, artificial intelligence and data warehousing. Decision Support Systems, 33(2), 143-161.
Boda, V. V. R., & Immaneni, J. (2019). Streamlining FinTech Operations: The Power of SysOps and Smart Automation. Innovative Computer Sciences Journal, 5(1).
Nookala, G., Gade, K. R., Dulam, N., & Thumburu, S. K. R. (2019). End-to-End Encryption in Enterprise Data Systems: Trends and Implementation Challenges. Innovative Computer Sciences Journal, 5(1).
Komandla, V. Enhancing Security and Fraud Prevention in Fintech: Comprehensive Strategies for Secure Online Account Opening.
Komandla, V. Transforming Financial Interactions: Best Practices for Mobile Banking App Design and Functionality to Boost User Engagement and Satisfaction.
Gade, K. R. (2019). Data Migration Strategies for Large-Scale Projects in the Cloud for Fintech. Innovative Computer Sciences Journal, 5(1).
Gade, K. R. (2018). Real-Time Analytics: Challenges and Opportunities. Innovative Computer Sciences Journal, 4(1).
Katari, A. (2019). Real-Time Data Replication in Fintech: Technologies and Best Practices. Innovative Computer Sciences Journal, 5(1).
Katari, A. (2019). ETL for Real-Time Financial Analytics: Architectures and Challenges. Innovative Computer Sciences Journal, 5(1).
Gade, K. R. (2017). Migrations: Challenges and Best Practices for Migrating Legacy Systems to Cloud-Based Platforms. Innovative Computer Sciences Journal, 3(1).
Muneer Ahmed Salamkar. Next-Generation Data Warehousing: Innovations in Cloud-Native Data Warehouses and the Rise of Serverless Architectures. Distributed Learning and Broad Applications in Scientific Research, vol. 5, Apr. 2019
Muneer Ahmed Salamkar. Real-Time Data Processing: A Deep Dive into Frameworks Like Apache Kafka and Apache Pulsar. Distributed Learning and Broad Applications in Scientific Research, vol. 5, July 2019
Naresh Dulam, and Venkataramana Gosukonda. “AI in Healthcare: Big Data and Machine Learning Applications ”. Distributed Learning and Broad Applications in Scientific Research, vol. 5, Aug. 2019
Naresh Dulam. “Real-Time Machine Learning: How Streaming Platforms Power AI Models ”. Distributed Learning and Broad Applications in Scientific Research, vol. 5, Sept. 2019
Naresh Dulam. Apache Spark: The Future Beyond MapReduce. Distributed Learning and Broad Applications in Scientific Research, vol. 1, Dec. 2015, pp. 136-5
Sarbaree Mishra. Distributed Data Warehouses - An Alternative Approach to Highly Performant Data Warehouses. Distributed Learning and Broad Applications in Scientific Research, vol. 5, May 2019
Sarbaree Mishra, et al. Improving the ETL Process through Declarative Transformation Languages. Distributed Learning and Broad Applications in Scientific Research, vol. 5, June 2019
Downloads
Published
Issue
Section
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
License Terms
Ownership and Licensing:
Authors of research papers submitted to Distributed Learning and Broad Applications in Scientific Research retain the copyright of their work while granting the journal certain rights. Authors maintain ownership of the copyright and have granted the journal a right of first publication. Simultaneously, authors agree to license their research papers under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
License Permissions:
Under the CC BY-NC-SA 4.0 License, others are permitted to share and adapt the work, as long as proper attribution is given to the authors and acknowledgement is made of the initial publication in the journal. This license allows for the broad dissemination and utilization of research papers.
Additional Distribution Arrangements:
Authors are free to enter into separate contractual arrangements for the non-exclusive distribution of the journal's published version of the work. This may include posting the work to institutional repositories, publishing it in journals or books, or other forms of dissemination. In such cases, authors are requested to acknowledge the initial publication of the work in this journal.
Online Posting:
Authors are encouraged to share their work online, including in institutional repositories, disciplinary repositories, or on their personal websites. This permission applies both prior to and during the submission process to the journal. Online sharing enhances the visibility and accessibility of the research papers.
Responsibility and Liability:
Authors are responsible for ensuring that their research papers do not infringe upon the copyright, privacy, or other rights of any third party. Scientific Research Canada disclaims any liability or responsibility for any copyright infringement or violation of third-party rights in the research papers.
If you have any questions or concerns regarding these license terms, please contact us at editor@dlabi.org.