Fully Automating Chaos Engineering with Large Language Models


View a PDF of the paper titled ChaosEater: Fully Automating Chaos Engineering with Large Language Models, by Daisuke Kikuta and 2 other authors

View PDF

Abstract:Chaos Engineering (CE) is an engineering technique aimed at improving the resiliency of distributed systems. It involves artificially injecting specific failures into a distributed system and observing its behavior in response. Based on the observation, the system can be proactively improved to handle those failures. Recent CE tools implement the automated execution of predefined CE experiments. However, defining these experiments and improving the system based on the experimental results still remain manual. To reduce the costs of the manual operations, we propose ChaosEater, a system for automating the entire CE operations with Large Language Models (LLMs). It predefines the agentic workflow according to a systematic CE cycle and assigns subdivided operations within the workflow to LLMs. ChaosEater targets CE for Kubernetes systems, which are managed through code (i.e., Infrastructure as Code). Therefore, the LLMs in ChaosEater perform software engineering tasks to complete CE cycles, including requirement definition, code generation, debugging, and testing. We evaluate ChaosEater through case studies on both small and large Kubernetes systems. The results demonstrate that it stably completes reasonable single CE cycles with significantly low time and monetary costs. The CE cycles are also qualitatively validated by human engineers and LLMs.

Submission history

From: Daisuke Kikuta [view email]
[v1]
Sun, 19 Jan 2025 16:35:09 UTC (3,005 KB)
[v2]
Wed, 16 Apr 2025 03:33:29 UTC (3,360 KB)



Source link

Leave a Comment

Your email address will not be published. Required fields are marked *

Review Your Cart
0
Add Coupon Code
Subtotal

 
Scroll to Top