We ❤️ Open Source
A community education resource
How LLMs and AI agents eliminate DevOps toil with intelligent automation
From invisible cognitive load to AI that meets engineers at their level.
DevOps professionals are invisible until something breaks. In this presentation at All Things Open, Kedar Kulkarni, Senior DevOps Architect at Apple, shares why the real pain in DevOps isn't just busy work: it's the cognitive load of parsing inconsistent logs, recalling outdated runbooks during incidents, and hunting for tribal knowledge trapped in Slack threads.
Kedar reframes toil as more than manual tasks. It's context switching between tools with different formats, constantly translating YAML to JSON, and relying on that one expert who fields calls from 10 different projects. This invisible toil varies wildly with experience: a network admin might fix a DNS issue in 5 minutes, while someone else spends two hours on it under middle-of-the-night page pressure. The pain is real, measurable, and costs organizations productivity they don't track.
LLMs and AI agents address this differently: LLMs answer questions because they've read everything, while AI agents actually do things. Kedar demonstrates with kubectl AI, showing how a junior engineer gets quick troubleshooting steps for a broken pod, while a senior engineer asks for pattern analysis across deployment revisions. A security engineer checks for malicious activity, and a business stakeholder estimates time to fix for customers. Same tool, different perspectives, each getting value based on their role and needs.
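The role-aware pattern Kedar demonstrates can be sketched as a thin prompt wrapper: the same diagnostic question gets different framing depending on who asks. This is a minimal illustration only; the `build_prompt` helper, the role descriptions, and the pod name are all hypothetical, not part of kubectl AI itself.

```python
# Sketch of role-aware prompting: one shared question, framed per role.
# All names here are hypothetical; kubectl AI handles this internally.

ROLE_CONTEXT = {
    "junior": "Give step-by-step troubleshooting commands with brief explanations.",
    "senior": "Analyze failure patterns across recent deployment revisions.",
    "security": "Check for indicators of malicious activity or misconfiguration.",
    "business": "Estimate customer impact and likely time to resolution.",
}

def build_prompt(role: str, question: str) -> str:
    """Combine a shared diagnostic question with role-specific framing."""
    context = ROLE_CONTEXT.get(role, "Answer concisely.")
    return f"{context}\nQuestion: {question}"

if __name__ == "__main__":
    # Same question, four perspectives -- the point of the demo.
    for role in ROLE_CONTEXT:
        print(build_prompt(role, "Why is pod checkout-7f9c in CrashLoopBackOff?"))
```

The design point is that the expensive part (the shared question and cluster context) stays constant while a cheap role prefix changes what kind of answer comes back.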
His second demo takes this further with self-healing systems. Using kubectl AI with the K8sGPT operator, he shows how multi-agent workflows can detect problems, analyze root causes, and automatically apply fixes without human intervention: a pod goes from crash loop to running on its own. But Kedar emphasizes keeping humans in the loop, because most DevOps knowledge lives in private Slack threads and war rooms, not the GitHub repos where AI trains.
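A setup along these lines typically starts from a `K8sGPT` custom resource that tells the operator which AI backend should analyze the cluster. The sketch below is an assumption-laden example, not the configuration from the talk: the resource name, namespace, secret name, model, and version are placeholders, so check the K8sGPT operator documentation for the exact fields of the version you install.

```yaml
# Hypothetical K8sGPT custom resource for the k8sgpt-operator.
# All names and versions are placeholders, not from Kedar's demo.
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
spec:
  ai:
    backend: openai          # which LLM backend analyzes cluster events
    model: gpt-4o-mini       # placeholder model name
    secret:
      name: k8sgpt-api-key   # Secret holding the backend API key
      key: openai-api-key
  noCache: false
  version: v0.3.41           # placeholder k8sgpt version for the operator to run
```

The operator then emits `Result` resources describing detected problems; wiring those results into automated remediation is what turns analysis into the self-healing loop the demo shows.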
AI adoption looks different for every team. Startups need speed and rapid iteration, mid-size companies need automation and process, and enterprises need coordination and enforced standards. Kedar's roadmap starts simple: in week one, identify your top three toil sources; in the first month, focus on automating common troubleshooting; for long-term success, get 80 percent of the team knowing which AI tools are approved and how to use them.
Key takeaways
- LLMs help troubleshoot, monitor, and analyze security concerns by meeting engineers at their experience level and providing context-aware guidance.
- AI-powered workflows enable self-healing systems through multi-agent collaboration, but human oversight remains critical for production decisions.
- Adoption strategies differ by company size, from startup speed to enterprise coordination, but all benefit from starting small and measuring impact.
AI amplifies human capabilities in DevOps; it doesn't replace judgment. The goal isn't eliminating humans: it's eliminating toil, so teams can focus on higher-value work while maintaining the oversight that only experience provides.
👉 Watch the DevOps track playlist from All Things Open 2025.
More from We Love Open Source
- Want to get into AI? Start with this.
- 5 forces driving DevOps and AI in 2026
- Deep dive into the Model Context Protocol
- The secret skill every developer needs to succeed with AI today
- Open source won, so why are we still fighting?
The opinions expressed on this website are those of each author, not of the author's employer or All Things Open/We Love Open Source.