At Comcast, we depend heavily on logs to triage and debug production issues that affect our customers. An unfortunate side effect of this reliance is that teams tend to generate metrics from these logs for alerting and reporting. As the technology surrounding metrics and tracing matures, internal teams are looking to pivot quickly by instrumenting their applications to take advantage of these tools and stop the logging deluge.
Our solution was to build an observability team and a toolset, using purely open source technology, to aid developers in this transition.
Our team has a few main goals:
- Provide a set of “paved paths” that developers can follow so each team is not inventing their own solution
- Automate, as much as possible, the creation and lifecycle management of a suggested set of monitoring, logging, and tracing tools
- Support the tools so teams can concentrate on building and maintaining their own applications
- Prevent each team from needing to build these solutions themselves
Along the way, we’ve had to overcome a few challenges:
- Hundreds of accounts across multiple cloud providers
- Driving adoption when it’s “easier” to stand up a solution to suit an individual team’s preferences
- Convincing teams it’s worth the investment to reduce logs and instrument their applications with metrics
- Establishing trust in an environment where teams have a strong sense of ownership to deliver for their customers