DevSecOps teams have a growing problem in the world of logs - there are simply too many of them. Every system, application, and network under the purview of a DevSecOps team produces logs that should be deemed important, collected, and stored, but most likely are not. This happens for a number of reasons, but it usually boils down to a lack of resources or a lack of knowledge.
In this article, we'll explore some of these issues and challenges and provide a foundation for the work we're trying to focus on here at Trunc.
In our experience, four things plague DevSecOps teams when it comes to logs:
There are more logs than we know what to do with these days. Every system and network generates logs that will inevitably be important, and the applications running on those same systems and networks generate their own logs on top of that. It's too much for an individual, or even a team, to manage without a system or platform to help collect and store this content.
This volume brings with it the challenge of noise. How do you wade through the noise to find the tidbits of information that matter? The volume also highlights the need for consolidation. If a DevSecOps team has to dig across 20 different machines to understand an event, they are set up to fail. When an incident occurs, a team needs the ability to parse one data source, not N of them.
This consolidation lets you tackle issues like data synchronization and duplication, and allows for more efficient correlation and log analysis.
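As a minimal sketch of what consolidation buys you, the snippet below merges logs from several hosts into one time-ordered, de-duplicated stream. The host names, log lines, and line format (ISO-8601 timestamp first) are hypothetical assumptions for illustration, not a real collector's format.

```python
from datetime import datetime

def consolidate(sources):
    """Merge per-host log lines into one timeline, dropping exact duplicates.

    Assumes each line starts with an ISO-8601 timestamp, e.g.
    "2024-05-01T12:00:02 sshd: Failed password for root".
    """
    seen = set()
    merged = []
    for host, lines in sources.items():
        for line in lines:
            if line in seen:  # skip exact duplicates across sources
                continue
            seen.add(line)
            ts = datetime.fromisoformat(line.split(" ", 1)[0])
            merged.append((ts, host, line))
    merged.sort(key=lambda entry: entry[0])  # one timeline across all hosts
    return merged

# Hypothetical sample data from two web servers:
logs = {
    "web-01": ["2024-05-01T12:00:02 sshd: Failed password for root"],
    "web-02": ["2024-05-01T12:00:01 sshd: Accepted password for deploy"],
}
for ts, host, line in consolidate(logs):
    print(host, line)
```

With one merged timeline, an analyst can follow an event across machines in order, instead of reconstructing that order by hand from 20 separate files.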
Volume also brings the problem of storage and scale. Where does an organization store all this information? Most regulations and standards impose a minimum retention requirement (e.g., PCI requires one year). Depending on what you record, this can grow quickly, and it brings with it the added challenge of cost.
Storing logs has a real cost, and as the volume increases so do the associated costs. Most log management platforms charge an organization on two key metrics - ingestion (how many logs are sent) and storage (how much is retained).
Across the NOC properties we generate about 1 TB of logs a day, on a good day. That is about 30 TB of data a month, and that is with good logging hygiene (something we'll cover later). When you do the math on this volume of data against the options in the market, it's an impractical, but very necessary, requirement.
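To make the math concrete, here is a back-of-the-envelope cost model for a platform that bills on ingestion and storage. The per-GB rates are placeholders chosen for illustration, not any vendor's actual pricing.

```python
def monthly_log_cost(gb_per_day, retention_days,
                     ingest_rate_per_gb=0.50, storage_rate_per_gb=0.03):
    """Estimate a monthly bill for a platform that charges on
    ingestion and storage. Rates are illustrative placeholders."""
    ingested = gb_per_day * 30                 # GB ingested per month
    stored = gb_per_day * retention_days       # GB retained at steady state
    return ingested * ingest_rate_per_gb + stored * storage_rate_per_gb

# 1 TB/day (1000 GB) with one year of retention, per PCI's minimum:
print(round(monthly_log_cost(1000, 365), 2))
```

Even with these modest placeholder rates, a terabyte a day with a year of retention lands in the tens of thousands of dollars per month, which is why volume and hygiene matter so much.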
This is probably one of the most important challenges we face - making sense of all the noise. The idea behind logs is a recorded history we can leverage later when we need it. That thinking, however, keeps an organization purely reactive.
Creating insights, especially actionable insights, is the holy grail we are after: solving the challenges above, and then helping the DevSecOps team decipher what happened and make decisions accordingly. Was this an Indicator of Compromise (IoC)? If so, what were they trying to do? Were they successful? Is this something worth focusing on, or worrying about?
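A first step toward those answers is matching log lines against known indicators. The sketch below does a naive substring match against an indicator list; the IP (from the TEST-NET documentation range), domain, and sample lines are all made-up placeholders, and a real pipeline would use structured fields and a threat-intel feed rather than raw substrings.

```python
def flag_iocs(lines, indicators):
    """Return (indicator, line) pairs for every log line that
    mentions a known indicator of compromise."""
    hits = []
    for line in lines:
        for ioc in indicators:
            if ioc in line:
                hits.append((ioc, line))
    return hits

# Hypothetical indicator list and sample web-server lines:
iocs = {"203.0.113.7", "evil-domain.example"}
sample = [
    "GET /index.html from 10.0.0.5",
    "GET /wp-login.php from 203.0.113.7",
]
for ioc, line in flag_iocs(sample, iocs):
    print("IoC hit:", ioc, "->", line)
```

A hit alone doesn't answer "were they successful?" - that takes correlating the flagged line with what the same source did afterward, which is exactly why consolidation comes first.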
Pulling intelligence from a log is challenging, but essential if we want our DevSecOps teams to be successful.
Resource challenges are not new to the world of logging. The art, and science, of logging is not like some of its security counterparts. Although important, very few organizations have teams dedicated to understanding the world of logging. Many will invest heavily in purchasing a "logging" solution, but few will invest the time and money to understand logging and make it a security control that is just as effective as their offensive and defensive programs.