Preventing Misconfiguration in Logstash with empow’s Pipeline Viewer

Posted by Dr. Rami Cohen, VP R&D on Oct 1, 2019 4:43:18 AM

Elastic’s Logstash multi-pipeline feature, introduced in version 6.7, is a cool feature that enables us to easily and flexibly write, maintain, and manage our SIEM parsers. Nevertheless, the fact that it requires manual configuration can lead to serious misconfiguration problems that are difficult to find. Using the pipeline viewer, a simple open source tool, you can view and fix errors in your multi-pipeline structure, including inputs, outputs, and connectivity between pipelines, and detect broken pipeline connectivity and cycles.

Here at empow we use Logstash together with Elasticsearch as the backend data lake for our SIEM platform, to collect, normalize, and enrich logs received from security products.

While Logstash is quite flexible and enables us to write new parsers for any new product in hours, the fact that it relied on a single pipeline raised some configuration concerns, requiring extra logic and attention to ensure that each log is processed by the correct parser.

Thus, when Elastic released its Logstash multi-pipeline and pipeline-to-pipeline features, we were happy to adopt them to mitigate these concerns.

 

THE ADVANTAGES OF MULTI-PIPELINE AND PIPELINE-TO-PIPELINE CONFIGURATIONS

Using the multi-pipeline feature, each product has its own independent parser consisting of an input, parser logic (the filter section in Logstash), and an output.

[Figure 1: independent per-product pipelines, each with its own input, filter, and output]
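As a rough sketch of such a self-contained parser (the port, grok pattern, and Elasticsearch host below are illustrative, not our actual configuration):

# per-product parser: input, normalization, enrichment, and output all in one pipeline
input {
  udp { port => 5514 }                            # this product's dedicated input
}
filter {
  grok {                                          # normalize this product's log format
    match => { "message" => "%{SYSLOGLINE}" }
  }
  geoip { source => "host" }                      # enrichment, duplicated in every parser
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}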

 

The main shortcoming of this approach is that each pipeline must have its own input logic and output, and sharing common logic is impossible, requiring us to copy it into each parser. For example, consider the three pipelines described in Figure 1, in which each parser consists of a unique log normalization based on the log format and structure, followed by a set of enrichments such as geographic location, empow classification, identity, etc. Since the same enrichment is used by more than one parser, the same logic must be copied multiple times, and every change must be propagated to each copy.

Using pipeline-to-pipeline communication we can overcome these shortcomings and create more modular flows, where logs are sent from one pipeline to another. With this technique, each pipeline is responsible for a specific part of the flow, e.g. a specific enrichment or normalization, and pipelines can easily be added to or removed from each flow. Moreover, an input can be shared between multiple pipelines, and the properties of each pipeline (such as the number of workers or the type of queue) can be configured independently.

[Figure 2: a modular pipeline-to-pipeline flow with shared inputs and shared enrichment pipelines]
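The connection itself is made with Logstash's pipeline input and output plugins: the sender names a virtual address in send_to, and the receiver listens on that address. A minimal sketch (the address name here is illustrative):

# in the upstream parser's output section: forward events to a shared enrichment pipeline
output {
  pipeline { send_to => ["enrichment"] }
}

# in the shared enrichment pipeline's input section: receive events on that virtual address
input {
  pipeline { address => "enrichment" }
}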

 

THE MISCONFIGURATION CHALLENGE – MAKING MISTAKES IS EASY

Multi-pipeline configuration is done manually, by adding each pipeline to the pipelines.yml configuration file and connecting pipeline to pipeline by adding matching labels in the output and input sections of the corresponding pipelines. Such manual configuration lends itself to misconfigurations that are hard to detect, including broken pipelines. Two examples of misconfigurations are pipelines that are not connected correctly (e.g. due to a missing label or a misspelled pipeline label), and cycles, which may cause logs never to leave the parsers but to keep being processed again and again (an infinite loop). A sketch of an accidental cycle follows.
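In this illustrative sketch (pipeline names and addresses are invented for the example), each pipeline's output feeds the other's input, so an event entering either pipeline circulates forever:

# a.conf
input  { pipeline { address => "to_a" } }
output { pipeline { send_to => ["to_b"] } }

# b.conf
input  { pipeline { address => "to_b" } }
output { pipeline { send_to => ["to_a"] } }   # sends events straight back to a.conf: infinite loop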

Consider for example a simple multi-pipeline configuration consisting of two pipelines, p1 and p2, where the first pipeline listens on udp port 1234, adds a tag ("I'm p1"), and sends the log to pipeline p2, which adds another tag ("I'm p2") and sends the log to stdout.

[Figure: the p1 and p2 pipeline configurations]
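Based on that description, the two configurations would look roughly like this (file names are illustrative):

# p1.conf
input {
  udp { port => 1234 }
}
filter {
  mutate { add_tag => ["I'm p1"] }
}
output {
  pipeline { send_to => ["p1_to_p2"] }
}

# p2.conf
input {
  pipeline { address => "p1_to_p2" }
}
filter {
  mutate { add_tag => ["I'm p2"] }
}
output {
  stdout {}
}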

Now, let's assume that in the input section of pipeline p2, instead of using the label p1_to_p2, we mistakenly wrote p1_p2. Look familiar? It happens to me all the time: changing a label name and forgetting to change it in all the places it appears, misspellings, typos, and so on…
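The broken input section would then read:

# p2.conf -- the address no longer matches p1's send_to label
input {
  pipeline { address => "p1_p2" }   # should be "p1_to_p2"
}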

In such cases, logs entering p1 will not find their way to p2 and will stay in the pipeline forever, constantly trying to reach the next pipeline. Worst of all, we can detect such a problem only when a log actually enters p1.

 

AN EASY SOLUTION FOR PREVENTING MISCONFIGURATION IN LOGSTASH

Faced with this challenge, we wrote a simple open source tool that reads your multi-pipeline configuration, displays its structure, and detects and displays misconfigurations. This free tool can be downloaded from our SIEM open source git repository.

 

Using our pipeline viewer is simple. Just give it the full path of your Logstash pipelines.yml configuration file and it will do the rest. It will read and analyze all the pipelines, detect inputs, outputs, and connectivity between pipelines, and present the connectivity graph. If a configuration error is detected, the tool will present it, helping you fix it easily.

 

Let's start with a simple example, considering pipelines p1 and p2 presented above.

We add these pipelines to our pipelines.yml configuration file as follows:

[Figure: the pipelines.yml entries for p1 and p2]
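Those entries would look roughly like this (the config paths are illustrative):

- pipeline.id: p1
  path.config: "/etc/logstash/conf.d/p1.conf"
- pipeline.id: p2
  path.config: "/etc/logstash/conf.d/p2.conf"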

 

Now, let's use the pipeline viewer as follows:

[Figure: running the pipeline viewer from the command line]

The output will be the following connectivity graph:

[Figure: connectivity graph for pipelines p1 and p2]

Ellipse nodes represent inputs and outputs, rectangle nodes represent pipelines, and edges between pipelines represent pipeline-to-pipeline connectivity. In this particular example, logs received on udp port 1234 are processed by pipeline p1 and sent to pipeline p2, which processes them and sends them to stdout.

 

Now, let's consider the misconfiguration described above, in which instead of p1_to_p2 we mistyped p1_p2. Using the same command, we receive the following connectivity graph:

[Figure: connectivity graph with the broken connection between p1 and p2]

The main difference is that pipelines p1 and p2 are no longer connected. Moreover, the node representing p1 is shown in red, indicating a major misconfiguration: in our case, logs processed by p1 will not be able to find their way out, since its output is not connected to any pipeline. A corresponding minor misconfiguration is that while pipeline p2 is defined, it will never receive any log, since its input is not connected to any pipeline and it has no other input; we present it as a dashed orange rectangle.

 

Moving to a more complex example, let's consider a set of pipelines consisting of an input pipeline that receives all the logs and dispatches them to per-product parsers. The parsers send the logs to enrichment pipelines that, in this example, include intent management and intent classification (not all logs are processed by all enrichments). After enrichment, the logs are sent to an Elastic output pipeline that simply sends the normalized and enriched logs to the Elasticsearch database. Sounds complex? Using the pipeline viewer, we can easily view the connectivity structure.
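The dispatching itself can be done with conditionals in the input pipeline's output section; a minimal sketch (the field name and pipeline addresses below are illustrative):

# input pipeline: route each log to the parser for its product
output {
  if [product] == "firewall_a" {
    pipeline { send_to => ["parser_firewall_a"] }
  } else if [product] == "ids_b" {
    pipeline { send_to => ["parser_ids_b"] }
  }
}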

 

[Figure: connectivity graph of the full multi-pipeline flow, from input pipeline through parsers and enrichments to the Elastic output]

In conclusion, using the pipeline viewer – available from the SIEM open source git repository – Elastic users can take advantage of all the benefits of the multi-pipeline and pipeline-to-pipeline features, while avoiding the pitfalls of misconfigurations and errors.

For any questions or comments, drop me a line at ramic@empow.co

 

Topics: configuration, elastic, pipeline, logstash, misconfiguration, opensource