Optimize the performance of an Elastic stack
Using an Elastic Stack to monitor an application or to gain business insights is a common, open-source, and easy-to-use approach. Nevertheless, it can become slow, unstable, and resource-intensive if it is not configured correctly. To keep costs and performance within acceptable ranges, it is crucial to optimize the stack. This blog post gives some insights into possible optimization opportunities.
2. Elastic Stack
The Elastic Stack is a generic term for the software modules Beats, Elasticsearch, Logstash, and Kibana. Together they form an open-source solution to deliver, store, transform, search, and visualize data. This article focuses on Elasticsearch and Kibana. Elasticsearch is a NoSQL database for storing and efficiently searching content. It is the key element of the stack: it holds the data and serves it to the other components. Kibana visualizes the retrieved data in dashboards and allows querying, filtering, and displaying data directly.
3. Index and Index Pattern
An index is the collection in which Elasticsearch stores documents, and it is what Kibana reads its data from. The index pattern tells Kibana which indices to use and how their fields are interpreted. It is one of the most central levers for adapting the application. How these indices are handled can have a high impact on performance. Two common issues and their solutions are described in the following.
3.1. Scripted Fields
Scripted fields are an easy way to derive better-fitting data by calculating new values on the fly when the data is read from Elasticsearch into Kibana. Unfortunately, they have to be calculated for every document each time it is read. This becomes resource-intensive when it happens often or when the expressions are complex, so scripted fields should be avoided for complex calculations and for calculations that involve many different variables. They slow down the delivery of results and can cause lag when a lot of data arrives, and critical log statements may end up waiting in a queue. It is better to calculate the needed metrics within the application and send them to the cluster. The index then only has to map the data format, which is much faster.
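As a rough illustration of moving such a calculation into the application, the following sketch derives a value before the document is sent, so that no scripted field is needed when the data is read. The field names requestStart, requestEnd, and durationMs are hypothetical and only serve the example.

```python
from datetime import datetime, timedelta


def enrich_event(event: dict) -> dict:
    """Return a copy of the event with a precomputed duration in milliseconds."""
    start = datetime.fromisoformat(event["requestStart"])
    end = datetime.fromisoformat(event["requestEnd"])
    enriched = dict(event)  # leave the caller's dictionary untouched
    # Integer division by one millisecond avoids floating-point rounding.
    enriched["durationMs"] = (end - start) // timedelta(milliseconds=1)
    return enriched


# Hypothetical event as the application might produce it.
event = {
    "requestStart": "2021-05-01T10:00:00.000",
    "requestEnd": "2021-05-01T10:00:01.250",
}
print(enrich_event(event)["durationMs"])  # 1250
```

The enriched document can then be indexed as it is, and dashboards can aggregate on durationMs like on any other stored field.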
3.2. Separate Indices
When working with time series data, it is quite helpful to use separate indices for certain periods of time. Most searches cover small time ranges, usually the most recent ones. Small, time-based indices speed up these searches, because a request only touches a few small indices instead of a single big one containing all the data. Small indices also make it easy to delete old data on a regular basis: the old indices can simply be removed. This leads to less unnecessary data in the cluster and therefore to better performance and lower costs.
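The regular cleanup can be sketched as follows, assuming daily indices whose names end in a date such as index-2021-05-01. The function only selects the indices that have fallen out of the retention window. The prefix and the retention period of seven days are assumptions for this example.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, prefix, today, keep_days):
    """Return the daily indices whose date suffix is older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # belongs to a different series
        try:
            day = datetime.strptime(name[len(prefix):], "%Y-%m-%d").date()
        except ValueError:
            continue  # suffix is not a date, so leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired


names = ["index-2021-04-01", "index-2021-04-29", "index-2021-05-01", "other-2021-01-01"]
print(expired_indices(names, "index-", date(2021, 5, 1), keep_days=7))
# ['index-2021-04-01']
```

Each returned name could then be removed with the delete index API of Elasticsearch (DELETE /index-2021-04-01).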
To define and activate automatically changing indices, either the rollover API can be called or a date format can be added to the index name, so that a new index is used whenever the formatted date changes. The following example shows how to activate a time-based index via REST.
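A minimal sketch of what such a request could look like, assuming that an ingest pipeline with the date_index_name processor is used and that the timestamps are ISO 8601 strings. The helper names are hypothetical. The first function builds the body for PUT _ingest/pipeline/<name>, the second one only illustrates locally which index name a timestamp leads to.

```python
import json
from datetime import datetime


def daily_index_pipeline(field: str, prefix: str) -> dict:
    """Build the body of an ingest pipeline that routes documents to daily indices."""
    return {
        "description": "Route documents to one index per day",
        "processors": [
            {
                "date_index_name": {
                    "field": field,                # field holding the timestamp
                    "index_name_prefix": prefix,   # fixed part of the index name
                    "date_rounding": "d",          # round to days: one index per day
                    "index_name_format": "yyyy-MM-dd",
                }
            }
        ],
    }


def expected_index_name(prefix: str, timestamp: str) -> str:
    """Local illustration of the index name for a given ISO 8601 timestamp."""
    return prefix + datetime.fromisoformat(timestamp).strftime("%Y-%m-%d")


body = daily_index_pipeline("receptionTimestamp", "index-")
print(json.dumps(body, indent=2))
print(expected_index_name("index-", "2021-05-01T08:30:00"))  # index-2021-05-01
```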
In this example, the date taken from the timestamp stored in the field “receptionTimestamp” is appended to the index name “index”. In this case, this results in a new index every day (e.g. index-2021-05-01), because the index name changes with the date.
It is also recommended to use different indices for different data types instead of trying to map everything onto one index. This distributes the load and makes the different use cases independent of each other.
4. Data Preparation
Setting up the cluster correctly is important for any application, but it is just as important to prepare the data correctly. This allows problems to be solved early and outside the Elastic Stack. It reduces costs and storage size and improves performance.
Indexing large amounts of data can take time. If an application sends a lot of real-time data, possibly including critical monitoring data, the data is processed much faster when it is sent as a batch instead of as single messages, because this saves a lot of per-request overhead.
It is recommended to send only those variables from the data-producing application to the Elastic Stack that are actually needed for the analysis. Everything else should be filtered out beforehand. This leads to a much smaller index and to less storage space and traffic in Elasticsearch. On top of that, too many filter criteria or very long log statements within Kibana can cause information overload and make it harder to find a solution.
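Batching can be sketched with the newline-delimited format that the _bulk endpoint of Elasticsearch expects: one action line followed by one source line per document. The index name and the batch size below are assumptions for this example, and sending the body (POST /_bulk with the content type application/x-ndjson) is left out.

```python
import json


def build_bulk_body(index: str, documents: list) -> str:
    """Serialize documents into the newline-delimited body of a _bulk request."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # source line
    return "\n".join(lines) + "\n"  # the body has to end with a newline


def chunked(items: list, size: int):
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


events = [{"message": f"event {i}"} for i in range(5)]
bodies = [build_bulk_body("index-2021-05-01", batch) for batch in chunked(events, 2)]
print(len(bodies))  # 3 requests instead of 5
```

A suitable batch size depends on the document size and the cluster, so it is best determined by measuring.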
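Filtering in the application can be as simple as an allow-list of fields. The following sketch assumes a hypothetical set of needed fields. Everything that is not on the list is dropped before the document leaves the application.

```python
# Hypothetical allow-list of the fields that are needed for the analysis.
NEEDED_FIELDS = frozenset({"receptionTimestamp", "level", "message"})


def keep_needed_fields(event: dict, needed=NEEDED_FIELDS) -> dict:
    """Return a copy of the event that contains only the allow-listed fields."""
    return {key: value for key, value in event.items() if key in needed}


raw_event = {
    "receptionTimestamp": "2021-05-01T08:30:00",
    "level": "ERROR",
    "message": "payment failed",
    "debugDump": "x" * 10_000,  # large and not needed for the analysis
    "threadName": "worker-7",
}
print(sorted(keep_needed_fields(raw_event)))
# ['level', 'message', 'receptionTimestamp']
```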
5. Cluster tuning
Tuning the parameters of the Elasticsearch cluster can make a huge difference in terms of performance. The right values depend strongly on the use case and need to be chosen to balance performance and costs.
5.1. Storage Space
This one is simple but important. Make sure that there is always enough storage space available, and implement monitoring that notifies you early, before the cluster runs full. Scaling up the cluster takes a lot of time. If it is started too late, the cluster cannot process any new data and is unavailable for hours.
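A minimal sketch of such an early warning, assuming the used and total bytes per node are already available, for example from the monitoring of the hosting environment. The thresholds are assumptions and should be adapted. The critical threshold of 85% is chosen because it is Elasticsearch's default low disk watermark, above which no new shards are allocated to a node.

```python
def disk_status(used_bytes: int, total_bytes: int,
                warn_at: float = 0.75, critical_at: float = 0.85) -> str:
    """Classify disk usage as 'ok', 'warning', or 'critical'."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    usage = used_bytes / total_bytes
    if usage >= critical_at:
        return "critical"  # scale up now
    if usage >= warn_at:
        return "warning"   # plan the scale-up while there is still time
    return "ok"


GIB = 1024 ** 3
print(disk_status(400 * GIB, 500 * GIB))  # warning
```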
5.2. JVM Memory Pressure
Elasticsearch is an application written in Java and therefore runs on the Java Virtual Machine (JVM). If the cluster is used too heavily, in terms of requests and message size relative to its size, it can run short of memory and, in the worst case, run out of it. If there is not enough free memory, the garbage collector, which removes data that is no longer needed, cannot work correctly. This leads to increasing memory pressure, which reaches 100% in the worst case. Then the entire cluster stops processing to avoid an OutOfMemoryError, and an intensive garbage collection takes place. This takes time, during which the application is not available. To avoid such problems, enough memory should be allocated to the cluster, and the memory pressure should stay far away from its limit. This can also be achieved by a leaner index pattern with few scripted fields and by avoiding complex queries over long time ranges.
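One way to keep an eye on this is the node stats API (GET _nodes/stats/jvm), which reports the heap usage of every node in the field jvm.mem.heap_used_percent. The following sketch flags nodes above a threshold. It works on an already parsed response, the sample data is made up, and the threshold of 75% is an assumption. Heap usage is used here as a simple proxy for memory pressure.

```python
def nodes_under_pressure(node_stats: dict, threshold_percent: int = 75) -> list:
    """Return (node name, heap used percent) pairs at or above the threshold."""
    flagged = []
    for node in node_stats.get("nodes", {}).values():
        used = node["jvm"]["mem"]["heap_used_percent"]
        if used >= threshold_percent:
            flagged.append((node.get("name", "unknown"), used))
    # Report the most heavily loaded node first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up excerpt of a node stats response.
sample_stats = {
    "nodes": {
        "a1": {"name": "node-1", "jvm": {"mem": {"heap_used_percent": 42}}},
        "b2": {"name": "node-2", "jvm": {"mem": {"heap_used_percent": 81}}},
        "c3": {"name": "node-3", "jvm": {"mem": {"heap_used_percent": 93}}},
    }
}
print(nodes_under_pressure(sample_stats))
# [('node-3', 93), ('node-2', 81)]
```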
5.3. Hot-Warm-Cold Architecture
Application data quickly adds up to huge amounts of storage space. For fast retrieval, it needs to be stored on SSDs. This can get quite costly when the servers are not already there but need to be bought or booked in a cloud environment. The most recent data is usually the most important, but it rapidly loses value. After a few days, it is very unlikely that the data is accessed frequently again.
To avoid high bills, the data can be separated into different availability classes and stored on different server types. Usually, a distinction is made between hot, warm, and cold nodes. Hot nodes are highly available and most frequently accessed. They have a lot of CPU and memory and use small but fast storage. Warm nodes are accessed less frequently and therefore need less CPU and memory, but should still use fast SSD storage. Cold nodes are rarely accessed and need little CPU and memory and slower but larger storage, for example HDDs, which saves hardware costs. Usually, many hot nodes are needed for fast performance and only a few cold nodes for older data, as this data is accessed infrequently.
Elasticsearch can automatically move the data to the different hardware after a certain period of time, which saves a lot of money. The following image illustrates this concept. It displays the relation between the number of nodes (size of the rectangles) and the amount of data stored.
This blog post demonstrated how to get better results out of an Elastic Stack by preparing the data, improving the indices, and tuning the underlying infrastructure based on the needs of the application. It showed common patterns used for optimization. Applying them increases the stack’s performance, avoids problems with the application, improves user satisfaction, and saves money.
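The automatic move between the tiers could be sketched with an index lifecycle management (ILM) policy, assuming a cluster that supports ILM and nodes that are tagged with a custom attribute named data (values hot, warm, cold). The attribute name and all time periods are assumptions for this example. The function builds the body for PUT _ilm/policy/<name>.

```python
import json


def hot_warm_cold_policy(warm_after: str = "3d", cold_after: str = "30d",
                         delete_after: str = "90d") -> dict:
    """Build an ILM policy that moves indices from hot to warm to cold nodes."""
    return {
        "policy": {
            "phases": {
                # New data is written on hot nodes; a new index starts every day.
                "hot": {"actions": {"rollover": {"max_age": "1d"}}},
                # Older indices are relocated to nodes tagged data=warm ...
                "warm": {
                    "min_age": warm_after,
                    "actions": {"allocate": {"require": {"data": "warm"}}},
                },
                # ... and later to nodes tagged data=cold.
                "cold": {
                    "min_age": cold_after,
                    "actions": {"allocate": {"require": {"data": "cold"}}},
                },
                # Finally, the index is deleted.
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


policy = hot_warm_cold_policy()
print(json.dumps(policy, indent=2))
```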