Setting up for Linux systems
If you are using a Linux system, there are extra setup steps you should take to improve performance and to avoid the production problems that arise when managing many indices.
This recipe covers two common errors that occur in production:
- Too many open files, which can corrupt your indices and data
- Slow performance when searching and indexing due to the garbage collector
Getting ready
You need a working ElasticSearch installation.
How to do it...
In order to improve performance on Linux systems, perform the following steps:
- First, you need to change the limits of the user that runs the ElasticSearch server. In our examples, we will call this user elasticsearch.
- To allow ElasticSearch to manage a large number of files, you need to increase the number of file descriptors (the number of open files) that a user can manage. To do this, edit your /etc/security/limits.conf file and add the following lines at the end; a machine restart is then required to ensure the changes are incorporated:

```
elasticsearch - nofile 299999
elasticsearch - memlock unlimited
```
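As a quick check after the restart, you can verify the new limits from a shell session running as the elasticsearch user. This is a minimal sketch; the user name and the 299999 value simply follow the example above:

```sh
# Switch to the user that runs ElasticSearch.
su - elasticsearch

# Maximum number of open file descriptors; should report 299999 after the change.
ulimit -n

# Maximum amount of lockable memory; should report "unlimited".
ulimit -l
```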
- To prevent ElasticSearch memory from being swapped out, set this parameter in the elasticsearch.yml file:

```yaml
bootstrap.mlockall: true
```
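You can confirm that the setting took effect by querying the nodes info API once the server is running. This is a hedged example assuming ElasticSearch is listening on the default localhost:9200:

```sh
# Ask each node to report its process information; the response includes
# an "mlockall" field that should be true once the setting is active.
curl -XGET 'http://localhost:9200/_nodes/process?pretty'
```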
- To fix the amount of memory used by the ElasticSearch server, set the ES_MIN_MEM and ES_MAX_MEM parameters to the same value in the $ES_HOME/bin/elasticsearch.in.sh file. Alternatively, you can set ES_HEAP_SIZE, which automatically initializes the min and max values to the same value.
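The following lines sketch both approaches; the 2g value is purely illustrative, so size the heap to your own hardware:

```sh
# Option 1: edit $ES_HOME/bin/elasticsearch.in.sh and pin both values.
ES_MIN_MEM=2g
ES_MAX_MEM=2g

# Option 2: export ES_HEAP_SIZE before starting the server; it initializes
# both the min and max heap to the same value.
export ES_HEAP_SIZE=2g
$ES_HOME/bin/elasticsearch
```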
How it works...
The standard limit of file descriptors (the maximum number of open files for a user) is typically 1,024. When you store a lot of records in several indices, you run out of file descriptors very quickly, so your ElasticSearch server becomes unresponsive and your indices might become corrupted, leading to data loss. Raising the limit to a very high number ensures that your ElasticSearch server never hits the maximum number of open files.
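To see how close a running node is to the limit, you can count the descriptors its process currently holds open. This is a minimal sketch using standard Linux tools; the pgrep pattern is an assumption about how the JVM process is named:

```sh
# Find the PID of the ElasticSearch JVM (the class name pattern is illustrative).
ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)

# Count the file descriptors currently open by that process.
ls /proc/$ES_PID/fd | wc -l

# Show the per-process limits actually in force.
grep 'Max open files' /proc/$ES_PID/limits
```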
The other settings restrict how ElasticSearch uses memory, preventing memory swapping and giving a performance boost in a production environment. This is required because, during indexing and searching, ElasticSearch creates and destroys a lot of objects in memory. This large number of create/destroy actions fragments the memory and reduces performance: the memory becomes full of holes, and when the system needs to allocate more memory, it suffers the overhead of finding compacted memory. If you don't set bootstrap.mlockall: true, ElasticSearch dumps the memory onto disk and defragments it back into memory, which freezes the system. With this setting, the defragmentation step is done in memory itself, providing a huge performance boost.
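If you want to verify at the operating-system level that the process memory is actually locked into RAM, Linux exposes this through /proc. A small sketch, with the same assumption about the pgrep pattern as above:

```sh
# VmLck reports how much of the process's memory is locked into RAM;
# a non-zero value indicates that mlockall is working.
ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)
grep VmLck /proc/$ES_PID/status
```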