ElasticSearch Cookbook (Second Edition)

Setting up for Linux systems

If you are using a Linux system, you need to perform some extra setup steps to improve performance and to prevent production problems when you work with many indices.

This recipe covers two common errors that occur in production:

  • Too many open files, which can corrupt your indices and data
  • Slow performance when searching and indexing due to the garbage collector

    Note

    Other possible troubles arise when you run out of disk space. In this scenario, some files can become corrupted. To prevent index corruption and possible data loss, it is a best practice to monitor the available storage space.
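
    A minimal sketch of such a check is shown below; the /var/lib/elasticsearch data path is an assumption and should be replaced with the path.data value from your elasticsearch.yml:

      # Check free space and inode usage on the data partition.
      # /var/lib/elasticsearch is an assumed data path; adjust it
      # to your path.data setting.
      df -h /var/lib/elasticsearch   # human-readable free space
      df -i /var/lib/elasticsearch   # inodes can also run out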

Getting ready

You need a working ElasticSearch installation.

How to do it...

In order to improve performance on Linux systems, perform the following steps:

  1. First, you need to raise the limits of the user that runs the ElasticSearch server. In our examples, we will call this user elasticsearch.
  2. To allow ElasticSearch to manage a large number of files, you need to increase the number of file descriptors (the maximum number of open files) that this user can manage. To do this, edit your /etc/security/limits.conf file and add the following lines at the end; you must then log out and log back in (or restart the machine) for the changes to take effect:
    elasticsearch       -       nofile          299999
    elasticsearch       -       memlock         unlimited
  3. In order to control memory swapping, you need to set up this parameter in elasticsearch.yml:
    bootstrap.mlockall: true
  4. To fix the memory usage of the ElasticSearch server, set the ES_MIN_MEM and ES_MAX_MEM parameters to the same value in the $ES_HOME/bin/elasticsearch.in.sh file. Alternatively, you can set ES_HEAP_SIZE, which automatically initializes both the minimum and maximum values to the same figure (see the verification sketch after this list).
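
The following is a minimal sketch of how you might verify these settings from a shell; the su invocation and the 2g heap size are illustrative assumptions, not values prescribed by this recipe:

    # Log in again (or reboot) so the new limits take effect, then
    # check them as the user that runs ElasticSearch:
    su - elasticsearch -s /bin/bash
    ulimit -n    # expected: 299999 (max open files)
    ulimit -l    # expected: unlimited (max locked memory)

    # Fix the heap size before starting the server; 2g is only an
    # illustrative value, tune it for your hardware:
    export ES_HEAP_SIZE=2g
    $ES_HOME/bin/elasticsearch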

How it works...

The standard limit on file descriptors (the maximum number of open files for a user) is typically 1,024. Because every index is stored on disk as a set of many files, and network sockets count against the same limit, storing a lot of records in several indices exhausts the available file descriptors very quickly; the ElasticSearch server then becomes unresponsive and your indices might become corrupted, leading to a loss of data. Raising the limit to a very high number ensures that your ElasticSearch server never hits the maximum number of open files.
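
As a quick check, you can ask a running node which limit it actually sees. This is a minimal sketch, assuming a node listening on localhost:9200; the _nodes/process endpoint reports the file descriptors available to the server process:

    # Look for "max_file_descriptors" in the output:
    curl -s 'http://localhost:9200/_nodes/process?pretty'

    # Or read the limit of the running Java process from the OS
    # (replace <pid> with the ElasticSearch server's process ID):
    grep 'open files' /proc/<pid>/limits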

The other settings restrict memory swapping in ElasticSearch and give a performance boost in a production environment. They are required because, during indexing and searching, ElasticSearch creates and destroys a lot of objects in memory. This large number of create/destroy actions fragments the memory and reduces performance: the memory becomes full of holes, and when the system needs to allocate more memory, it suffers the overhead of finding compacted memory. If you don't set bootstrap.mlockall: true, the operating system is free to swap parts of the ElasticSearch heap out to disk; when the JVM touches those pages again (for example, during garbage collection), they must be read back from disk, which can freeze the system. With this setting, the heap is locked in RAM and never swapped out, providing a huge performance boost.
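
You can verify that the memory lock was actually acquired at startup with a sketch like the following, again assuming a node on localhost:9200; if mlockall is reported as false, the usual cause is a missing memlock entry in /etc/security/limits.conf for the elasticsearch user:

    # "mlockall" : true means the heap is locked in RAM:
    curl -s 'http://localhost:9200/_nodes/process?pretty' | grep mlockall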