# ElasticSearch¶

To help make our metadata on files and applications searchable and to keep our logging information we use Elasticsearch. This is done by having two separate Elasticsearch clusters: One for logs and another for metadata.

## Shared Configuration and General Setup¶

Both clusters consists of 3+ data nodes, 3 master nodes and 2+ client nodes.

Elasticsearch tries to distribute data evenly between its data nodes, however it supports that if a node crosses certain thresholds called watermarks it will take action to try to redistribute, if available, or to stop nodes from creating more shards or even make them read only until the problem has been resolved. These limitation are divided into 3 different levels and are shared by both clusters:

• Low: No more shards for replicas can be allocated to this node.

• High: Start to automatically relocate shards away from node to generate more space.

• Flood stage: Each index that has shards on this node will now be put into read_only_allow_delete mode. This mode can only be removed manually by the cluster admin after fixing the storage problem.

We mainly use our metadata on our files an applications to enable quick and efficient searching.

### Files¶

UCloud periodically scans the file system to create an index of files. This index is stored in ElasticSearch and contains information about the file. The files index consists of 5 primary shards and 2 replicas. This makes it possible to handle large amount of updates and searches. Should there be any need for it, an additional files index can be created and work alongside the original files index without any notable efficiency decrease.

### Applications¶

When an application is created name, version, title and tags are also created in the application index of our metadata Elasticsearch cluster. This information is used to enable advanced search of applications in an efficient manner. Just as with files, any update to an application would also be seen in the index.

## Logging/Audit¶

All logging/Auditing resides in a single Elasticsearch cluster that only contains logs. We are receiving logs through “three” different channels (see figure below).

1. All audit logs created when users are using the cloud service

2. All system logs from each of the nodes connected to the cluster

3. All stderr/stdout from each pod running in the kubernetes cluster

### Cloud Service Auditing Ingestion¶

The audit logs are created by the users preforming actions on the cloud service and thereby creating events. These events are stored in Redis. The Audit Ingestion Service then collects these events and places in their respective index.

Each node in the cluster have a Filebeat instance and a Logstash instance installed. The Filebeat checks the predetermined log files on the nodes. If the log file changes, this change is collected by Filebeat and forwarded to Logstash. Logstash then formats the input according to the elasticsearch mapping required and then forwards this to the Elasticsearch cluster.

To get the logs from our pods on Kubernetes, we use Ranchers logging tool. The tool sets up a Fluentd instance on each node in the cluster and collects logs from /var/log/containers. These logs are then forwarded to the Elasticsearch cluster.

### Data Retention and Automated Curation¶

By using our own Elasticsearch Management Tool are we able to keep our shard and indices consumption at a minimum. A daily scan is run to reduce the number of shards in the audit logs for the previous day to 1 primary shard and 1 replica shard per index. This would usually mean that indexing and search speed would decrease, however since this is the logs for the previous day then the need of indexing is no more and search does not have to fast. On top of this shrinking process we also do a reindex to a monthly index of week old indices. By doing this we reduce the former weeks daily indices into a single monthly index continually.

During the daily scan we also check old logs to see if any of them have expired according the predetermined retention period. If the entire index only contains expired logs, then the entire index is delete, otherwise it would only be the specific logs that are deleted. This gives us the ability to have a fine granularity if some logs needs to be kept longer than other logs.

Last but not least we do an aliasing of the 7 last days of logs. All these indices will now be included in search when we search the “Grafana” index. This makes search and response time of Kibana much faster since it does not look in the last 180 days of logs but instead only the last 7 days.