Deploying Elasticsearch Within Kubernetes


How to run databases in production on Kubernetes
Architect an application, respond to node failure, disk out of space, restore from snapshots, and run blue-green deployments. Demo included!

Introduction

Elasticsearch is an open-source search engine based on Apache Lucene and developed by Elastic. It focuses on scalability, resilience, and performance, and companies around the world, including Mozilla, Facebook, GitHub, Netflix, eBay, and the New York Times, use it every day. Elasticsearch is one of the most popular analytics platforms for large datasets and powers the search features of a great many applications. It takes a document-oriented approach to data and indexes it in near real time, so newly written documents become searchable shortly after they arrive (typically within about a second). It stores data as JSON documents and organizes them by index and type.

If we draw analogies between the components of a traditional relational database and those of Elasticsearch, they map roughly like this:

  • Database or Table -> Index
  • Row -> Document
  • Column -> Field (a property of the document)
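
To make the mapping concrete, here is a minimal Python sketch that turns one row of a hypothetical customers table into the JSON document that would represent it in an index. The table, its columns, and the row_to_document helper are illustrative assumptions, not part of any Elasticsearch API.

```python
import json

# Hypothetical "customers" table: each column becomes a document field.
COLUMNS = ["id", "firstname", "lastname", "city"]
ROW = (1, "Jane", "Doe", "Lisbon")


def row_to_document(columns, row):
    """Pair each column name with its value to build a document body."""
    return dict(zip(columns, row))


document = row_to_document(COLUMNS, ROW)
# This JSON text is what you would send as the body of an index request.
print(json.dumps(document))
```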

Elasticsearch Advantages

  • It builds on Apache Lucene, which is widely regarded as one of the most robust open-source full-text search libraries.
  • It uses a document-oriented architecture to store complex real-world entities as structured JSON documents. By default, it indexes every field, so any field can be searched quickly without extra configuration.
  • It doesn't require a predefined schema. With dynamic mapping, documents add new fields simply by including them, which lets you evolve your data without the downtime associated with a traditional database schema upgrade. (Changing the type of an existing field, however, still requires reindexing.)
  • It performs full-text searches with language-aware analysis and returns the documents that match the search condition. It scores each result for relevance, using BM25 by default since Elasticsearch 5.0 (earlier versions used TF-IDF), so the most relevant documents appear higher in the list of results.
  • It allows fuzzy searching, which helps find results even with misspelled search terms.
  • It supports real-time search autocompletion, returning results while the user types their search query.
  • It uses a RESTful API, exposing its power via a simple, lightweight interface.
  • It executes complex queries quickly, and it caches the results of frequently used filters so that later requests using the same filter can reuse them.
  • It scales horizontally, making it possible to extend resources and balance the load between cluster nodes.
  • It breaks indices into shards, and each shard can have any number of replicas. Every node knows which shards are stored where and can work out which shard holds a given document, so any node can route a request internally to retrieve the data.
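
The relevance scoring mentioned above can be illustrated with a simplified, textbook BM25 function. This is a sketch, not Lucene's implementation: it assumes pre-tokenized, lowercase documents, uses Lucene's default parameters k1 = 1.2 and b = 0.75, and ignores details such as field-length encoding and query boosts.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query (textbook BM25)."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))

    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Rare terms get a larger inverse document frequency.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # b scales the penalty for documents longer than average.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "to be or not to be".split(),
    "the quality of mercy is not strained".split(),
    "be not afraid of greatness".split(),
]
print(bm25_scores(["be"], docs))
```

Documents that repeat a query term score higher, rare terms weigh more than common ones, and documents longer than average are penalized.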

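Fuzzy matching is based on edit distance. The sketch below approximates the default fuzziness: AUTO setting, which allows no edits for terms of one or two characters, one edit for three to five characters, and two edits for longer terms, and it counts a swap of two adjacent characters as a single edit. The helper names are hypothetical, and real fuzzy queries add further controls such as prefix_length and max_expansions.

```python
def allowed_edits(term):
    """Edits permitted by fuzziness AUTO (default AUTO:3,6) for a query term."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and adjacent transpositions each cost one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def fuzzy_match(query_term, indexed_term):
    """True if indexed_term is within the AUTO edit budget of query_term."""
    return osa_distance(query_term, indexed_term) <= allowed_edits(query_term)


for query_term, indexed_term in [("hamlte", "hamlet"), ("ox", "on")]:
    print(query_term, indexed_term, fuzzy_match(query_term, indexed_term))
```
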
Terminology

Elasticsearch uses specific terms to define its components.

  • Cluster: A collection of nodes that work together.
  • Node: A single server that acts as part of the cluster, stores the data, and participates in the cluster’s indexing and search capabilities.
  • Index: A collection of documents with similar characteristics.
  • Document: The basic unit of information that can be indexed.
  • Shards: Indices are divided into multiple pieces called shards, which allows an index to scale horizontally across nodes.
  • Replicas: Copies of primary shards that provide redundancy and extra capacity for serving searches.
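
Elasticsearch decides which primary shard holds a document with the formula shard_num = hash(_routing) % num_primary_shards, where _routing defaults to the document's _id. The sketch below illustrates the idea; it uses zlib.crc32 as a stand-in for the Murmur3 hash that Elasticsearch actually uses, so its shard numbers will not match a real cluster, and shard_for is a hypothetical helper.

```python
import zlib


def shard_for(doc_id, num_primary_shards, routing=None):
    """Pick the primary shard for a document.

    _routing defaults to the document id. zlib.crc32 is only a stand-in for
    Elasticsearch's Murmur3 hash, so results differ from a real cluster.
    """
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_primary_shards


# Five primary shards, matching the indices created later in this article.
for doc_id in ["1", "2", "3", "hamlet-1"]:
    print(doc_id, "-> shard", shard_for(doc_id, 5))
```

Because the result depends on the number of primary shards, that number is fixed when an index is created, while the number of replicas can be changed at any time.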

Prerequisites

To perform this demo and deploy Elasticsearch on Kubernetes, you need one of the following:

  • An existing Rancher deployment and Kubernetes cluster, or
  • Two nodes on which to deploy Rancher and Kubernetes, or
  • A node on which to deploy Rancher, plus a Kubernetes cluster running in a hosted provider such as GKE.

This article uses the Google Cloud Platform, but you may use any other provider or infrastructure.

Launch Rancher

If you don’t already have a Rancher deployment, begin by launching one. The quick start guide covers the steps for doing so.

Launch a Cluster

Use Rancher to set up and configure your cluster according to the guide most suited to your environment.

Deploy Elasticsearch

If you are already comfortable with kubectl, you can apply the manifests directly. If you prefer to use the Rancher user interface, scroll down for those instructions.

We will deploy Elasticsearch as a StatefulSet with two Services: a headless service for communicating with the pods and another for interacting with Elasticsearch from outside of the Kubernetes cluster.
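
Because the first Service is headless (clusterIP: None), Kubernetes publishes a separate DNS record for each StatefulSet pod instead of a single virtual IP. The sketch below computes those stable names; it assumes the default namespace, the default cluster.local cluster domain, and a StatefulSet whose serviceName field refers to the headless elasticsearch-cluster Service. The helper name is hypothetical.

```python
def stateful_pod_dns_names(statefulset, service, replicas,
                           namespace="default", cluster_domain="cluster.local"):
    """Return the stable DNS name of each StatefulSet pod.

    Pods are named <statefulset>-<ordinal>; the headless Service named in the
    StatefulSet's serviceName field publishes one DNS record per pod.
    Namespace and cluster domain defaults are assumptions.
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


for name in stateful_pod_dns_names("esnode", "elasticsearch-cluster", 2):
    print(name)
```
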

svc-cluster.yaml

apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-cluster
spec:
  clusterIP: None
  selector:
    app: es-cluster
  ports:
  - name: transport
    port: 9300

$ kubectl apply -f svc-cluster.yaml
service/elasticsearch-cluster created

svc-loadbalancer.yaml

apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-loadbalancer
spec:
  selector:
    app: es-cluster
  ports:
  - name: http
    port: 80
    targetPort: 9200
  type: LoadBalancer

$ kubectl apply -f svc-loadbalancer.yaml
service/elasticsearch-loadbalancer created

es-sts-deployment.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: es-config
data:
  elasticsearch.yml: |
    cluster.name: my-elastic-cluster
    network.host: "0.0.0.0"
    bootstrap.memory_lock: false
    discovery.zen.ping.unicast.hosts: elasticsearch-cluster
    discovery.zen.minimum_master_nodes: 1
    xpack.security.enabled: false
    xpack.monitoring.enabled: false
  ES_JAVA_OPTS: -Xms512m -Xmx512m
---
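apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: esnode
spec:
  # Must name the headless Service so each pod gets a stable DNS name
  serviceName: elasticsearch-cluster
  replicas: 2
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: es-cluster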
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: es-cluster
    spec:
      securityContext:
        fsGroup: 1000
      initContainers:
      - name: init-sysctl
        image: busybox
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
      containers:
      - name: elasticsearch
        resources:
          requests:
            memory: 1Gi
        securityContext:
          privileged: true
          runAsUser: 1000
          capabilities:
            add:
            - IPC_LOCK
            - SYS_RESOURCE
        image: docker.elastic.co/elasticsearch/elasticsearch:6.5.0
        env:
        - name: ES_JAVA_OPTS
          valueFrom:
            configMapKeyRef:
              name: es-config
              key: ES_JAVA_OPTS
        readinessProbe:
          httpGet:
            scheme: HTTP
            path: /_cluster/health?local=true
            port: 9200
          initialDelaySeconds: 5
        ports:
        - containerPort: 9200
          name: es-http
        - containerPort: 9300
          name: es-transport
        volumeMounts:
        - name: es-data
          mountPath: /usr/share/elasticsearch/data
        - name: elasticsearch-config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          subPath: elasticsearch.yml
      volumes:
        - name: elasticsearch-config
          configMap:
            name: es-config
            items:
              - key: elasticsearch.yml
                path: elasticsearch.yml
  volumeClaimTemplates:
  - metadata:
      name: es-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 5Gi

$ kubectl apply -f es-sts-deployment.yaml
configmap/es-config created
statefulset.apps/esnode created
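
The ConfigMap above sets discovery.zen.minimum_master_nodes to 1. The Elasticsearch 6.x documentation recommends setting it to a quorum of master-eligible nodes, (N / 2) + 1 with integer division, to prevent a split brain. The small sketch below computes that value. With the two master-eligible nodes deployed here, quorum is 2, which means the cluster would lose its elected master if either pod went down; a value of 1 keeps this demo available at the cost of split-brain protection, and running three master-eligible nodes is the usual way to get both.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum recommended for discovery.zen.minimum_master_nodes (ES 6.x)."""
    return master_eligible_nodes // 2 + 1


for n in range(1, 6):
    print(n, "master-eligible nodes -> minimum_master_nodes =", minimum_master_nodes(n))
```
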

Deploy Elasticsearch via the Rancher UI

If you prefer, import each of the manifests above into your cluster via the Rancher UI. The screenshots below show the process for each of them.

Import svc-cluster.yaml

[Screenshots 01 to 04: importing svc-cluster.yaml in the Rancher UI]

Import svc-loadbalancer.yaml

[Screenshots 05 and 06: importing svc-loadbalancer.yaml in the Rancher UI]

Import es-sts-deployment.yaml

[Screenshots 07 to 10: importing es-sts-deployment.yaml in the Rancher UI]

Retrieve the Load Balancer IP

You’ll need the address of the load balancer that we deployed. You can retrieve this via kubectl or the UI.

Use the CLI

$ kubectl get svc elasticsearch-loadbalancer
NAME                         TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)        AGE
elasticsearch-loadbalancer   LoadBalancer   10.59.246.186   35.204.239.246   80:30604/TCP   33m
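
If you want to script this step, kubectl can print the Service as JSON (kubectl get svc elasticsearch-loadbalancer -o json), and the external address sits under status.loadBalancer.ingress. The helper below is a hypothetical sketch that reads that field; some providers report a hostname instead of an IP, so it falls back to that. The sample JSON is a trimmed excerpt built from the output above, not a complete Service object.

```python
import json


def load_balancer_address(service_json):
    """Return the external IP or hostname of a LoadBalancer Service.

    service_json is the text printed by
    kubectl get svc elasticsearch-loadbalancer -o json
    """
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress", [])
    if not ingress:
        return None  # no address assigned yet (EXTERNAL-IP shows <pending>)
    entry = ingress[0]
    return entry.get("ip") or entry.get("hostname")


# Trimmed, illustrative excerpt of the JSON for the Service shown above.
SAMPLE = '{"status": {"loadBalancer": {"ingress": [{"ip": "35.204.239.246"}]}}}'
print(load_balancer_address(SAMPLE))
```

In practice you would pass it the stdout of subprocess.run(["kubectl", "get", "svc", "elasticsearch-loadbalancer", "-o", "json"], capture_output=True, text=True, check=True).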

Use the UI

[Screenshot 11: the load balancer address in the Rancher UI]

Test the Cluster

Use the address we retrieved in the previous step to query the cluster for basic information.

$ curl 35.204.239.246
{
  "name" : "d7bDQcH",
  "cluster_name" : "my-elastic-cluster",
  "cluster_uuid" : "e3JVAkPQTCWxg2vA3Xywgg",
  "version" : {
    "number" : "6.5.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "816e6f6",
    "build_date" : "2018-11-09T18:58:36.352602Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

Query the cluster for information about its nodes. The asterisk in the master column highlights the current master node.

$ curl '35.204.239.246/_cat/nodes?v'
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.56.2.8           24          97   5    0.05    0.12     0.13 mdi       -      d7bDQcH
10.56.0.6           28          96   4    0.01    0.05     0.04 mdi       *      WEOeEqC
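
To find the elected master from a script, you can parse this table. The sketch below splits the whitespace-aligned output produced with ?v and looks for the asterisk; the helper names are hypothetical, and the parser assumes no column value is empty or contains spaces. For scripts, the _cat APIs also accept format=json, which avoids column parsing altogether.

```python
# The _cat/nodes?v output shown above, used as sample input.
CAT_NODES_OUTPUT = """\
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.56.2.8           24          97   5    0.05    0.12     0.13 mdi       -      d7bDQcH
10.56.0.6           28          96   4    0.01    0.05     0.04 mdi       *      WEOeEqC
"""


def parse_cat_table(text):
    """Turn whitespace-aligned _cat output (requested with ?v) into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def elected_master(text):
    """Return the name of the node whose master column holds '*'."""
    for node in parse_cat_table(text):
        if node["master"] == "*":
            return node["name"]
    return None


print(elected_master(CAT_NODES_OUTPUT))
```
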

Check the available indices:

$ curl '35.204.239.246/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size

Because this is a fresh install, it doesn’t have any indices or data. To continue this tutorial, we’ll inject some sample data that we can use later. The files that we’ll use are available from the Elastic website. Download them and then load them with the following commands:

$ curl -H 'Content-Type: application/x-ndjson' -XPOST \
    'http://35.204.239.246/shakespeare/doc/_bulk?pretty' --data-binary @shakespeare_6.0.json
$ curl -H 'Content-Type: application/x-ndjson' -XPOST \
    'http://35.204.239.246/bank/account/_bulk?pretty' --data-binary @accounts.json
$ curl -H 'Content-Type: application/x-ndjson' -XPOST \
    'http://35.204.239.246/_bulk?pretty' --data-binary @logs.json
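
The _bulk endpoint used above reads newline-delimited JSON: each document is preceded by an action line that names the target index (and, on Elasticsearch 6.x, the type), and the payload must end with a newline. The sketch below builds a payload of the same shape from Python objects. The helper names and the example document are hypothetical, and post_bulk is defined but not called, since it needs a reachable cluster.

```python
import json
import urllib.request


def bulk_body(index, doc_type, documents):
    """Build an NDJSON _bulk payload from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in documents:
        # Action/metadata line: which index, type, and id to write to.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


def post_bulk(base_url, payload):
    """POST the payload to <base_url>/_bulk and return the parsed response."""
    request = urllib.request.Request(
        base_url + "/_bulk",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical example document; not taken from the sample data set.
payload = bulk_body("shakespeare", "doc", [
    ("0", {"play_name": "Hamlet", "speaker": "HAMLET", "text_entry": "To be, or not to be"}),
])
print(payload, end="")
```
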

When we recheck the indices, we see that we have five new indices with data.

$ curl '35.204.239.246/_cat/indices?v'
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-2015.05.20 MFdWJxnsTISH0Z9Vr0aT3g   5   1       4750            0     49.9mb         25.2mb
green  open   logstash-2015.05.18 lLHV2nzvTOG9mzlpKaG9sg   5   1       4631            0     46.5mb         23.5mb
green  open   logstash-2015.05.19 PqNnVUgXTyaDSfmCQZwbLQ   5   1       4624            0     48.2mb         24.2mb
green  open   shakespeare         rwl3xBgmQtm8B3V7GFeTZQ   5   1     111396            0       46mb         23.1mb
green  open   bank                z0wVGsbrSiG2cQwRXwaCOg   5   1       1000            0    949.2kb        474.6kb

Each of these contains a different type of document. For the shakespeare index, we can search for the name of a play. For the logstash-2015.05.19 index, we can query and filter data based on an IP address, and for the bank index, we can search for information about a particular account.
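
The same searches can be sent from code. The sketch below builds a match query against the _search endpoint with Python's standard library. The field names play_name and account_number are assumptions about the sample data sets, the helper names are hypothetical, and run_search is defined but not called, since it needs a reachable cluster.

```python
import json
import urllib.request


def search_request(base_url, index, field, value):
    """Build a POST /<index>/_search request with a match query."""
    body = {"query": {"match": {field: value}}}
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request):
    """Send the request and return the _source of each matching document."""
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return [hit["_source"] for hit in result["hits"]["hits"]]


# Field names below are assumptions about the sample data sets.
play_query = search_request("http://35.204.239.246", "shakespeare", "play_name", "Hamlet")
account_query = search_request("http://35.204.239.246", "bank", "account_number", 1)
print(play_query.full_url, play_query.data.decode("utf-8"))
```
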

[Screenshots 12 to 14: example searches against the sample indices]

Conclusion

Elasticsearch is extremely powerful. It is both simple and complex: simple to deploy and use, and complex in the way that it interacts with its data.

This article has shown you the basics of how to deploy Elasticsearch on Kubernetes with Rancher and how to query it via the RESTful API.

If you want to use Elasticsearch in everyday situations, we encourage you to explore the other parts of the ELK stack: Kibana, Logstash, and Beats. These tools round out an Elasticsearch deployment and make it useful for storing, retrieving, and visualizing a broad range of data from systems and applications.

Calin Rus