Elasticsearch is an open-source search engine based on Apache Lucene and developed by Elastic. It focuses on scalability, resilience, and performance, and companies around the world, including Mozilla, Facebook, GitHub, Netflix, eBay, and the New York Times, use it every day. Elasticsearch is one of the most popular analytics platforms for large datasets and powers search in a wide range of applications. It takes a document-oriented approach to data and makes newly indexed documents searchable in near real time. It stores data as JSON documents and organizes them by index and type.
If we draw analogies between the components of a traditional relational database and those of Elasticsearch, they look like this:
Elasticsearch uses specific terms to define its components.
To perform this demo and deploy Elasticsearch on Kubernetes, you need one of the following:
This article uses the Google Cloud Platform, but you may use any other provider or infrastructure.
If you don’t already have a Rancher deployment, begin by launching one. The quick start guide covers the steps for doing so.
Use Rancher to set up and configure your cluster according to the guide most suited to your environment.
If you are already comfortable with kubectl, you can apply the manifests directly. If you prefer to use the Rancher user interface, scroll down for those instructions.
We will deploy Elasticsearch as a StatefulSet with two Services: a headless service for communicating with the pods and another for interacting with Elasticsearch from outside of the Kubernetes cluster.
svc-cluster.yaml
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-cluster
spec:
  clusterIP: None
  selector:
    app: es-cluster
  ports:
  - name: transport
    port: 9300

$ kubectl apply -f svc-cluster.yaml
service/elasticsearch-cluster created
svc-loadbalancer.yaml
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-loadbalancer
spec:
  selector:
    app: es-cluster
  ports:
  - name: http
    port: 80
    targetPort: 9200
  type: LoadBalancer

$ kubectl apply -f svc-loadbalancer.yaml
service/elasticsearch-loadbalancer created
es-sts-deployment.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: es-config
data:
  elasticsearch.yml: |
    cluster.name: my-elastic-cluster
    network.host: "0.0.0.0"
    bootstrap.memory_lock: false
    discovery.zen.ping.unicast.hosts: elasticsearch-cluster
    discovery.zen.minimum_master_nodes: 1
    xpack.security.enabled: false
    xpack.monitoring.enabled: false
  ES_JAVA_OPTS: -Xms512m -Xmx512m
---
# apps/v1 replaces the deprecated apps/v1beta1 and requires an explicit selector.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: esnode
spec:
  # Must name the headless Service that governs the pods' network identity.
  serviceName: elasticsearch-cluster
  replicas: 2
  selector:
    matchLabels:
      app: es-cluster
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: es-cluster
    spec:
      securityContext:
        fsGroup: 1000
      initContainers:
      # Elasticsearch needs a higher mmap count limit than the kernel default.
      - name: init-sysctl
        image: busybox
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
      containers:
      - name: elasticsearch
        resources:
          requests:
            memory: 1Gi
        securityContext:
          privileged: true
          runAsUser: 1000
          capabilities:
            add:
            - IPC_LOCK
            - SYS_RESOURCE
        image: docker.elastic.co/elasticsearch/elasticsearch:6.5.0
        env:
        - name: ES_JAVA_OPTS
          valueFrom:
            configMapKeyRef:
              name: es-config
              key: ES_JAVA_OPTS
        readinessProbe:
          httpGet:
            scheme: HTTP
            path: /_cluster/health?local=true
            port: 9200
          initialDelaySeconds: 5
        ports:
        - containerPort: 9200
          name: es-http
        - containerPort: 9300
          name: es-transport
        volumeMounts:
        - name: es-data
          mountPath: /usr/share/elasticsearch/data
        - name: elasticsearch-config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          subPath: elasticsearch.yml
      volumes:
      - name: elasticsearch-config
        configMap:
          name: es-config
          items:
          - key: elasticsearch.yml
            path: elasticsearch.yml
  volumeClaimTemplates:
  - metadata:
      name: es-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 5Gi

$ kubectl apply -f es-sts-deployment.yaml
configmap/es-config created
statefulset.apps/esnode created
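A StatefulSet starts its pods one at a time, so the cluster is not ready the moment the manifest is applied. The following is a minimal sketch for waiting until the rollout completes. The function name `wait_for_es` and the `300s` default timeout are illustrative choices, not part of the original manifests.

```shell
# Hypothetical helper: block until the StatefulSet has finished rolling out,
# then list the pods and the volume claims created for them.
wait_for_es() {
  # $1: StatefulSet name (defaults to the one in es-sts-deployment.yaml)
  # $2: how long to wait before giving up (assumed default of 300s)
  kubectl rollout status "statefulset/${1:-esnode}" --timeout="${2:-300s}" || return 1
  kubectl get pods -l app=es-cluster -o wide
  kubectl get pvc
}
```

Calling `wait_for_es` with no arguments uses the names from this article. Each pod should end up with its own volume claim created from the `es-data` template.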
If you prefer, import each of the manifests above into your cluster via the Rancher UI. The screenshots below show the process for each of them.
You’ll need the address of the load balancer that we deployed. You can retrieve this via kubectl or the UI.
$ kubectl get svc elasticsearch-loadbalancer
NAME                         TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)        AGE
elasticsearch-loadbalancer   LoadBalancer   10.59.246.186   35.204.239.246   80:30604/TCP   33m
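If you want the address in a script rather than reading it from the table, a JSONPath query is one option. This is a sketch: the function name `lb_address` is hypothetical, and it assumes the provider reports an IP address. Some providers report a hostname instead, in which case the `ip` field would need to be swapped for `hostname`.

```shell
# Hypothetical helper: print the external IP of the load balancer Service.
lb_address() {
  # $1: Service name (defaults to the one in svc-loadbalancer.yaml)
  kubectl get svc "${1:-elasticsearch-loadbalancer}" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

The result can be captured for the later commands, for example with `ES_HOST="$(lb_address)"`. The output stays empty until the provider has finished provisioning the load balancer.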
Use the address we retrieved in the previous step to query the cluster for basic information.
$ curl 35.204.239.246
{
  "name" : "d7bDQcH",
  "cluster_name" : "my-elastic-cluster",
  "cluster_uuid" : "e3JVAkPQTCWxg2vA3Xywgg",
  "version" : {
    "number" : "6.5.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "816e6f6",
    "build_date" : "2018-11-09T18:58:36.352602Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
Query the cluster for information about its nodes. The asterisk in the master column highlights the current master node.
$ curl 35.204.239.246/_cat/nodes?v
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.56.2.8           24          97   5    0.05    0.12     0.13 mdi       -      d7bDQcH
10.56.0.6           28          96   4    0.01    0.05     0.04 mdi       *      WEOeEqC
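To pick out the master in a script, you can filter the same output on the master column. The sketch below finds the column positions from the header row instead of hard-coding them. The function name `master_node_name` is an illustrative choice.

```shell
# Hypothetical helper: read `_cat/nodes?v` output on stdin and print the
# name of the node whose master column contains an asterisk.
master_node_name() {
  awk '
    NR == 1 {
      # Locate the "master" and "name" columns from the header row.
      for (i = 1; i <= NF; i++) {
        if ($i == "master") m = i
        if ($i == "name") n = i
      }
      next
    }
    m && n && $m == "*" { print $n }
  '
}
```

It is meant to be used at the end of a pipe, for example `curl -s '35.204.239.246/_cat/nodes?v' | master_node_name`.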
Check the available indices:
$ curl 35.204.239.246/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
Because this is a fresh install, it doesn’t have any indices or data. To continue this tutorial, we’ll inject some sample data that we can use later. The files that we’ll use are available from the Elastic website. Download them and then load them with the following commands:
$ curl -H 'Content-Type: application/x-ndjson' -XPOST \
    'http://35.204.239.246/shakespeare/doc/_bulk?pretty' --data-binary @shakespeare_6.0.json
$ curl -H 'Content-Type: application/x-ndjson' -XPOST \
    'http://35.204.239.246/bank/account/_bulk?pretty' --data-binary @accounts.json
$ curl -H 'Content-Type: application/x-ndjson' -XPOST \
    'http://35.204.239.246/_bulk?pretty' --data-binary @logs.json
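The bulk endpoint expects newline-delimited JSON: each document takes two lines, an action line followed by the document source, and the file must end with a newline. The sketch below writes a tiny file in that shape and posts it. The documents, the `demo` index, and both function names are hypothetical and are not part of the sample datasets.

```shell
# Hypothetical helper: write a two-document bulk file to the path in $1.
make_bulk_file() {
  cat > "$1" <<'EOF'
{"index":{"_id":"1"}}
{"title":"first demo document","views":10}
{"index":{"_id":"2"}}
{"title":"second demo document","views":25}
EOF
}

# Hypothetical helper: post a bulk file, taking index and type from the URL.
load_bulk_file() {
  # $1: host, $2: index, $3: type, $4: path to the NDJSON file
  curl -s -H 'Content-Type: application/x-ndjson' -XPOST \
    "http://$1/$2/$3/_bulk?pretty" --data-binary "@$4"
}
```

A call such as `make_bulk_file /tmp/demo.json && load_bulk_file 35.204.239.246 demo doc /tmp/demo.json` mirrors the commands above. The `--data-binary` flag matters here because it preserves the newlines that the bulk format depends on.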
When we recheck the indices, we see that we have five new indices with data.
$ curl 35.204.239.246/_cat/indices?v
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-2015.05.20 MFdWJxnsTISH0Z9Vr0aT3g   5   1       4750            0     49.9mb         25.2mb
green  open   logstash-2015.05.18 lLHV2nzvTOG9mzlpKaG9sg   5   1       4631            0     46.5mb         23.5mb
green  open   logstash-2015.05.19 PqNnVUgXTyaDSfmCQZwbLQ   5   1       4624            0     48.2mb         24.2mb
green  open   shakespeare         rwl3xBgmQtm8B3V7GFeTZQ   5   1     111396            0       46mb         23.1mb
green  open   bank                z0wVGsbrSiG2cQwRXwaCOg   5   1       1000            0    949.2kb        474.6kb
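For a quick sanity check after loading, the same listing can be reduced to a one-line summary. This is a sketch that assumes every row has all ten columns, as in the output above. The function name `summarize_indices` is illustrative.

```shell
# Hypothetical helper: read `_cat/indices?v` output on stdin and print the
# number of indices, the total document count, and how many are not green.
summarize_indices() {
  awk '
    NR == 1 {
      # Map each header name to its column position.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    {
      n++
      total += $(col["docs.count"])
      if ($(col["health"]) != "green") bad++
    }
    END { printf "indices=%d docs=%d not_green=%d\n", n, total, bad }
  '
}
```

Piping the listing above through it, as in `curl -s '35.204.239.246/_cat/indices?v' | summarize_indices`, should report five indices and no index in a state other than green.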
Each of these contains a different type of document. For the shakespeare index, we can search for the name of a play. For the logstash-2015.05.19 index we can query and filter data based on an IP address, and for the bank index we can search for information about a particular account.
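The searches described above can be sketched with the URI search form of the `_search` endpoint. The field names `play_name`, `clientip`, and `account_number` are assumptions based on the Elastic sample datasets and should be checked against your own data. The function names and the `ES_HOST` variable are hypothetical.

```shell
# Address of the load balancer; override ES_HOST to match your deployment.
ES_HOST="${ES_HOST:-35.204.239.246}"

# Hypothetical helper: build a URI-search URL from index, field, and value.
# Values containing spaces or special characters must be URL-encoded first.
search_url() {
  printf 'http://%s/%s/_search?q=%s:%s&pretty' "$ES_HOST" "$1" "$2" "$3"
}

# Hypothetical helper: run the search against the cluster.
search() {
  curl -s "$(search_url "$1" "$2" "$3")"
}

# Example calls (field names are assumptions, see above):
#   search shakespeare play_name Hamlet
#   search logstash-2015.05.19 clientip 192.0.2.10   # placeholder address
#   search bank account_number 25
```

URI search is convenient for quick checks like these. For anything more involved, such as combining filters, the request-body query DSL is the better fit.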
Elasticsearch is extremely powerful. It is both simple and complex – simple to deploy and use, and complex in the way that it interacts with its data.
This article has shown you the basics of how to deploy Elasticsearch on Kubernetes with Rancher and how to query it via the RESTful API.
If you wish to explore ways to use Elasticsearch in everyday situations, we encourage you to explore the other parts of the ELK stack: Kibana, Logstash, and Beats. These tools round out an Elasticsearch deployment and make it useful for storing, retrieving, and visualizing a broad range of data from systems and applications.