Elasticsearch Interview Questions & Answers

Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine followed by Apache Solr, also based on Lucene.

Elasticsearch is developed alongside a data-collection and log-parsing engine called Logstash, and an analytics and visualization platform called Kibana. The three products are designed for use as an integrated solution, referred to as the “Elastic Stack” (formerly the “ELK stack”).

Shay Banon created the precursor to Elasticsearch, called Compass, in 2004. Since the first version of Elasticsearch was released in 2010, it has quickly become the most popular search engine, and is commonly used for log analytics, full-text search, and operational intelligence use cases.

What is Elasticsearch?

Elasticsearch is an open source, distributed, readily scalable, enterprise-grade search engine released under the terms of the Apache License. It is developed in Java and built on Apache Lucene, an open source library for high-performance, full-featured text search. Because Lucene is only a library, integrating it directly with an existing application requires a lot of coding. Elasticsearch can index and search documents in diverse formats, and it was designed for distributed environments, providing flexibility and scalability. Elasticsearch is now the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.

What are the features of Elasticsearch?

Elasticsearch is open source.
It is very fast.
It provides good defaults for complex Lucene classes.
It offers a Reindex API.
It supports real-time application monitoring.
It uses denormalization to improve search performance.
It can be used as a replacement for document stores such as MongoDB and RavenDB.
Elasticsearch is schema-free and document-oriented. For many business applications, these are important technical innovations compared to legacy enterprise search engines.
Elasticsearch-Hadoop uses the Elasticsearch REST interface for communication, allowing for flexible deployments by minimizing the number of ports that need to be open within a network.
Elasticsearch works with a wide range of data connectors that are readily available or custom-built, enabling you to search across multiple repositories efficiently.

What is the query language of Elasticsearch?
Elasticsearch provides Query DSL, a JSON-based domain-specific language for defining queries. The queries are executed on top of Apache Lucene.
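
As a rough illustration, a Query DSL request body is plain JSON. The sketch below builds one as a Python dictionary and serializes it. The field name "title" and the search text are hypothetical and chosen only for the example.

```python
import json

# Hypothetical field name ("title") and search text, used only for illustration.
query_body = {
    "query": {
        "match": {
            "title": "distributed search"
        }
    },
    "size": 10,  # return at most 10 hits
}

# Serialize to the JSON text that would be sent as the body of a search request.
payload = json.dumps(query_body, indent=2)
print(payload)
```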

What are the Basic Concepts of Elasticsearch?

The basic concepts of Elasticsearch are nodes, clusters, near-real-time search, indexes, shards, mapping types, documents, the RESTful API, and more.

Node: A node is a single server, that is, one running Elasticsearch instance, that holds some data and participates in the cluster’s indexing and querying. A node can be configured to join a specific cluster by that cluster’s name, and a single cluster can have as many nodes as we want. Think of a node as comparable to a running instance of MySQL: several MySQL instances can run on one machine on different ports, while in Elasticsearch, generally, one instance runs per machine. Elasticsearch uses distributed computing, so having separate machines helps, as there are more hardware resources available.

Cluster: A cluster is a collection of one or more nodes. It provides collective indexing and search capabilities across all the nodes for the entire data set. In relational database terms, a node corresponds to a DB instance. There can be N nodes with the same cluster name.

Near-Real-Time (NRT): Elasticsearch is an NRT search platform. There is a slight latency (normally about one second) from the time you index a document until the time it becomes searchable.
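
The delay can be pictured with a toy model in which newly indexed documents sit in a buffer until a periodic refresh makes them searchable. This is only a conceptual sketch: the class and method names are invented here and are not Elasticsearch APIs.

```python
class ToyNearRealTimeIndex:
    """Toy model: indexed documents are held in a buffer until refresh() is called."""

    def __init__(self):
        self._buffer = []      # indexed but not yet searchable
        self._searchable = []  # visible to search

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # Make everything indexed so far visible to search.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, word):
        return [d for d in self._searchable if word in d.lower().split()]


idx = ToyNearRealTimeIndex()
idx.index("Elasticsearch is near real time")
before = idx.search("near")  # the document is not visible yet
idx.refresh()
after = idx.search("near")   # the document is visible after the refresh
print(before, after)
```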

Index: An index is a collection of documents that have similar characteristics. For example, we can have one index for customer data and another for product information. An index is identified by a unique name, which is used to refer to the index when performing indexing, search, update, and delete operations. In a single cluster, we can define as many indexes as we want.

Shard: A shard is a subset of the documents of an index. Indexes are horizontally subdivided into shards, which means each shard contains all the properties of a document but holds fewer JSON objects than the whole index. This horizontal separation makes each shard an independent unit that can be stored on any node. A primary shard is an original horizontal part of an index, and primary shards are then replicated into replica shards.
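
In simplified form, a document is assigned to a primary shard by hashing a routing value (the document ID by default) and taking the remainder modulo the number of primary shards. The sketch below illustrates the idea only. The function name is hypothetical, and zlib.crc32 is a stand-in for the hash Elasticsearch actually uses.

```python
import zlib


def pick_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a primary shard number."""
    # zlib.crc32 is a stand-in hash for illustration, not the hash Elasticsearch uses.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards


doc_ids = ["user-1", "user-2", "user-3", "user-4", "user-5"]
placement = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}
print(placement)
```

Because the result depends on the number of primary shards, the same document ID always lands on the same shard only while that number stays fixed.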

Mapping Type: A mapping type is comparable to a database table in an RDBMS. It is a collection of documents in the same index that share a set of common fields. For example, if an index contains the data of a social networking application, there can be one type for user profile data, another for messaging data, and another for comments data. Note that mapping types were deprecated in Elasticsearch 6.x and removed in later versions.

Document: A document is a collection of fields, defined in JSON format. Every document belongs to a type and resides inside an index. Every document is associated with a unique identifier, called the UID.

RESTful API: Elasticsearch is driven by a RESTful API. Almost every action can be performed through this API by sending JSON over HTTP.
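
As a minimal sketch of what "JSON over HTTP" looks like, the code below builds (but does not send) two requests with Python's standard library. The host and port, the "customer" index, and the document contents are assumptions for illustration. The `_doc` path segment follows the typeless URL style of recent versions; older versions use a type name in that position.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node; adjust for a real cluster


def build_request(method: str, path: str, body=None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request carrying an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Index a document with ID 1 into a hypothetical "customer" index.
index_req = build_request("PUT", "/customer/_doc/1", {"name": "Jane Doe"})
# Fetch the same document back.
get_req = build_request("GET", "/customer/_doc/1")

for req in (index_req, get_req):
    print(req.get_method(), req.full_url)

# To actually send a request against a running node:
# with urllib.request.urlopen(index_req) as resp:
#     print(resp.read().decode("utf-8"))
```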

What is a Replica in Elasticsearch?

What are the core field types in Elasticsearch?

What are the predefined fields in Elasticsearch?

What is an inverted index in Elasticsearch?

The inverted index is the heart of a search engine. The primary goal of a search engine is to provide speedy searches while finding the documents in which our search terms occur. An inverted index is a hash-map-like data structure that directs users from a word to the documents or web pages that contain it, which makes quick searches across millions of documents possible. The index at the back of a book works the same way: given a word, we can find the pages on which that word appears.
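
The idea can be sketched in a few lines of Python. This is a simplified illustration with made-up documents. It is not how Lucene stores its index, and it only splits on whitespace and lowercases.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


docs = {
    1: "Elasticsearch is built on Lucene",
    2: "Lucene is a search library",
    3: "Kibana visualizes data",
}
inverted = build_inverted_index(docs)
print(inverted["lucene"])  # [1, 2]
print(inverted["kibana"])  # [3]
```

Looking up a word is then a single dictionary access, instead of a scan over every document.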

What are the basic operations you can perform on a document?

How to identify a document uniquely in Elasticsearch?

What is horizontal scaling in Elasticsearch?

What is vertical scaling in Elasticsearch?

Can you explain Analyzers in Elasticsearch?

Elasticsearch ships with a wide range of built-in analyzers, which can be used in any index without further configuration:

Standard Analyzer: It divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm. It removes most punctuation, lowercases terms, and supports removing stop words.

Simple Analyzer: It divides text into terms whenever it encounters a character which is not a letter. It lowercases all terms.

Whitespace Analyzer: It divides text into terms whenever it encounters any whitespace character. It does not lowercase terms.

Stop Analyzer: It is like the simple analyzer, but also supports removal of stop words.

Keyword Analyzer: It is a “noop” analyzer that accepts whatever text it is given and outputs the exact same text as a single term.

Pattern Analyzer: It uses a regular expression to split the text into terms. It supports lower-casing and stop words.

Language Analyzers: Elasticsearch provides many language-specific analyzers, such as english or french.

Fingerprint Analyzer: It is a specialist analyzer which creates a fingerprint which can be used for duplicate detection.

Custom analyzers: If you do not find an analyzer suitable for your needs, you can create a custom analyzer which combines the appropriate character filters, tokenizer, and token filters.
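
To make the differences concrete, the sketch below approximates four of the analyzers described above with plain Python string handling. These functions only imitate the described behavior and are not the Lucene implementations. The stop-word list is an illustrative assumption.

```python
import re


def simple_analyzer(text):
    # Split on anything that is not a letter, then lowercase every term.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def whitespace_analyzer(text):
    # Split on whitespace only; case and punctuation are kept.
    return text.split()


def keyword_analyzer(text):
    # Emit the whole input unchanged as a single term.
    return [text]


def stop_analyzer(text, stop_words=frozenset({"the", "a", "an", "is", "and", "of"})):
    # Simple analyzer plus stop-word removal (this stop list is an assumption).
    return [t for t in simple_analyzer(text) if t not in stop_words]


sample = "The 2 QUICK Brown-Foxes"
print(simple_analyzer(sample))      # ['the', 'quick', 'brown', 'foxes']
print(whitespace_analyzer(sample))  # ['The', '2', 'QUICK', 'Brown-Foxes']
print(keyword_analyzer(sample))     # ['The 2 QUICK Brown-Foxes']
print(stop_analyzer(sample))        # ['quick', 'brown', 'foxes']
```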

What is the from component in a search request?

What are some built-in token filters in Elasticsearch?

What is character filtering?

What is the difference between match query and term query in Elasticsearch?

What is the use of the attributes enabled, index and store?

The enabled attribute applies to various Elasticsearch-specific (internally created) fields such as _index and _size. User-supplied fields do not have an “enabled” attribute.

Store means the data is stored by Lucene, which will return it if asked. Stored fields are not necessarily searchable. By default, fields are not stored, but the full source is. Since the defaults usually make sense, simply do not set the store attribute.

The index attribute controls searching: only indexed fields can be searched. The two attributes are distinct because indexed fields are transformed during analysis, so the original data cannot be retrieved from the index if it is required.
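
A hedged sketch of how the store and index attributes might appear in a mapping is shown below as a Python dictionary. The field names are invented for illustration, and the layout assumes the typeless mapping format of recent Elasticsearch versions; older versions nest the properties under a type name.

```python
import json

# Hypothetical index mapping; all field names are invented for illustration.
mapping_body = {
    "mappings": {
        "properties": {
            # Searchable, and also kept as a separately stored field.
            "title": {"type": "text", "store": True},
            # Not indexed, so it cannot be searched; still present in the source.
            "internal_note": {"type": "keyword", "index": False},
            # Defaults: indexed (searchable) and not stored separately.
            "body": {"type": "text"},
        }
    }
}

print(json.dumps(mapping_body, indent=2))
```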

What is a filter in Elasticsearch?

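During analysis, after text is processed by the tokenizer, it is passed through token filters before indexing. Separately, the Query DSL offers filters that restrict which documents a search returns. The following types of filters are available in Elasticsearch 1.x.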

And filter
Bool filter
Exists filter
Geo bounding box filter
Geo distance filter
Geo distance range filter
Geo polygon filter
Geoshape filter
Geohash cell filter
Has child filter
Has parent filter
Ids filter
Indices filter
Limit filter
Match all filter
Missing filter
Nested filter
Not filter
Or filter
Prefix filter
Query filter
Range filter
Regexp filter
Script filter
Term filter
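
In later Elasticsearch versions, filters such as term, range, and exists are written as clauses in the filter section of a bool query rather than as a separate filter type. The sketch below shows such a request body as a Python dictionary. The field names and values are hypothetical.

```python
import json

# Hypothetical product search: a full-text match narrowed by three filter clauses.
search_body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"description": "wireless headphones"}}
            ],
            "filter": [
                {"term": {"in_stock": True}},                    # exact-value filter
                {"range": {"price": {"gte": 20, "lte": 100}}},   # numeric range filter
                {"exists": {"field": "brand"}},                  # field must be present
            ],
        }
    }
}

print(json.dumps(search_body, indent=2))
```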

What are the pros and cons of Elasticsearch?

Pros:

Lucene is an open-source search engine library. Elasticsearch is built on top of Lucene, which is a full-featured information retrieval library, so it provides the most powerful full-text search capabilities of any open source product.
Elasticsearch implements many features, such as customized splitting of text into words, customized stemming, and faceted search.
It is API driven: actions can be performed using a simple RESTful API, so an application doesn’t need to be written in Java to work with Elasticsearch. Its powerful JSON-based DSL allows you to send data over HTTP in JSON to index, search, and manage your Elasticsearch cluster.
Scalability is simple. Since it is schema-less, it accepts all types of data.
Elasticsearch can execute complex queries extremely fast, and complex bespoke search functionality can be set up efficiently.
Elasticsearch records any changes made in transaction logs on multiple nodes in the cluster to minimize the chance of data loss.
The simplicity of managing Elasticsearch is a big plus. We’re able to integrate routine processes such as building indices straight into our automated deployment process quickly and easily.
Creating full backups is easy using the concept of a gateway, which is present in Elasticsearch.

Cons:

Elasticsearch does not have any built-in authentication or authorization system.
Elasticsearch is not an ACID compliant system.
One can’t write Elasticsearch queries in SQL.
Elasticsearch is not a relational database, so if your data would benefit from features such as foreign-key constraints, it is not a good choice as your primary data store.
The distributed nature of Elasticsearch can have negative effects on data consistency.